brainlife / ezbids

A web service for semi-automated conversion of raw imaging data to BIDS
https://brainlife.io/ezbids
MIT License

[Feature Request] Input Additional Metadata via Spreadsheet #80

Open bendhouseart opened 1 year ago

bendhouseart commented 1 year ago

It appears that having the ability to:

  1. Upload BIDS-formatted spreadsheet data (see this link for a PET example)
  2. Match uploaded spreadsheets to imaging/session data
  3. Direct the matched data to the BIDS output (the NIfTI sidecars, for starters)

would be a useful addition to ezBIDS for PET and ASL, as well as other modalities. I'm opening this issue to continue the discussion that @dlevitas and @bendhouseart started on Slack.
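For illustration only, step 2 could be sketched roughly as below. This is not ezBIDS or PET2BIDS code: the helper names `bids_key` and `match_spreadsheets_to_sidecars` are hypothetical, and it assumes both the spreadsheets and the sidecars carry `sub-` (and optionally `ses-`) entities in their filenames.

```python
import re
from pathlib import Path

# Assumption: files are named with BIDS-style entities, e.g. sub-01_ses-01_pet.tsv
ENTITY_PATTERN = re.compile(r"(sub-[a-zA-Z0-9]+)(?:_(ses-[a-zA-Z0-9]+))?")


def bids_key(filename):
    """Return (subject, session) parsed from a filename, or None if no subject is found."""
    match = ENTITY_PATTERN.search(Path(filename).name)
    if match is None:
        return None
    return match.group(1), match.group(2)


def match_spreadsheets_to_sidecars(spreadsheets, sidecars):
    """Pair each spreadsheet with the sidecars that share its subject/session key."""
    matches = {}
    for sheet in spreadsheets:
        key = bids_key(sheet)
        if key is None:
            # No subject entity in the name; a real tool would ask the user to match it.
            continue
        matches[sheet] = [s for s in sidecars if bids_key(s) == key]
    return matches
```

Spreadsheets without recognizable entities are simply skipped here; in practice they would presumably need to be matched by hand in the UI.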

dlevitas commented 1 year ago

Additionally, the main purpose of this request is to create an efficient workflow that lets users add information to their dataset's JSON sidecars that BIDS requires but the DICOM headers do not contain, which means dcm2niix cannot extract it and write it to the sidecars. The uploaded spreadsheet(s) will contain a column for each sidecar field that BIDS requires for the specific sequence, with PET and ASL primarily in mind. The goal is to map these spreadsheets to their corresponding JSON sidecars and insert the necessary information so that the dataset passes BIDS validation.
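A minimal sketch of that last step, assuming one spreadsheet row per scan and one column per sidecar field. The function names `parse_cell` and `fill_sidecar` are hypothetical, and keeping values already written by dcm2niix (rather than overwriting them) is a design assumption, not something decided in this thread.

```python
import csv
import io
import json


def parse_cell(text):
    """Convert a spreadsheet cell to a JSON value (number, list, ...) when possible."""
    try:
        return json.loads(text)
    except (json.JSONDecodeError, TypeError):
        return text  # keep plain strings such as tracer names as-is


def fill_sidecar(sidecar, row):
    """Return a copy of the sidecar with missing fields filled in from a spreadsheet row."""
    updated = dict(sidecar)
    for field, value in row.items():
        if value in ("", None):
            continue  # empty cell: nothing to add
        if field not in updated:
            # Assumption: values already extracted by dcm2niix take precedence.
            updated[field] = parse_cell(value)
    return updated


# Example: a one-row TSV held in memory instead of an uploaded file.
tsv_text = "TracerName\tInjectedRadioactivity\tManufacturer\nFDG\t185\tOther\n"
row = next(csv.DictReader(io.StringIO(tsv_text), delimiter="\t"))
sidecar = {"Manufacturer": "Siemens"}
result = fill_sidecar(sidecar, row)
```

Here `result` gains `TracerName` and `InjectedRadioactivity` from the spreadsheet, while the existing `Manufacturer` value is left untouched.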

In theory, this should roughly follow the Events workflow.

Issue #66 will likely benefit from this feature.

bendhouseart commented 1 year ago

Following up here with -> https://github.com/openneuropet/PET2BIDS/pull/210

Would you prefer that I generalize the following functions from pet2bids.helper_functions so that they work equally well with ASL? All I do is ingest spreadsheets and their data and output Python dict/JSON-compliant data from them.

It might save you a bit of trouble as you could just import them into ezBIDS instead of rolling your own.

```python
import json
import logging
import os
import pathlib
from typing import Union

import numpy
from pandas import Series

# NOTE: pet_metadata_json, permalink_pet_metadata_json and open_meta_data are not
# defined in this snippet; they are assumed to live elsewhere in
# pet2bids.helper_functions.


def flatten_series(series):
    """
    This function retrieves either a list or a single value from a pandas series object thus converting a complex
    data type to a simple datatype or list of simple types. If the length of the series is one or less this returns
    that single value, else this object returns all values within the series that are not Null/nan in the form of
    a list

    :param series: input series of type pandas.Series object, typically extracted as a column/row from a
        pandas.Dataframe object
    :return: a simplified single value or list of values
    """
    simplified_series_object = series.dropna().to_list()
    if len(simplified_series_object) > 1:
        pass
    elif len(simplified_series_object) == 1:
        simplified_series_object = simplified_series_object[0]
    else:
        # raising a bare string is a TypeError in Python 3, so raise a real exception
        raise ValueError(f"Invalid Series: {series}")

    return simplified_series_object


def collect_spreadsheets(folder_path: pathlib.Path):
    spreadsheet_files = []
    all_files = [folder_path / pathlib.Path(file) for file in os.listdir(folder_path)
                 if os.path.isfile(os.path.join(folder_path, file))]
    for file in all_files:
        if file.suffix in ('.xlsx', '.csv', '.xls', '.tsv'):
            spreadsheet_files.append(file)
    return spreadsheet_files


def single_spreadsheet_reader(
        path_to_spreadsheet: Union[str, pathlib.Path],
        pet2bids_metadata_json: Union[str, pathlib.Path] = pet_metadata_json,
        dicom_metadata={},
        **kwargs) -> dict:

    metadata = {}

    if type(path_to_spreadsheet) is str:
        path_to_spreadsheet = pathlib.Path(path_to_spreadsheet)

    if path_to_spreadsheet.is_file():
        pass
    else:
        raise FileNotFoundError(f"{path_to_spreadsheet} does not exist.")

    if pet2bids_metadata_json:
        # check and open the argument that was passed in, not the module-level default
        if type(pet2bids_metadata_json) is str:
            pet2bids_metadata_json = pathlib.Path(pet2bids_metadata_json)

        if pet2bids_metadata_json.is_file():
            with open(pet2bids_metadata_json, 'r') as infile:
                metadata_fields = json.load(infile)
        else:
            raise FileNotFoundError(
                f"Required metadata file not found at {pet2bids_metadata_json}, check to see if this file exists;"
                f"\nelse pass path to file formatted to this {permalink_pet_metadata_json} via "
                f"pet2bids_metadata_json argument in single_spreadsheet_reader call.")
    else:
        raise FileNotFoundError(
            f"pet2bids_metadata_json input required for function call, you provided {pet2bids_metadata_json}")

    spreadsheet_dataframe = open_meta_data(path_to_spreadsheet)

    # collect mandatory fields
    for field_level in metadata_fields.keys():
        for field in metadata_fields[field_level]:
            series = spreadsheet_dataframe.get(field, Series(dtype=numpy.float64))
            if not series.empty:
                metadata[field] = flatten_series(series)
            elif series.empty and field_level == 'mandatory' and not dicom_metadata.get(field, None) \
                    and field not in kwargs:
                logging.warning(f"{field} not found in {path_to_spreadsheet}, {field} is required by BIDS")

    # lastly apply any kwargs to the metadata
    metadata.update(**kwargs)

    return metadata
```

I'm most of the way there for the listed issues in this Feature Request barring a bit of testing, so let me know if any of the above would be helpful.

dlevitas commented 1 year ago

> Would you prefer that I generalize the following functions from pet2bids.helper_functions so that they work equally well with ASL? All I do is ingest spreadsheets and their data and output Python dict/JSON-compliant data from them.

Yeah, if that's not too much, it would be very helpful!