datalad / datalad-catalog

Create a user-friendly data catalog from structured metadata
https://datalad-catalog.netlify.app
MIT License

BIDS translator fails if no keywords are present #461

Closed by fraimondo 5 months ago

fraimondo commented 6 months ago

I am doing some tests and trying to figure out how the catalog+metalad+bids interaction works.

I have a minimal BIDS test dataset that is valid (according to bids-validator). I managed to extract metadata using meta-conduct, but the translation fails:

[DEBUG  ] could not perform all requested actions: IncompleteResultsError(Command did not complete successfully. 1 failed:
[{'action': 'catalog_translate',
  'exception': TypeError("object of type 'NoneType' has no len()"),
  'exception_traceback': '[translate.py:__call__:226,translate.py:run_translator:329,bids_dataset_translator.py:translate:97,bids_dataset_translator.py:translate:198,bids_dataset_translator.py:get_keywords:151]',
  'path': PosixPath('../catalog'),
  'status': 'error'}])

The issue seems to be with the keywords:

https://github.com/datalad/datalad-catalog/blob/76bb8d99b529c8c644ddff545cdf5a6ce1f63986/datalad_catalog/translators/bids_dataset_translator.py#L148-L151

Indeed, my extracted metadata has no task (it contains only anatomical images) and no variables:

        "entities": {
            "subject": [
                "001",
                "002",
                "003"
            ],
            "suffix": [
                "README",
                "description",
                "participants",
                "T1w"
            ],
            "datatype": [
                "anat"
            ],
            "extension": [
                ".md",
                ".json",
                ".tsv",
                ".nii.gz"
            ]
        },
        "variables": {},

Since this is still a valid BIDS dataset (though quite useless), my take is that it should still be accepted by the catalog translator.
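
To make the failure mode concrete, here is a minimal sketch that reproduces the error without needing jq installed. It assumes the translator calls `len()` on the jq result without first checking for `None`, which is what the traceback suggests. `lookup_keywords` is a hypothetical stand-in for the jq program `.entities.task + .variables.dataset`, not a function from datalad-catalog, and the metadata is a trimmed copy of the excerpt above.

```python
# Trimmed copy of the extracted metadata shown above: there is no "task"
# entity and the "variables" mapping is empty.
extracted_metadata = {
    "entities": {
        "subject": ["001", "002", "003"],
        "datatype": ["anat"],
    },
    "variables": {},
}


def lookup_keywords(metadata):
    """Hypothetical stand-in for `.entities.task + .variables.dataset`.

    Missing keys behave like jq's null, and in jq `null + x` is `x`,
    so `null + null` stays null (None in Python).
    """
    tasks = metadata.get("entities", {}).get("task")
    variables = metadata.get("variables", {}).get("dataset")
    if tasks is None:
        return variables
    if variables is None:
        return tasks
    return tasks + variables


keywords = lookup_keywords(extracted_metadata)

# Calling len() on the missing result reproduces the reported TypeError.
try:
    len(keywords)
    error_message = None
except TypeError as exc:
    error_message = str(exc)

print(keywords, error_message)
```

With both sources of keywords absent, the lookup yields `None`, so any unguarded `len()` call on it fails in the same way as in the log above.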

Workaround/fix:

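    def get_keywords(self):
        # In jq, `null + null` is `null`, so the result is None when both
        # the task entities and the dataset variables are missing.
        program = ". as $parent | .entities.task + .variables.dataset"
        result = jq.first(program, self.extracted_metadata)
        # Check for None before calling len() to avoid the TypeError.
        return result if result is not None and len(result) > 0 else None

jsheunis commented 6 months ago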

Thanks for the issue and workaround @fraimondo. This interaction and translation between metalad and catalog hasn't received much attention after the initial development, so I think you might run into similar issues along the way (although I hope you don't, of course).