bids-standard / pybids

Python tools for querying and manipulating BIDS datasets.
https://bids-standard.github.io/pybids/
MIT License
219 stars 121 forks source link

`BIDSLayoutIndexer`'s `**filters` argument not working as expected #1062

Open alperkent opened 3 months ago

alperkent commented 3 months ago

Hello again! Following the fix in PR #1049 for the error mentioned in #1050, I've noticed that the BIDSLayoutIndexer is still not functioning as expected. As an example, despite setting filters for datatype and suffix, the indexer returns all entities when running the following code:

import bids

filters = {'datatype': 'eeg', 'suffix': 'eeg'}
indexer = bids.BIDSLayoutIndexer(**filters)
layout = bids.BIDSLayout('../data/bids-examples/eeg_ds000117', indexer=indexer)
layout.get_datatypes(), layout.get_suffixes()

I expected the get_datatypes() and get_suffixes() methods to return only the entities that match the filters set in the BIDSLayoutIndexer. But, the methods return all entities:

(['anat', 'eeg'], ['description', 'participants', 'T1w', 'coordsystem', 'electrodes', 'eeg', 'events', 'channels'])

While workarounds are possible using layout.get() and BIDSLayoutIndexer(ignore=...), I wanted to raise this issue to bring it to the devs' attention.

effigies commented 3 months ago

Yes, it looks like _index_file() does not respect self.filters like _index_metadata() does:

https://github.com/bids-standard/pybids/blob/20007e588d5375db9587d3dfe451a3fbe5cae1e3/bids/layout/index.py#L243-L280

If you're interested in addressing this, it's probably worth noting that _index_metadata() should be working with a copy of self.filters, not mutating it directly.

alperkent commented 3 months ago

Interestingly, after adding the if clause below to _index_file(), layout.get_entity() functions seem to be respecting the filtering but layout.get() still does not:

        # Extract entity values
        match_vals = {}
        for e in entities.values():
            m = e.match_file(bf)
            if m is None and e.mandatory:
                break
>           # Break if entities don't match the filter
>           if self.filters and e.name and e.name in self.filters and m and m not in self.filters[e.name]:
>               break
            if m is not None:
                match_vals[e.name] = (e, m)
effigies commented 3 months ago

I can't promise I'll have time to dig into this soon. Please feel free to keep working on this if you like.

It would be good to get out a new version of pybids, with Python 3.12 support. Do you consider this a blocker, and if so, how much time would you like to try to get something in before we release?

alperkent commented 3 months ago

I don't think it is a blocker (I already implemented a workaround in my own code by using the ignore argument instead) so please feel free to move forward with the release. Would you like me to reopen the half-baked fix (#1063) so that at least the get_<entity>s functions respect the indexer filters in the new release?

effigies commented 3 months ago

If it improves things, sure. If you want to dig deeper for a more complete fix first, that's fine, too.

I am not using this feature, so I'm unlikely to dig into it myself, and I'm inclined to follow your lead as someone who uses the feature.

alperkent commented 3 months ago

OK, I reopened it. It restores some functionality, so I think it improves things, albeit slightly (and hopefully does not break anything). I should have time to dig into it a bit more in the upcoming weeks, so feel free to move forward with the release.