PennLINC / qsiprep

Preprocessing of diffusion MRI
http://qsiprep.readthedocs.io
BSD 3-Clause "New" or "Revised" License

pybids crashing #513

Open hfxcarl opened 1 year ago

hfxcarl commented 1 year ago

pybids crashes immediately when called in qsiprep-0.16.x. Even opening a shell in the qsiprep-0.16.1 image with Singularity on a Linux workstation and calling pybids directly causes an immediate crash.

$ singularity shell /opt/SingularityImgs/qsiprep-0.16.1.sif

Singularity> pybids
Traceback (most recent call last):
  File "/usr/local/miniconda/bin/pybids", line 5, in <module>
    from bids.cli import cli
  File "/usr/local/miniconda/lib/python3.8/site-packages/bids/cli.py", line 52, in <module>
    def layout(
  File "/usr/local/miniconda/lib/python3.8/site-packages/click/decorators.py", line 308, in decorator
    _param_memo(f, OptionClass(param_decls, **option_attrs))
  File "/usr/local/miniconda/lib/python3.8/site-packages/click/core.py", line 2584, in __init__
    raise TypeError("'multiple' is not valid with 'is_flag', use 'count'.")
TypeError: 'multiple' is not valid with 'is_flag', use 'count'.
Singularity> 

I see the pybids version included is 0.13.2 (2021-08-20), whereas the current pybids release is 0.15.5 (2022-11-08). Perhaps it's time to update pybids:

Singularity> python
Python 3.8.10 (default, Jun  4 2021, 15:09:15) 
[GCC 7.5.0] :: Anaconda, Inc. on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import bids
>>> print(bids._version.get_versions())
{'date': '2021-08-20T12:44:44-0400', 'dirty': False, 'error': None, 'full-revisionid': '8ee63bda6244d10aabc97841d2055960af89b373', 'version': '0.13.2'}
>>> 

I first noticed the error when trying to run qsiprep-0.16.1 with "--bids-database-dir" pointing to a pre-indexed bids-raw layout_index.sqlite (generated using pybids_v0.15.1 from mriqc-22.0.1.sif). I prefer the control of running individual subjects with fixed resources on my 64-core Linux workstation, staggering individual runs in parallel to maximize resources while also running other container pipelines (fmriprep, mriqc, etc). I want to use the "--bids-database-dir" option because otherwise running one subject from a bids-raw folder with >100 subjects triggers pybids to index the entire bids-raw folder each time, even though only one specific subject is being processed. Indexing a bids-raw input folder with ~200 datasets takes 30-40 mins, so each single-subject run spends an extra 30-40 mins unnecessarily re-indexing the entire folder! Obviously this is a second issue, related to Issue #217. Perhaps it would be better to have pybids only index the specific subjects/data that is called with "--participant-label"?

Please let me know how I can help solve either issue. Thanks, Carl

tsalo commented 8 months ago

I think the bug can be addressed by changing the requirement for pybids from "pybids < 0.16.1" to "pybids < 0.16.1,> 0.13.2" (basically skipping the problematic version).
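As a rough illustration of what the proposed range admits, here is a minimal sketch that compares plain X.Y.Z version strings as integer tuples. The helper names (`parse_version`, `satisfies_proposed_pin`) are hypothetical, and this is not how pip evaluates specifiers (pip follows PEP 440 and also handles pre-releases, post-releases, and so on). It only shows that 0.13.2 falls outside the range while 0.15.5 falls inside it.

```python
def parse_version(text):
    """Turn a plain 'X.Y.Z' release string into a tuple of ints.

    Assumption: only simple numeric versions are passed in; suffixes
    such as 'rc1' or '.post1' are not handled by this sketch.
    """
    return tuple(int(part) for part in text.split("."))


def satisfies_proposed_pin(version):
    """Return True if version is strictly between 0.13.2 and 0.16.1."""
    lower = parse_version("0.13.2")
    upper = parse_version("0.16.1")
    return lower < parse_version(version) < upper


for candidate in ["0.13.2", "0.14.0", "0.15.5", "0.16.1"]:
    status = "allowed" if satisfies_proposed_pin(candidate) else "excluded"
    print(candidate, status)
```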

> Perhaps it would be better to have pybids only index the specific subjects/data that is called with "--participant-label"?

Is this something leveraged in other BIDS Apps? If so, it seems like a good thing to add.

cookpa commented 8 months ago

I'm not aware of other apps that index a single participant. Indexing the whole dataset adds a big delay to processing for large datasets.

In Python, you can do it by exclusion: you have to exclude all the other subjects with the ignore kwarg.

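import re
from bids import BIDSLayout
ignore_pattern = re.compile(r'^(?!.*\/sub-mysubject($|\/))')  # matches every path not under sub-mysubject
layout = BIDSLayout(bids_root, ignore=[ignore_pattern])  # bids_root: path to the BIDS dataset

tsalo commented 8 months ago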

That's probably doable within the BIDSLayout initialization step (we could even build a regex that covers multiple participants when requested), though I think we should wait to implement it until we have the nipreps-style config object in place.

Just going to paste in a regex Copilot wrote that should be good for explicitly ignoring any folders except for a list of requested subjects:

re.compile(r'sub-(?!' + '|'.join(excluded_folders) + r')\w+')
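For what it's worth, the two ideas in this thread can be combined: cookpa's anchored pattern, extended to several participants. The sketch below uses only `re`. `build_ignore_pattern` is a hypothetical helper, and it assumes the paths being tested are relative to the dataset root with a leading slash (as cookpa's pattern implies), which has not been checked here against how pybids applies `ignore` patterns. It also shows a caveat of the un-anchored snippet above: its list (named `excluded_folders`) actually holds the labels to keep, and because the lookahead only checks a prefix, requesting 01 would also keep sub-010.

```python
import re


def build_ignore_pattern(participant_labels):
    """Compile a regex matching every path NOT under a requested subject.

    participant_labels: labels without the 'sub-' prefix, e.g. ['01', '02'].
    """
    alternatives = "|".join(re.escape(label) for label in participant_labels)
    # Negative lookahead anchored at the start of the path: the pattern
    # matches (so the path would be ignored) unless the path contains
    # '/sub-<label>' followed by '/' or the end of the string.
    return re.compile(r"^(?!.*/sub-(?:" + alternatives + r")(?:$|/))")


requested = ["01", "02"]
pattern = build_ignore_pattern(requested)

# Un-anchored variant from the thread, for comparison (prefix match only).
prefix_pattern = re.compile(r"sub-(?!" + "|".join(requested) + r")\w+")

example_paths = [
    "/sub-01/dwi/sub-01_dwi.nii.gz",
    "/sub-02",
    "/sub-010/dwi/sub-010_dwi.nii.gz",
    "/sub-03/anat/sub-03_T1w.nii.gz",
]

for path in example_paths:
    anchored = "ignored" if pattern.search(path) else "kept"
    prefix = "ignored" if prefix_pattern.search(path) else "kept"
    print(f"{path}: anchored={anchored}, prefix-only={prefix}")
```

Note that top-level files such as /dataset_description.json also match the anchored pattern. Whether that is a problem depends on how pybids treats them, so this would need testing against a real dataset before being used in qsiprep.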