Closed clarkenj closed 1 year ago
Since it's coming from sqlalchemy, it might be something to do with PyBIDS... Do you have the traceback showing which line in giga_auto_qc is giving you the error?
The first thing I would try is using the `--reindex_bids` flag.
After some googling, I believe this is a PyBIDS-related thing, and the solution is (potentially) this one: https://github.com/nipreps/fmriprep/issues/2313#issuecomment-1013680975
AFAIK it's better to
UKBB didn't have this issue as each subject was treated as an isolated dataset.
If the proposed solution works, do you mind making a PR to the README.md to keep this as a note for users?
Ah! Interesting. Here is the relevant part of the traceback (I think):
```
Traceback (most recent call last):
  File "/home/nclarke/.local/bin/giga_auto_qc", line 8, in <module>
    sys.exit(main())
  File "/lustre03/project/6003287/nclarke/giga_preprocess2/giga_auto_qc/giga_auto_qc/run.py", line 79, in main
    workflow(args)
  File "/lustre03/project/6003287/nclarke/giga_preprocess2/giga_auto_qc/giga_auto_qc/workflow.py", line 47, in workflow
    fmriprep_bids_layout = BIDSLayout(
  File "/home/nclarke/.local/lib/python3.10/site-packages/bids/layout/layout.py", line 154, in __init__
    indexer(self)
```
Yes I can try that!
@htwangtw - Progress, but now a new error:
```
0it [00:00, ?it/s]
Traceback (most recent call last):
  File "/home/nclarke/.local/bin/giga_auto_qc", line 8, in <module>
    sys.exit(main())
  File "/lustre03/project/6003287/nclarke/giga_preprocess2/giga_auto_qc/giga_auto_qc/run.py", line 79, in main
    workflow(args)
  File "/lustre03/project/6003287/nclarke/giga_preprocess2/giga_auto_qc/giga_auto_qc/workflow.py", line 69, in workflow
    anatomical_metrics = assessments.calculate_anat_metrics(
  File "/lustre03/project/6003287/nclarke/giga_preprocess2/giga_auto_qc/giga_auto_qc/assessments.py", line 345, in calculate_anat_metrics
    metrics["anatomical_dice"]
  File "/home/nclarke/.local/lib/python3.10/site-packages/pandas/core/frame.py", line 3760, in __getitem__
    indexer = self.columns.get_loc(key)
  File "/home/nclarke/.local/lib/python3.10/site-packages/pandas/core/indexes/range.py", line 349, in get_loc
    raise KeyError(key)
KeyError: 'anatomical_dice'
```
This is the output:

```
/home/nclarke/scratch/srpbs_fmriprep-20.2.7lts_1691842839/data
Namespace(bids_dir=PosixPath('/home/nclarke/scratch/srpbs_fmriprep-20.2.7lts_1691842839/data/fmriprep-20.2.7lts'), output_dir=PosixPath('/lustre04/scratch/nclarke/srpbs_qc'), analysis_level='participant', participant_label=None, session=None, task=None, quality_control_parameters=None, reindex_bids=False, verbose=1)
Quality control parameters: {'mean_fd': 0.55, 'scrubbing_fd': 0.2, 'proportion_kept': 0.5, 'anatomical_dice': 0.97, 'functional_dice': 0.89}
Retrieved anatomical reference mask
Use standard template as functional scan reference.
Calculate the anatomical dice score.
```
I tried adding some print statements to assessments.py to probe the problem, but they didn't print, which I'm confused about. Any ideas? Thank you!
Yay to the progress! The error is from the pandas DataFrame, so it might be worth looking into that. I will try to probe it too.
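For what it's worth, here is a minimal, hypothetical pandas illustration of how a `KeyError` like the one above can arise (this is not the actual giga_auto_qc code): if the per-subject loop never adds any rows, the DataFrame ends up with no columns at all, and any column lookup fails. The `range.py` frame in the traceback is consistent with that, since an empty DataFrame has a `RangeIndex` for its columns.

```python
import pandas as pd

# Hypothetical reproduction: an empty DataFrame has no columns,
# so looking up a column raises KeyError, as in the traceback above.
metrics = pd.DataFrame()
try:
    metrics["anatomical_dice"]
except KeyError as err:
    print(f"KeyError: {err}")  # KeyError: 'anatomical_dice'

# A defensive check makes the failure mode explicit:
if "anatomical_dice" not in metrics.columns:
    print("No dice scores were computed; check the subject list / BIDS query.")
```

If that is what is happening, the real question becomes why no metrics rows were produced for any subject.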
I ran the test on two subjects using this command: `pytest -m "not smoke" --doctest-modules -v --pyargs giga_auto_qc`. I'm not sure if this is completely correct, as this is quite beyond my experience. But this is the output I got, which suggests that it should complete...?
```
[mii] loading StdEnv/2020 pytest/7.4.0 ...
========================================= test session starts =========================================
platform linux -- Python 3.10.2, pytest-7.4.0, pluggy-1.2.0 -- /cvmfs/soft.computecanada.ca/easybuild/software/2020/avx512/Core/python/3.10.2/bin/python
cachedir: .pytest_cache
rootdir: /lustre03/project/6003287/nclarke/giga_preprocess2/giga_auto_qc
configfile: pyproject.toml
collected 10 items / 3 deselected / 7 selected

giga_auto_qc/tests/test_assessments.py::test_quality_accessments PASSED    [ 14%]
giga_auto_qc/tests/test_assessments.py::test_dice_coefficient PASSED       [ 28%]
giga_auto_qc/tests/test_assessments.py::test_check_mask_affine PASSED      [ 42%]
giga_auto_qc/tests/test_assessments.py::test_get_consistent_masks PASSED   [ 57%]
giga_auto_qc/tests/test_cli.py::test_help PASSED                           [ 71%]
giga_auto_qc/tests/test_utils.py::test_get_subject_lists PASSED            [ 85%]
giga_auto_qc/tests/test_utils.py::test_parse_scan_information PASSED       [100%]
============================= 7 passed, 3 deselected in 354.09s (0:05:54) =============================
```
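(As background for `test_dice_coefficient` above: the `anatomical_dice` metric is a Sørensen–Dice overlap between two binary masks. Here is a minimal numpy sketch of that idea, as an illustration rather than the package's exact implementation:)

```python
import numpy as np

# Sørensen-Dice coefficient between two binary masks:
# dice = 2 * |A intersect B| / (|A| + |B|), ranging from 0 (no overlap) to 1.
def dice(mask_a: np.ndarray, mask_b: np.ndarray) -> float:
    a = mask_a.astype(bool)
    b = mask_b.astype(bool)
    intersection = np.logical_and(a, b).sum()
    return 2.0 * intersection / (a.sum() + b.sum())

a = np.array([[1, 1, 0], [0, 1, 0]])
b = np.array([[1, 1, 0], [0, 0, 0]])
print(dice(a, b))  # 2*2 / (3+2) = 0.8
```

A score of 0.97 (the default `anatomical_dice` threshold printed earlier) therefore means the subject mask and the template mask overlap almost completely.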
Great, let me clarify what I was trying to say: you can modify the code in the test file and run it in an interactive session, so we can find out why it is acting weird on your dataset. Your dataset might have hit some edge cases that aren't covered currently...
```python
from bids import BIDSLayout
import templateflow

from giga_auto_qc import assessments

bids_dir = "path/to/your/data"
subject_list = ["1"]  # any subject number, remove `sub-`

fmriprep_bids_layout = BIDSLayout(
    root=bids_dir,
    database_path=bids_dir,
    validate=False,
    derivatives=True,
    reset_database=False,  # let's assume you already have a valid bids index
)
template_mask = templateflow.api.get(
    ["MNI152NLin2009cAsym"], desc="brain", suffix="mask", resolution="01"
)
df = assessments.calculate_anat_metrics(
    subject_list,
    fmriprep_bids_layout,
    {"anat": template_mask},
    {"anatomical_dice": 0.97},
)
```
I think that's what I did, except I didn't change `reset_database` to `False`... Is that what I did (see below)? Lol. Shall I re-run with `reset_database=False`?
```python
def test_calculate_anat_metrics():
    bids_dir = resource_filename(
        "giga_auto_qc",
        "/home/nclarke/scratch/srpbs_fmriprep-20.2.7lts_1691842839/data/fmriprep-20.2.7lts",
    )
    fmriprep_bids_layout = BIDSLayout(
        root=bids_dir,
        database_path=bids_dir,
        validate=False,
        derivatives=True,
        reset_database=True,
    )
    template_mask = templateflow.api.get(
        ["MNI152NLin2009cAsym"], desc="brain", suffix="mask", resolution="01"
    )
    df = assessments.calculate_anat_metrics(
        ["0246", "0603"],
        fmriprep_bids_layout,
        {"anat": template_mask},
        {"anatomical_dice": 0.97},
    )
    print(df)
```
Hmmm, good to know indexing is not an issue. If indexing fails, it will just not run correctly. My guess is that something might be wrong with one subject? I genuinely have no idea how that can go wrong.
I can think of a very wasteful hack here: loop through all the subjects, running the BIDS app at the participant level and passing each subject ID to `--participant-label`. This way you will get a one-line tsv for each subject and can figure out who is the imposter...
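A hypothetical sketch of that per-subject loop (the CLI shape, i.e. positional fmriprep/output directories, the `participant` level, and the `--participant_label` spelling, is inferred from the Namespace printout earlier in this thread; adjust to the real interface):

```python
import shutil
import subprocess

# Run the BIDS app once per subject so one bad subject can't sink the
# whole run; a non-zero exit code identifies the problematic subject.
def qc_one_subject(sub: str, fmriprep_dir: str, output_dir: str) -> str:
    cmd = [
        "giga_auto_qc", fmriprep_dir, output_dir,
        "participant", "--participant_label", sub,
    ]
    if shutil.which("giga_auto_qc") is None:
        return f"sub-{sub}: giga_auto_qc not on PATH (dry run)"
    result = subprocess.run(cmd, capture_output=True, text=True)
    return f"sub-{sub}: {'ok' if result.returncode == 0 else 'FAILED'}"

for sub in ["0246", "0603"]:  # example IDs from this thread
    print(qc_one_subject(sub, "/path/to/fmriprep", "/path/to/output"))
```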
I'll give it a go, thank you!!
Plot twist: I get the same error with ds000030 and the `group` flag... I will see if the hack works with srpbs, and then maybe try that too.
We have a version of ds000030 preprocessed. Let me just try it.
Thanks! The most recent one is the one I preprocessed.
In the error log for srpbs I get the following (some lines removed):
This is the output of the `.out` file:
The path is correct.
The background at the link says:

> OperationalError. Exception raised for errors that are related to the database's operation and not necessarily under the control of the programmer, e.g. an unexpected disconnect occurs, the data source name is not found, a transaction could not be processed, a memory allocation error occurred during processing, etc.
>
> This error is a DBAPI Error and originates from the database driver (DBAPI), not SQLAlchemy itself.
>
> The OperationalError is the most common (but not the only) error class used by drivers in the context of the database connection being dropped, or not being able to connect to the database. For tips on how to deal with this, see the section Dealing with Disconnects.
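To make that quoted definition concrete: the driver behind the PyBIDS index is plain sqlite3, and an `OperationalError` is raised by that driver layer when it cannot open or write the database file, which is why filesystem-level problems (e.g. on Lustre scratch) can surface this way. A small stdlib illustration, not specific to PyBIDS:

```python
import sqlite3

# Connecting to a database file inside a directory that doesn't exist
# fails at the driver (DBAPI) level with OperationalError -- the same
# class SQLAlchemy wraps and re-raises.
try:
    sqlite3.connect("/nonexistent_dir/layout.db")
except sqlite3.OperationalError as err:
    print(f"OperationalError: {err}")
```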
I don't think it is a space issue, since I have nearly 20,000 GB available. I am thinking of increasing `--mem` (currently 8G). What do you think @htwangtw, have you come across this before?