why is having multiple surfaces and morphometry files in different sessions a problem?

cindyhfls commented 2 weeks ago

In Util/bids.py

There are multiple errors where the same participant have multiple surfaces or morphometry files. I think that's normal for longitudinal sessions to have different surfaces and morphometry files. No? Can we make that a warning not an error?

Also you guys have two many irrelevant anatomical requirements that is contributing to the generation of summary figures (?) but not really used in confound removal (which was stated as the goal). Is it possible to have flag for a simple module that gets rid of all those requirements if the user just want to deal with confounds because they are error-prone...

cindyhfls commented 2 weeks ago

Also I have to add that the BIDSLayout does not recognize the suffix as part of the participant id, which would not be a problem if the original data was in BIDS format but it would be a pain if people used ingression to convert other data formats (I understand that it is rare that you would use it) but it was promoted in the paper as if the pipeline was general-purpose and accommodating for all preprocessing pipelines which it really isn't.

I ended up having to separate my folder name into participant and session part for longitudinal sessions (the original folder name is _V1_MR for visit 1 etc.

mattcieslak commented 1 week ago

Hi Cindy, a couple thoughts here

There are multiple errors where the same participant have multiple surfaces or morphometry files.

That's just how fMRIprep processes/writes out these files at the moment. We will support session-wise anatomical data in the near future. There is an existing issue for that though: #1306.

Also you guys have two many irrelevant anatomical requirements that is contributing to the generation of summary figures (?) but not really used in confound removal (which was stated as the goal).

There was some debate about this, but ultimately xcpd also is used for QC and these files are used for QC by some large ongoing projects. If you don't personally use these outputs, you can always delete them without affecting your confound-removed bold data.

Also I have to add that the BIDSLayout does not recognize the suffix as part of the participant id,

I'm not sure I follow - suffixes aren't part of subject IDs. In BIDS "suffix" means the last item before the file extension.

it was promoted in the paper as if the pipeline was general-purpose and accommodating for all preprocessing pipelines which it really isn't.

In the paper we show how xcpd works with fmriprep, HCP and ABCD-BIDS. It's a BIDS App, and only works on the HCP and ABCD-BIDS inputs by first converting them to BIDS. BIDS is what allows xcpd to correctly process almost any sort of input data (in BIDS) automatically. I'd be very interested in seeing a pipeline that works on arbitrary file names!

I ended up having to separate my folder name into participant and session part for longitudinal sessions (the original folder name is _V1_MR for visit 1 etc.

Right! That's how BIDS requires longitudinal data be named.

cindyhfls commented 1 week ago

Hi, thanks for your detailed reply.

"We will support session-wise anatomical data in the near future" - thanks, this seems like a pretty critical feature if they were using it for HBCD and BCP datasets with vastly different anatomical data.

"There was some debate about this, but ultimately xcpd also is used for QC and these files are used for QC by some large ongoing projects. If you don't personally use these outputs, you can always delete them without affecting your confound-removed bold data." - can you tell me what is the best way to delete them (as in which functions to modify so that it won't affect downstream processes because I might want to have the QC files but leave the anatomical images as blank if it is not found instead of erroring it out). Also I'm using the container (as recommended), which is why this is hard. And that is why I'm suggesting that you can at least make a flag or a warning/skip missing components instead of producing errors. Right now this is too rigid and easy to break.

"I'd be very interested in seeing a pipeline that works on arbitrary file names!" - well, I'm just saying that the ingression functions are quite rigid and preliminary and only work for a specific version of processing/file naming but exaggerated. To give context, my data was processed with the HCP pipeline but I still had to go through a lot of issues to get it working and still got errors for some subjects for the morphometry files. Of course, there is no one-size-fits-all pipeline. But some functions that can make it more user-friendly can include:

Use a submodule that contains all the utility functions/unit tests to build someone's custom ingression code with a template of what the filename in bids format should be for the pipeline to not have an error. (it seems like that repository exist and just not mentioned in the documents but I already downloaded the xcp-d github version for this purpose so I did not try that).
Use wildcard arguments as much as possible for filenames instead of the specific filenames
Use a json structure to store a dictionary to map from custom filename to bids-format: e.g. {surface_left:'*_32k_L.gii'} and some examples so an educated researcher can quickly modify the input requested without digging into the code and modify things from a container.

And the minimal, mention it in the documentation and paper. I'm not expecting a well-rounded pipeline that will magically work for everything, but my point is that you should at least acknowledge the limitation in the discussion. I don't think that's an unreasonable request.

mattcieslak commented 1 week ago

Agreed, the session-specific anatomical issues in fmriprep have been a big headache for developmental work.

It sounds like you have some example data that came out of the official HCP pipeline but isn't working well with xcpd's ingressors. We haven't seen these kind of HCP Pipeline inputs - in the paper we only tested on HCP-D and we've tested locally with HCP-YA. Would you be able to work on a PR that would accommodate them? We'd be happy to review.

For the file deletion I was suggesting letting xcpd run completely and then deleting those extra anatomical outputs to save disk space. It sounds like you're having trouble getting your anatomical data into a format that lets these workflows run successfully?

cindyhfls commented 1 week ago

yes, my concern is not mainly about disk space. It's more about that those missing file broke the pipeline and it's hard to debug.

cindyhfls commented 1 week ago

btw do you think it is possible at all the skip the ingression and fit the required file name structure as a json file? This would be much more efficient in time and space.

PennLINC / xcp_d

why is having multiple surfaces and morphometry files in different sessions a problem? #1313