concatenate() at teardown stage makes assumption on task name ending

xhaoNY commented 3 years ago

The concatenate(concatlist, output_folder) function called at teardown stage assumes the last two characters of taskname representing the run number such as 01, 02... However, outputs from fmriprep add "_desc_preproc" to the end of the original taskname and thus make all the tasks having the exact same ending of "oc". This causes problem and teardown cannot finish successfully.

ericearl commented 3 years ago

@xhaoNY This is a good catch.

@perronea I think we need a more robust regular expression put in place for the taskname and run number parsing. Can you please add this to your queue?

madisoth commented 3 years ago

There seems to be an issue with the regex as of v4.0.6; the first capture group captures everything starting at "task-" and ending immediately before the last digit before an underscore; the second capture group captures only the last digit.

In short, if the run labels are zero-padded in front ("run-01", "run-02", "run-03") the leading zero is included in the first capture group and becomes part of taskname . If there are >= 10 runs, runs 10-19 have an extra "1" attached to taskname, runs 20-29 have an extra "2" attached, etc.

During teardown this also means runs 1-9 are being concatenated to task-\<task-label>0DCANBOLDProc\<ver>.dtseries.nii, runs 10-19 to task-\<task-label>1DCANBOLDProc\<ver>.dtseries.nii, runs 20-29 to task-\<task-label>2DCANBOLDProc\<ver>.dtseries.nii, etc...

The relevant snippet of dcan_bold_proc.py is

expr = re.compile(r'.*(task-[^_]+).*([0-9]+).*')
taskname = expr.match(task).group(1)

Example of current regex behavior:

If the regex is changed so the second capture group matches exactly two digits, I think the problem would be fixed? (Admittedly it's messy to rely on the assumption run indices will be always two-digits and zero-padded, but we were assuming as much prior to this regex...) Example:

ericfeczko commented 3 years ago

the following regex might be more a general solution .*(task-[^0-9]+)([0-9]*[^_]).* it won't work if numerics are part of the task name though

The benefit here is that the solution will work for any combination of letters and numbers under BIDS compatible formats -- since the first part is just looking for numbers to terminate.

In fact, it will work for some violations of BIDS compatible formats as well:

However, here's what happens if numerics are part of the task name:

ericfeczko commented 3 years ago

I've made the changed and modified the "develop" branch accordingly. I created a merge request for review and assigned @kathy-snider and @perronea, but really if anyone can test it that should be fine.

Once the review is approved it should automatically merge the change into master, resolving it.

@xhaoNY feel free to try the develop branch and see if it resolves your issue. If it does, we can pull it into main and close off the ticket.

kathy-snider commented 3 years ago

Yo! I "approved" the changes before I read these emails Testing is part of making the change, so if this has not been testing, I would suggest doing that as soon as possible.

From: ericfeczko @.***> Sent: Wednesday, July 7, 2021 4:26 PM To: DCAN-Labs/dcan_bold_processing Cc: Kathy Snider; Mention Subject: [EXTERNAL] Re: [DCAN-Labs/dcan_bold_processing] concatenate() at teardown stage makes assumption on task name ending (#5)

I've made the changed and modified the "develop" branch accordingly. I created a merge request for review and assigned @kathy-sniderhttps://github.com/kathy-snider and @perroneahttps://github.com/perronea, but really if anyone can test it that should be fine.

Once the review is approved it should automatically merge the change into master, resolving it.

@xhaoNYhttps://github.com/xhaoNY feel free to try the develop branch and see if it resolves your issue. If it does, we can pull it into main and close off the ticket.

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHubhttps://github.com/DCAN-Labs/dcan_bold_processing/issues/5#issuecomment-875999778, or unsubscribehttps://github.com/notifications/unsubscribe-auth/AJAQP4GWKAQEAQMMBLNT2N3TWTPCNANCNFSM4TI5F4IQ.

arueter1 commented 3 years ago

@xhaoNY Have you had a chance to try the dev branch?

kathy-snider commented 3 years ago

I tested this for regression (i.e., make sure we don't break anything that already works). This fix breaks the names of files that are the concatenation of all runs. For example, a file that was called: files/MNINonLinear/Results/task-rest_DCANBOLDProc_v4.0.0_Atlas.dtseries.nii becomes: files/MNINonLinear/Results/task-rest_run-_DCANBOLDProc_v4.0.0_Atlas.dtseries.nii NOTE: since both pipelines that use BIDS names and those that do not (ABCD), this has to handle both cases. For example, names with "task-rest_run-001" and names with "task-REST01". This has always been messed up and is complicated, so testing needs to be thorough. Also, if any names will change as a result of this fix, both the custom-clean and file-mapper json files will need to be fixed to handle the old and new names (by putting in both formats).

madisoth commented 3 years ago

I think I see what's causing the issue with filenames of concatenated timeseries, it's that the regex needs to also be implemented in the 'parcellate' function of dcan_bold_proc.py. I didn't test, but I suspect this regression actually originated in v4.0.5 and that the parcellated concatenated output has been broken since..

I'll put a fix in the develop branch and test it out with abcd-hcp-pipeline this week.

arueter1 commented 3 years ago

@xhaoNY Just wanted to check in on this issue. Have you found any workarounds with your team?

madisoth commented 11 months ago

Resolved as of v4.0.8; tagged releases abcd-hcp-pipeline, nhp-abcd-bids-pipeline and infant-abcd-bids-pipeline releases from Aug 2021 and later will have the updated run name handling.

DCAN-Labs / dcan_bold_processing

concatenate() at teardown stage makes assumption on task name ending #5