Closed xhaoNY closed 11 months ago
@xhaoNY This is a good catch.
@perronea I think we need a more robust regular expression put in place for the taskname and run number parsing. Can you please add this to your queue?
There seems to be an issue with the regex as of v4.0.6; the first capture group captures everything starting at "task-" and ending immediately before the last digit before an underscore; the second capture group captures only the last digit.
In short, if the run labels are zero-padded ("run-01", "run-02", "run-03"), the leading zero is included in the first capture group and becomes part of taskname. If there are >= 10 runs, runs 10-19 have an extra "1" attached to taskname, runs 20-29 have an extra "2" attached, etc.
During teardown this also means runs 1-9 are being concatenated to task-\<task-label>0DCANBOLDProc\<ver>.dtseries.nii, runs 10-19 to task-\<task-label>1DCANBOLDProc\<ver>.dtseries.nii, runs 20-29 to task-\<task-label>2DCANBOLDProc\<ver>.dtseries.nii, etc...
The relevant snippet of dcan_bold_proc.py is
expr = re.compile(r'.*(task-[^_]+).*([0-9]+).*')
taskname = expr.match(task).group(1)
Example of current regex behavior:
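A minimal sketch that reproduces the behavior described above. It uses the regex exactly as quoted from dcan_bold_proc.py; the task names and the helper name split_task are hypothetical and chosen only for illustration.

```python
import re

# Regex as quoted above from dcan_bold_proc.py.
expr = re.compile(r'.*(task-[^_]+).*([0-9]+).*')

def split_task(task):
    """Return (group 1, group 2) for a task string (hypothetical helper)."""
    m = expr.match(task)
    return m.group(1), m.group(2)

# Hypothetical task names, for illustration only.
for task in ["task-rest01", "task-rest02", "task-rest10", "task-rest25"]:
    print(task, "->", split_task(task))

# task-rest01 -> ('task-rest0', '1')
# task-rest02 -> ('task-rest0', '2')
# task-rest10 -> ('task-rest1', '0')
# task-rest25 -> ('task-rest2', '5')
```

The greedy first group initially swallows the digits and then gives back only the single character that the second group needs, which is why only the last digit ends up as the run.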
If the regex is changed so the second capture group matches exactly two digits, I think the problem would be fixed? (Admittedly it's messy to rely on the assumption that run indices will always be two digits and zero-padded, but we were assuming as much prior to this regex...) Example:
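One way the two-digit idea could look is sketched below. The exact pattern is an assumption (the quoted regex with the second group changed to `[0-9]{2}`), not necessarily the change that was intended here, and the task names are hypothetical.

```python
import re

# Assumed variant: the second capture group must match exactly two digits.
two_digit_expr = re.compile(r'.*(task-[^_]+).*([0-9]{2}).*')

def split_task_two_digit(task):
    """Return (taskname, run), or None if the string does not match."""
    m = two_digit_expr.match(task)
    return (m.group(1), m.group(2)) if m else None

# Hypothetical task names, for illustration only.
for task in ["task-rest01", "task-rest10", "task-rest_run-01", "task-rest1"]:
    print(task, "->", split_task_two_digit(task))

# task-rest01 -> ('task-rest', '01')
# task-rest10 -> ('task-rest', '10')
# task-rest_run-01 -> ('task-rest', '01')
# task-rest1 -> None
```

The last case shows the cost of the assumption: a run label that is not two digits wide does not match at all.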
The following regex might be a more general solution: .*(task-[^0-9]+)([0-9]*[^_]).*
It won't work if numerics are part of the task name, though.
The benefit here is that the solution will work for any combination of letters and numbers under BIDS compatible formats -- since the first part is just looking for numbers to terminate.
In fact, it will work for some violations of BIDS compatible formats as well:
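A short sketch of how the proposed regex behaves on names whose run digits directly follow the task label, including unpadded and three-digit runs. The input names and the helper name are hypothetical.

```python
import re

# Proposed regex from the comment above.
proposed = re.compile(r'.*(task-[^0-9]+)([0-9]*[^_]).*')

def split_proposed(task):
    """Return (taskname, run), or None if the string does not match."""
    m = proposed.match(task)
    return (m.group(1), m.group(2)) if m else None

# Hypothetical task names, for illustration only.
for task in ["task-rest01", "task-REST12", "task-rest1",
             "task-rest001", "task-rest01_bold"]:
    print(task, "->", split_proposed(task))

# task-rest01 -> ('task-rest', '01')
# task-REST12 -> ('task-REST', '12')
# task-rest1 -> ('task-rest', '1')
# task-rest001 -> ('task-rest', '001')
# task-rest01_bold -> ('task-rest', '01')
```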
However, here's what happens if numerics are part of the task name:
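The failure mode with digits inside the task label can be sketched as follows, using the proposed regex and two hypothetical task names.

```python
import re

# Proposed regex from the comment above.
proposed_expr = re.compile(r'.*(task-[^0-9]+)([0-9]*[^_]).*')

def split_with_numerics(task):
    """Return (taskname, run), or None if the string does not match."""
    m = proposed_expr.match(task)
    return (m.group(1), m.group(2)) if m else None

# Hypothetical task names containing digits, for illustration only.
print(split_with_numerics("task-n2back01"))  # ('task-n', '2b')
print(split_with_numerics("task-2back01"))   # None
```

In the first case the task name is cut at the first digit and the "run" picks up a letter; in the second, the label starts with a digit, so nothing matches and calling .group(1) on the result would raise an AttributeError.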
I've made the change and modified the "develop" branch accordingly. I created a merge request for review and assigned @kathy-snider and @perronea, but really if anyone can test it that should be fine.
Once the review is approved it should automatically merge the change into master, resolving it.
@xhaoNY feel free to try the develop branch and see if it resolves your issue. If it does, we can pull it into main and close off the ticket.
Yo! I "approved" the changes before I read these emails. Testing is part of making the change, so if this has not been tested, I would suggest doing that as soon as possible.
From: ericfeczko @.***> Sent: Wednesday, July 7, 2021 4:26 PM To: DCAN-Labs/dcan_bold_processing Cc: Kathy Snider; Mention Subject: [EXTERNAL] Re: [DCAN-Labs/dcan_bold_processing] concatenate() at teardown stage makes assumption on task name ending (#5)
@xhaoNY Have you had a chance to try the dev branch?
I tested this for regression (i.e., to make sure we don't break anything that already works). This fix breaks the names of files that are the concatenation of all runs. For example, a file that was called:
files/MNINonLinear/Results/task-rest_DCANBOLDProc_v4.0.0_Atlas.dtseries.nii
becomes:
files/MNINonLinear/Results/task-rest_run-_DCANBOLDProc_v4.0.0_Atlas.dtseries.nii
NOTE: since there are both pipelines that use BIDS names and pipelines that do not (ABCD), this has to handle both cases: for example, names with "task-rest_run-001" and names with "task-REST01". This has always been messed up and is complicated, so testing needs to be thorough. Also, if any names change as a result of this fix, both the custom-clean and file-mapper json files will need to be updated to handle the old and new names (by putting in both formats).
I think I see what's causing the issue with filenames of concatenated timeseries: the regex also needs to be implemented in the 'parcellate' function of dcan_bold_proc.py. I didn't test, but I suspect this regression actually originated in v4.0.5 and that the parcellated concatenated output has been broken since then.
I'll put a fix in the develop branch and test it out with abcd-hcp-pipeline this week.
@xhaoNY Just wanted to check in on this issue. Have you found any workarounds with your team?
Resolved as of v4.0.8; tagged releases of abcd-hcp-pipeline, nhp-abcd-bids-pipeline and infant-abcd-bids-pipeline from Aug 2021 and later will have the updated run name handling.
The concatenate(concatlist, output_folder) function called at the teardown stage assumes that the last two characters of taskname represent the run number, such as 01, 02, ... However, outputs from fmriprep add "_desc_preproc" to the end of the original taskname, so every task ends with the same two characters, "oc". This causes a problem, and teardown cannot finish successfully.
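A minimal sketch of the assumption described above. The helper run_suffix is a hypothetical stand-in for "take the last two characters as the run number" and is not the actual concatenate() code; the task names are also made up for illustration.

```python
def run_suffix(taskname):
    """Hypothetical stand-in: treat the last two characters as the run number."""
    return taskname[-2:]

# Hypothetical task names, for illustration only.
names = ["task-rest01", "task-rest02"]
print([run_suffix(n) for n in names])           # ['01', '02']

# The same names with the suffix mentioned above appended.
fmriprep_names = [n + "_desc_preproc" for n in names]
print([run_suffix(n) for n in fmriprep_names])  # ['oc', 'oc']
```

With the suffix appended, every name yields the same two characters, so the runs can no longer be told apart by their endings.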