Bids preproc - Githubissues

billbrod commented 6 years ago

This pull request would allow the preprocessing script to work with BIDS data, automatically grabbing most relevant information directly from the directory (using pybids). There are a couple of things to consider:

I currently don't have pybids as a requirement, because it's not necessary for the prisma data (we wrap that import in a try / except block). Should it just be a requirement?
How do we map from BIDS's i / j / k to FSL'sn x / y / z? I just assume i -> x, j-> y, and k -> z.
Most arguments (subject, etc) are now optional. Ideally, they'd be required if -dir_structure was prisma and non-existent if -dir_structure was bids, but I don't know a better way to handle this. They'll be silently ignored if the data is BIDS-structured, and an exception will get thrown (though a little farther down) if they're not specified and the data is prisma-structured
I added plugin_args, which allows the user to specify arguments to pass to the plugins. I use this to tell MultiProc not to grab all the cpus / memory when on the cluster. But the text must be structured in a very particular way, so it's a little finicky to use. Not sure if there's a better way.
prisma_to_BIDS.py has been updated but with Flywheel now able to do the conversion directly from the DICOMs, not sure we'll need it. The json_check function might still be helpful, as it agglomerates jsons (i.e., if all functional jsons have the same RepetitionTime, the function will remove that key from all of them and put it in a json in the directory above, in a BIDS-compliant way)
I added the ability for users to do one run at a time (instead of a session at a time) for BIDS-structured data, but I'm pretty sure it won't work for prisma-structured data. Is that something we want?
Also right now the output names are probably not BIDS-compliant (though there's not a standard way of doing it yet, see https://docs.google.com/document/d/1Wwc4A6Mow4ZPPszDIWfCUCRNstn7d_zzaWPcfcHmgI4/edit#heading=h.mqkmyp254xh6). We could rename them to something along the lines of sub-wlsubj001_ses-01_run-01_preproc.nii.gz or something like that

sfavila commented 6 years ago

Thanks for working on this!

Given that the flywheel conversion to BIDS isn't totally ready to go yet, and most people will be using this on the prisma option, I spent some time testing that. I ran this script on several prisma datasets that were run using the old script and confirmed that the output files are identical--they were. So, I think it's fine to merge with a stipulation that the bids option is in development.
I wasn't able to get the registration to the anatomy to work using the bids option. It grabbed all the correct files but was looking for sub-wlsubjxxx in the freesurfer directory rather than wl_subjxxx even though I passed wl_subjxxx on the command line.
The i/j/k -> x/y/z issue still needs to be verified.
As we discussed, I think moving to a file that specifies the mapping from run numbers to the various scans for prisma formatted data would allow us to get rid of a bunch of command line arguments, and would make using the script more similar for prisma and bids options. If other people like this idea, it would probably be easy to implement.
The plugin args parsing looks a little fussy, but I didn't actually test it. If n_procs and memory_gb are the only ones that people will tend to use it might be easier to set them as separate optional command line arguments, especially if we get rid a bunch of args related to the session details. I'm not sure what tends to be preferred in these cases...I'll leave it up to you.
I would definitely make the output directories/file names BIDS compliant (or whatever a reasonable guess is) for the BIDS option and probably leave the output names as before for the prisma option.

billbrod commented 6 years ago

Thanks for testing! In response to your points:

It's able to save put the epi_output_nums in the json without a problem? I just realized from looking back in the code that, when dir_structure=='prisma', epi_output_nums is not converted to a list, which I thought should cause it to fail.
I'm not sure how to handle this -- when dir_structure=='bids', it will ignore subject from the command line. BIDS subject's are supposed to be structured like sub-name; I've been calling mine sub-wlsubj001, etc. In order to make this work with Freesurfer, I have a symlink to Acadia's main SUBJECTS_DIR within my BIDS directory. So, I have a folder called derivatives/freesurfer/sub-wlsubj001 that is a symlink to Acadia/Freesurfer_subjects/wl_subj001. That way, as long as I update SUBJECTS_DIR before running the pipeline, it will work. I've done that so the subject name is BIDS-approved. Because I use pybids to grab the subject name, the value that gets returned is wlsubj001, which I prepend sub- to. I see two ways we could handle this:

Since we know how Winawer lab subject names are formatted, we could check for both SUBJECTS_DIR/wl_subj001 and SUBJECTS_DIR/sub-wlsubj001 and store whichever subject name is found.
For dir_structure=='bids', use the sub-wlsubj001 name unless the user passes subject on the command line, in which case we use that.

What do you think?

Agreed, still waiting to get an example session from Flywheel to check.
Also agreed, though we should get feedback from others in the lab (and even if we do that, that doesn't need to be part of this pull request)
I wanted to make it possible for people to specify other plugin_args, if they want to use a different plugin. It is fussy, but unless people have trouble using it, I'd rather leave it as is.
The way I've been naming my outputs is like derivatives/preprocessed/sub-wlsubj001/ses-01/sub-wlsubj001_ses-01_task-sfp_run-01_preproc.nii.gz. I don't think the official naming convention is settled on, but something like that seems reasonable. The question is, how flexible do we want to make it? To me it makes sense that the preprocessed folder and the preproc suffix could be set by the user, with something like that as the default, but the derivatives folder, subject, session, task, and run stuff should all be set.

sfavila commented 6 years ago

Ack yeah, sorry, forgot about this. epi_output_nums does need to be converted to list in the prisma case. Casting to a list using list() does not work on my python 3 setup because it produces numpy.int64 types which json also won't deal with. ndarray.tolist() produces regular ints and works for me. you could probably also avoid numpy in the first place.
Without having gone through organizing and analyzing BIDs formatted data myself, I think something like option 2 makes sense. If the user provides wl_subj001 on the command line, that could be passed to the subject_id argument in fs.BBRegister(), which would then look in SUBJECTS_DIR/wl_subj001 for the anatomy and run correctly. But you could still save the BIDs style subject name in the file names, metadata, and what have you. If they don't pass anything, you could pass the BIDS style subject name to fs.BBRegister().
All the rest sounds fine. Your naming convention for the outputs looks reasonable to me, and I agree that the directory structure and information and string formatting in the filenames should be fixed. Allowing an optional override of the top folder and suffix seems good in case people need to run multiple versions or something like that.

billbrod commented 6 years ago

Okay, so I think I addressed the issues above, give it a try and let me know!

I'm also going to try and add a testing script because this is starting to become a pain to test manually.

billbrod commented 6 years ago

Okay, I'm going to go ahead and merge these. When people run into new problems, we'll open up new issues to fix them.

WinawerLab / MRI_tools

Bids preproc #3