facebookresearch / fastMRI

A large-scale dataset of both raw MRI measurements and clinical MRI images.
https://fastmri.org
MIT License

Packageify fastMRI #50

Closed mmuckley closed 3 years ago

mmuckley commented 4 years ago

This post is intended to both introduce the issue and be modified to keep track of progress.

The fastMRI repository was originally designed for wholly self-contained experiments. This was good for readability as it kept all experimental parameters, models, and training scripts in a single file. At the time, the only deep learning model in the repository was a U-Net, and so having modules or reusable components wasn't as much of a priority.

Since then we've introduced a number of new components to the repository, including the end-to-end variational network and new mask sampling. We've also added PyTorch Lightning. As a result, modular components could be more useful now, but we've generally kept the old structure. Moving towards a more modular structure would help compartmentalize things, making it easier for users to mix modules and samplers into their own projects. We could also clearly separate out the experiments we've run for leaderboard submissions, with their exact hyperparameters, and keep these experiments frozen in time.

To that end, I'd like to propose that we "packageify" fastMRI. This will largely be a code refactor with a few new areas. Essentially, I'm thinking of the following structure:

.
├── fastmri
│   ├── data
│   │   ├── dataset.py
│   │   ├── subsample.py
│   │   ├── transforms.py
│   │   └── volume_sampler.py
│   ├── models
│   │   ├── unet.py
│   │   └── varnet.py
│   ├── evaluate.py
│   ├── recon.py
│   └── mri_module.py
├── experimental
│   ├── cs
│   │   └── run_bart.py
│   ├── unet
│   │   ├── train_unet_demo.py
│   │   ├── unet_brain_challenge_submission_YYYY-MM-DD.py
│   │   └── unet_module.py
│   ├── varnet
│   │   ├── train_varnet_demo.py
│   │   ├── varnet_brain_challenge_submission_YYYY-MM-DD.py
│   │   └── varnet_module.py
│   ├── zero_filled
│   │   └── run_zero_filled.py

The files in experimental would import training_module and update it with their own model and training parameters. These files would be committed to the GitHub repository to make reproducibility more obvious.
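A minimal sketch of how such an experiment file might look, using only the standard library. Everything here is hypothetical and for illustration only: `TrainingModule` stands in for the shared module the package would provide, `ToyUnet` stands in for a real model, and the hyperparameter values are placeholders rather than the settings of any actual submission.

```python
from dataclasses import FrozenInstanceError, asdict, dataclass


@dataclass(frozen=True)
class ExperimentConfig:
    """Hyperparameters pinned for one experiment (placeholder values)."""

    lr: float = 1e-3
    max_epochs: int = 50
    seed: int = 42


class TrainingModule:
    """Hypothetical stand-in for the shared training module in the package."""

    def __init__(self, config):
        self.config = config

    def build_model(self):
        # Each experiment supplies its own model by overriding this method.
        raise NotImplementedError

    def describe(self):
        # Summarize the experiment: model name plus all pinned hyperparameters.
        return {"model": type(self.build_model()).__name__, **asdict(self.config)}


class ToyUnet:
    """Placeholder for a real model class."""


class UnetModule(TrainingModule):
    """Experiment-specific module: reuses the shared logic, swaps in a model."""

    def build_model(self):
        return ToyUnet()


# The experiment file commits its exact settings alongside the code.
SUBMISSION_CONFIG = ExperimentConfig(lr=1e-3, max_epochs=50, seed=42)
module = UnetModule(SUBMISSION_CONFIG)
print(module.describe())
```

Making the config a frozen dataclass is one way to keep a committed experiment from being changed by accident at runtime: assigning to `SUBMISSION_CONFIG.lr` raises `FrozenInstanceError`.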

Task list for completing this issue:

Making these changes should not require modifications to existing working code. Once they're done we can decide about deprecating and removing old folders.

Bala93 commented 4 years ago

Very good proposal. This will result in common training, validation, and evaluation code for any new model or dataset. Is this open for contributions?

mmuckley commented 4 years ago

This is an open source repository - of course! If you have some area that you'd like to prioritize perhaps post it here so that we don't duplicate efforts.

adefazio commented 4 years ago

We can move the banding removal code under experimental perhaps? I am hesitant to try to combine it with the rest of the code-base as even a minor change may break reproducibility.

mmuckley commented 4 years ago

That seems reasonable to me. It would be nice to get a basic implementation into fastmri at first, and for reproducibility we won't deprecate the current folder until we have enough time to verify everything vs. the paper. It's a big one compared to the others.

Bala93 commented 4 years ago

> If you have some area that you'd like to prioritize perhaps post it here so that we don't duplicate efforts.

Actually, I am using the older version of the fastMRI code, the one without PyTorch Lightning. I even changed the varnet code to the older style. Now I am trying to bring in different models using that code base.

You can continue with PyTorch Lightning. I will share the code base in the old code style once I have it ready for at least 2 or 3 models.

Thank you.

mmuckley commented 4 years ago

At this point the core components of the former code have been refactored, and the leaderboard has been updated with new models. The code for generating the leaderboard models is now in experimental/varnet/varnet_brain_leaderboard_submission_2020-08-21.py and experimental/unet/unet_brain_leaderboard_submission_2020-08-18.py.

PyPI distribution is still undecided; we may need to discuss it further. It's also undecided whether we want to move the PyTorch Lightning training modules into the main package folder for distribution or leave them in experimental.