facebookresearch / fastMRI

A large-scale dataset of both raw MRI measurements and clinical MRI images.
https://fastmri.org
MIT License

Error related to raw_sample_filter in _create_data_loader #271

Open mmuckley opened 1 year ago

mmuckley commented 1 year ago

Discussed in https://github.com/facebookresearch/fastMRI/discussions/263

Creating an issue for this; it seems some aspects of sample filtering were broken by recent changes.

Originally posted by **mouryarahul** August 24, 2022

Hi, I'm trying to run

```
python train_unet_demo.py \
  --mode test \
  --test_split test \
  --challenge singlecoil \
  --data_path ../../../FastMRI_DATASET/knee_singlecoil_train/ \
  --resume_from_checkpoint unet/unet_demo/checkpoints/epoch=1-step=69484.ckpt
```

where `../../../FastMRI_DATASET/knee_singlecoil_train/` contains all three folders: `singlecoil_test`, `singlecoil_train` and `singlecoil_val`.

However, I'm getting an error related to `raw_sample_filter` in the case of the test dataset. Maybe I am missing something or doing something silly. Can someone please point out the mistake? Thanks!

**Info about my environment:**

```
PyTorch version: 1.12.0+cu116
Is debug build: False
CUDA used to build PyTorch: 11.6
ROCM used to build PyTorch: N/A

OS: Ubuntu 22.04 LTS (x86_64)
GCC version: (Ubuntu 11.2.0-19ubuntu1) 11.2.0
Clang version: Could not collect
CMake version: version 3.22.1
Libc version: glibc-2.35

Python version: 3.10.4 (main, Mar 31 2022, 08:41:55) [GCC 7.5.0] (64-bit runtime)
Python platform: Linux-5.15.0-46-generic-x86_64-with-glibc2.35
Is CUDA available: True
CUDA runtime version: Could not collect
GPU models and configuration: GPU 0: NVIDIA GeForce GTX 1070
Nvidia driver version: 515.65.01
cuDNN version: Could not collect
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True

Versions of relevant libraries:
[pip3] numpy==1.22.3
[pip3] pytorch-lightning==1.7.2
[pip3] torch==1.12.0+cu116
[pip3] torchaudio==0.12.0+cu116
[pip3] torchmetrics==0.9.2
[pip3] torchvision==0.13.0+cu116
[conda] blas 1.0 mkl
[conda] mkl 2021.4.0 h06a4308_640
[conda] mkl-service 2.4.0 py310h7f8727e_0
[conda] mkl_fft 1.3.1 py310hd6ae3a3_0
[conda] mkl_random 1.2.2 py310h00e6091_0
[conda] numpy 1.22.3 py310hfa59a62_0
[conda] numpy-base 1.22.3 py310h9585f30_0
[conda] pytorch-lightning 1.7.2 pypi_0 pypi
[conda] torch 1.12.0+cu116 pypi_0 pypi
[conda] torchaudio 0.12.0+cu116 pypi_0 pypi
[conda] torchmetrics 0.9.2 pypi_0 pypi
[conda] torchvision 0.13.0+cu116 pypi_0 pypi
```

**The full error msg:**

```
Global seed set to 42
/home/rahul/anaconda3/envs/pytorch/lib/python3.10/site-packages/torchmetrics/utilities/prints.py:36: UserWarning: Torchmetrics v0.9 introduced a new argument class property called `full_state_update` that has not been set for this class (DistributedMetricSum). The property determines if `update` by default needs access to the full metric state. If this is not the case, significant speedups can be achieved and we recommend setting this to `False`. We provide an checking function `from torchmetrics.utilities import check_forward_no_full_state` that can be used to check if the `full_state_update=True` (old and potential slower behaviour, default for now) or if `full_state_update=False` can be used safely.
  warnings.warn(*args, **kwargs)
/home/rahul/anaconda3/envs/pytorch/lib/python3.10/site-packages/pytorch_lightning/trainer/connectors/accelerator_connector.py:446: LightningDeprecationWarning: Setting `Trainer(gpus=1)` is deprecated in v1.7 and will be removed in v2.0. Please use `Trainer(accelerator='gpu', devices=1)` instead.
  rank_zero_deprecation(
/home/rahul/anaconda3/envs/pytorch/lib/python3.10/site-packages/pytorch_lightning/trainer/connectors/checkpoint_connector.py:52: LightningDeprecationWarning: Setting `Trainer(resume_from_checkpoint=)` is deprecated in v1.5 and will be removed in v1.7. Please pass `Trainer.fit(ckpt_path=)` directly instead.
  rank_zero_deprecation(
GPU available: True (cuda), used: True
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs
HPU available: False, using: 0 HPUs
Global seed set to 42
Initializing distributed: GLOBAL_RANK: 0, MEMBER: 1/1
----------------------------------------------------------------------------------------------------
distributed_backend=nccl
All distributed processes registered. Starting with 1 processes
----------------------------------------------------------------------------------------------------

LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]
Traceback (most recent call last):
  File "/media/rahul/DATA/WorkSpace/Multimodal-Data-Processing/Projects/fastMRI/fastmri_examples/unet/train_unet_demo.py", line 191, in <module>
    run_cli()
  File "/media/rahul/DATA/WorkSpace/Multimodal-Data-Processing/Projects/fastMRI/fastmri_examples/unet/train_unet_demo.py", line 187, in run_cli
    cli_main(args)
  File "/media/rahul/DATA/WorkSpace/Multimodal-Data-Processing/Projects/fastMRI/fastmri_examples/unet/train_unet_demo.py", line 75, in cli_main
    trainer.test(model, datamodule=data_module)
  File "/home/rahul/anaconda3/envs/pytorch/lib/python3.10/site-packages/pytorch_lightning/trainer/trainer.py", line 864, in test
    return self._call_and_handle_interrupt(self._test_impl, model, dataloaders, ckpt_path, verbose, datamodule)
  File "/home/rahul/anaconda3/envs/pytorch/lib/python3.10/site-packages/pytorch_lightning/trainer/trainer.py", line 648, in _call_and_handle_interrupt
    return self.strategy.launcher.launch(trainer_fn, *args, trainer=self, **kwargs)
  File "/home/rahul/anaconda3/envs/pytorch/lib/python3.10/site-packages/pytorch_lightning/strategies/launchers/subprocess_script.py", line 93, in launch
    return function(*args, **kwargs)
  File "/home/rahul/anaconda3/envs/pytorch/lib/python3.10/site-packages/pytorch_lightning/trainer/trainer.py", line 911, in _test_impl
    results = self._run(model, ckpt_path=self.ckpt_path)
  File "/home/rahul/anaconda3/envs/pytorch/lib/python3.10/site-packages/pytorch_lightning/trainer/trainer.py", line 1168, in _run
    results = self._run_stage()
  File "/home/rahul/anaconda3/envs/pytorch/lib/python3.10/site-packages/pytorch_lightning/trainer/trainer.py", line 1251, in _run_stage
    return self._run_evaluate()
  File "/home/rahul/anaconda3/envs/pytorch/lib/python3.10/site-packages/pytorch_lightning/trainer/trainer.py", line 1291, in _run_evaluate
    self._evaluation_loop._reload_evaluation_dataloaders()
  File "/home/rahul/anaconda3/envs/pytorch/lib/python3.10/site-packages/pytorch_lightning/loops/dataloader/evaluation_loop.py", line 234, in _reload_evaluation_dataloaders
    self.trainer.reset_test_dataloader()
  File "/home/rahul/anaconda3/envs/pytorch/lib/python3.10/site-packages/pytorch_lightning/trainer/trainer.py", line 1944, in reset_test_dataloader
    self.num_test_batches, self.test_dataloaders = self._data_connector._reset_eval_dataloader(
  File "/home/rahul/anaconda3/envs/pytorch/lib/python3.10/site-packages/pytorch_lightning/trainer/connectors/data_connector.py", line 348, in _reset_eval_dataloader
    dataloaders = self._request_dataloader(mode)
  File "/home/rahul/anaconda3/envs/pytorch/lib/python3.10/site-packages/pytorch_lightning/trainer/connectors/data_connector.py", line 436, in _request_dataloader
    dataloader = source.dataloader()
  File "/home/rahul/anaconda3/envs/pytorch/lib/python3.10/site-packages/pytorch_lightning/trainer/connectors/data_connector.py", line 513, in dataloader
    return method()
  File "/media/rahul/DATA/WorkSpace/Multimodal-Data-Processing/Projects/fastMRI/fastmri/pl_modules/data_module.py", line 325, in test_dataloader
    return self._create_data_loader(
  File "/media/rahul/DATA/WorkSpace/Multimodal-Data-Processing/Projects/fastMRI/fastmri/pl_modules/data_module.py", line 262, in _create_data_loader
    raw_sample_filter=raw_sample_filter,
UnboundLocalError: local variable 'raw_sample_filter' referenced before assignment
```
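The traceback ends in `_create_data_loader` with an `UnboundLocalError`, which in Python means a local name was read on a code path where it was never assigned. Below is a minimal, self-contained sketch of that failure mode and one way to avoid it. It assumes (without quoting the actual `data_module.py` source) that `raw_sample_filter` is bound only in the branches for some splits, so the `test` split reaches the `DataLoader` construction with the name unbound. `SampleFilter`, `keep_all`, `pick_filter_buggy`, `pick_filter_fixed` and the `test_filter` argument are hypothetical names for illustration, not fastMRI API.

```python
from typing import Callable, Dict, Optional

# A per-split filter decides whether a raw sample is kept. This alias is an
# illustration only; it is not fastMRI's actual type.
SampleFilter = Callable[[object], bool]


def keep_all(raw_sample: object) -> bool:
    """Hypothetical default filter that keeps every raw sample."""
    return True


def pick_filter_buggy(
    partition: str, train_filter: SampleFilter, val_filter: SampleFilter
) -> SampleFilter:
    # Assumed bug pattern: raw_sample_filter is bound only in the "train"
    # and "val" branches, so partition == "test" reaches the return with the
    # name unbound and Python raises UnboundLocalError.
    if partition == "train":
        raw_sample_filter = train_filter
    elif partition == "val":
        raw_sample_filter = val_filter
    return raw_sample_filter


def pick_filter_fixed(
    partition: str,
    train_filter: SampleFilter,
    val_filter: SampleFilter,
    test_filter: Optional[SampleFilter] = None,
) -> SampleFilter:
    # Every known partition has an entry, so the result is defined on all paths.
    filters: Dict[str, Optional[SampleFilter]] = {
        "train": train_filter,
        "val": val_filter,
        "test": test_filter,
    }
    if partition not in filters:
        raise ValueError(f"unknown partition: {partition!r}")
    chosen = filters[partition]
    # Fall back to keeping everything when no filter was configured.
    return chosen if chosen is not None else keep_all


if __name__ == "__main__":
    try:
        pick_filter_buggy("test", keep_all, keep_all)
    except UnboundLocalError as err:
        print("buggy version:", type(err).__name__)  # buggy version: UnboundLocalError
    print("fixed version:", pick_filter_fixed("test", keep_all, keep_all).__name__)  # fixed version: keep_all
```

The sketch checks the exception type rather than its message because the wording differs across Python versions: 3.10 prints the "referenced before assignment" text seen in the traceback above, while newer releases phrase it differently. The design choice in the fixed version is simply that every split, including `test`, resolves to some filter, with an explicit error for unknown split names instead of an unbound variable.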