AIM-Harvard / foundation-cancer-image-biomarker

Code and evaluation repository for the paper
https://aim-harvard.github.io/foundation-cancer-image-biomarker/
MIT License

Adaptation through fine-tuning with error RuntimeError: Could not infer dtype of numpy.int16 #309

Open Dtdavidgit opened 3 days ago

Dtdavidgit commented 3 days ago

❓ Question

Hi there, I am trying to fine-tune the pre-trained model on the LUNA16 dataset. I've gone through the website and the configuration YAML file, and modified the config based on the suggestion here: fmcib_finetune.yaml

However, the process stopped with the following error:

192 M     Trainable params
0         Non-trainable params
192 M     Total params
771.482   Total estimated model params size (MB)
168       Modules in train mode
0         Modules in eval mode
Sanity Checking: |                                                                            | 0/? [00:00<?, ?it/s]
[rank3]: Traceback (most recent call last):
[rank3]:   File "/home/user/.conda/envs/fmcib/bin/lighter", line 8, in <module>
[rank3]:     sys.exit(interface())
[rank3]:   File "/home/user/.conda/envs/fmcib/lib/python3.9/site-packages/lighter/utils/cli.py", line 23, in interface
[rank3]:     fire.Fire(commands)
[rank3]:   File "/home/user/.conda/envs/fmcib/lib/python3.9/site-packages/fire/core.py", line 141, in Fire
[rank3]:     component_trace = _Fire(component, args, parsed_flag_args, context, name)
[rank3]:   File "/home/user/.conda/envs/fmcib/lib/python3.9/site-packages/fire/core.py", line 566, in _Fire
[rank3]:     component, remaining_args = _CallAndUpdateTrace(
[rank3]:   File "/home/user/.conda/envs/fmcib/lib/python3.9/site-packages/fire/core.py", line 691, in _CallAndUpdateTrace
[rank3]:     component = fn(*varargs, **kwargs)
[rank3]:   File "/home/user/.conda/envs/fmcib/lib/python3.9/site-packages/lighter/utils/runner.py", line 70, in run_trainer_method
[rank3]:     getattr(trainer, method)(system)
[rank3]:   File "/home/user/.conda/envs/fmcib/lib/python3.9/site-packages/pytorch_lightning/trainer/trainer.py", line 538, in fit
[rank3]:     call._call_and_handle_interrupt(
[rank3]:   File "/home/user/.conda/envs/fmcib/lib/python3.9/site-packages/pytorch_lightning/trainer/call.py", line 46, in _call_and_handle_interrupt
[rank3]:     return trainer.strategy.launcher.launch(trainer_fn, *args, trainer=trainer, **kwargs)
[rank3]:   File "/home/user/.conda/envs/fmcib/lib/python3.9/site-packages/pytorch_lightning/strategies/launchers/subprocess_script.py", line 105, in launch
[rank3]:     return function(*args, **kwargs)
[rank3]:   File "/home/user/.conda/envs/fmcib/lib/python3.9/site-packages/pytorch_lightning/trainer/trainer.py", line 574, in _fit_impl
[rank3]:     self._run(model, ckpt_path=ckpt_path)
[rank3]:   File "/home/user/.conda/envs/fmcib/lib/python3.9/site-packages/pytorch_lightning/trainer/trainer.py", line 981, in _run
[rank3]:     results = self._run_stage()
[rank3]:   File "/home/user/.conda/envs/fmcib/lib/python3.9/site-packages/pytorch_lightning/trainer/trainer.py", line 1023, in _run_stage
[rank3]:     self._run_sanity_check()
[rank3]:   File "/home/user/.conda/envs/fmcib/lib/python3.9/site-packages/pytorch_lightning/trainer/trainer.py", line 1052, in _run_sanity_check
[rank3]:     val_loop.run()
[rank3]:   File "/home/user/.conda/envs/fmcib/lib/python3.9/site-packages/pytorch_lightning/loops/utilities.py", line 178, in _decorator
[rank3]:     return loop_run(self, *args, **kwargs)
[rank3]:   File "/home/user/.conda/envs/fmcib/lib/python3.9/site-packages/pytorch_lightning/loops/evaluation_loop.py", line 128, in run
[rank3]:     batch, batch_idx, dataloader_idx = next(data_fetcher)
[rank3]:   File "/home/user/.conda/envs/fmcib/lib/python3.9/site-packages/pytorch_lightning/loops/fetchers.py", line 133, in __next__
[rank3]:     batch = super().__next__()
[rank3]:   File "/home/user/.conda/envs/fmcib/lib/python3.9/site-packages/pytorch_lightning/loops/fetchers.py", line 60, in __next__
[rank3]:     batch = next(self.iterator)
[rank3]:   File "/home/user/.conda/envs/fmcib/lib/python3.9/site-packages/pytorch_lightning/utilities/combined_loader.py", line 341, in __next__
[rank3]:     out = next(self._iterator)
[rank3]:   File "/home/user/.conda/envs/fmcib/lib/python3.9/site-packages/pytorch_lightning/utilities/combined_loader.py", line 142, in __next__
[rank3]:     out = next(self.iterators[0])
[rank3]:   File "/home/user/.conda/envs/fmcib/lib/python3.9/site-packages/torch/utils/data/dataloader.py", line 701, in __next__
[rank3]:     data = self._next_data()
[rank3]:   File "/home/user/.conda/envs/fmcib/lib/python3.9/site-packages/torch/utils/data/dataloader.py", line 1465, in _next_data
[rank3]:     return self._process_data(data)
[rank3]:   File "/home/user/.conda/envs/fmcib/lib/python3.9/site-packages/torch/utils/data/dataloader.py", line 1491, in _process_data
[rank3]:     data.reraise()
[rank3]:   File "/home/user/.conda/envs/fmcib/lib/python3.9/site-packages/torch/_utils.py", line 715, in reraise
[rank3]:     raise exception
[rank3]: RuntimeError: Caught RuntimeError in DataLoader worker process 0.
[rank3]: Original Traceback (most recent call last):
[rank3]:   File "/home/user/.conda/envs/fmcib/lib/python3.9/site-packages/monai/transforms/transform.py", line 141, in apply_transform
[rank3]:     return _apply_transform(transform, data, unpack_items, lazy, overrides, log_stats)
[rank3]:   File "/home/user/.conda/envs/fmcib/lib/python3.9/site-packages/monai/transforms/transform.py", line 98, in _apply_transform
[rank3]:     return transform(data, lazy=lazy) if isinstance(transform, LazyTrait) else transform(data)
[rank3]:   File "/home/user/.conda/envs/fmcib/lib/python3.9/site-packages/monai/transforms/intensity/array.py", line 902, in __call__
[rank3]:     img = convert_to_tensor(img, track_meta=get_track_meta())
[rank3]:   File "/home/user/.conda/envs/fmcib/lib/python3.9/site-packages/monai/utils/type_conversion.py", line 169, in convert_to_tensor
[rank3]:     return _convert_tensor(data, dtype=dtype, device=device)
[rank3]:   File "/home/user/.conda/envs/fmcib/lib/python3.9/site-packages/monai/utils/type_conversion.py", line 149, in _convert_tensor
[rank3]:     tensor = torch.as_tensor(tensor, **kwargs)
[rank3]: RuntimeError: Could not infer dtype of numpy.int16

I believe the relevant part is the last line of the traceback: RuntimeError: Could not infer dtype of numpy.int16

Could you please advise on how to get the fine-tuning run to work? Thank you.

amabilee commented 2 days ago

This error often occurs when there's a mismatch between the expected data type and the actual data type being processed.

Have you tried checking the data types? You can inspect the .dtype attribute of your NumPy arrays (for example, arr.dtype) to see what is actually being passed into the transforms.
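A minimal sketch of that kind of check, using only NumPy. The helper names (describe, to_float32_array) and the sample values are hypothetical and not part of fmcib, MONAI, or lighter. The exception message names numpy.int16, which is a NumPy scalar type, so it may be worth confirming whether the transform receives an ndarray or a bare scalar. Casting to float32 is only one possible workaround to try, not a confirmed fix for this issue.

```python
import numpy as np


def describe(value):
    """Return (Python type name, dtype, shape) for an array or NumPy scalar."""
    arr = np.asarray(value)
    return type(value).__name__, str(arr.dtype), arr.shape


def to_float32_array(value):
    """Cast an array or NumPy scalar to a float32 ndarray (hypothetical helper)."""
    return np.asarray(value, dtype=np.float32)


# Stand-ins for what a data loader might yield; real inputs would come
# from your own dataset, not from these made-up values.
volume = np.zeros((2, 4, 4), dtype=np.int16)  # an int16 image patch
scalar = np.int16(-1000)                      # a NumPy scalar, not an array

print(describe(volume))                    # an ndarray with dtype int16
print(describe(scalar))                    # a bare int16 scalar with shape ()
print(describe(to_float32_array(volume)))  # same shape, dtype float32
```

If the second case (a bare scalar) shows up in your pipeline, the input is probably not the image array you expected, and the loading step is the first place to look.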