Project-MONAI / model-zoo

MONAI Model Zoo that hosts models in the MONAI Bundle format.
Apache License 2.0

Update `AddChannel`, `AsChannelFirst` with `EnsureChannelFirst` #509

Closed KumoLiu closed 11 months ago

KumoLiu commented 1 year ago

Fixes https://github.com/Project-MONAI/MONAI/issues/7036. Fixes https://github.com/Project-MONAI/model-zoo/issues/517.

Description

Status

Work in progress

KumoLiu commented 1 year ago

Updated all related bundles that use `AddChannel` and `AsChannelFirst` and verified them, except for "ventricular_short_axis_3label", since I didn't have suitable data to test it. cc @ericspod, could you please help update and verify this one? Thanks!
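The kind of config change described above can be sketched roughly as follows. This is a minimal illustration, not the script used in this PR: the `migrate` helper and the replacement table are hypothetical, and it assumes bundle configs name transforms through a `_target_` key. It maps `AddChannel` to `EnsureChannelFirst` with `channel_dim="no_channel"`, and keeps the `channel_dim` of `AsChannelFirst` (defaulting to -1, which was that transform's default) so behaviour stays the same when no metadata is available.

```python
# Hypothetical mapping for illustration: old transform name -> new transform name.
# The dictionary-based variants end in "d".
REPLACEMENTS = {
    "AddChannel": "EnsureChannelFirst",
    "AddChanneld": "EnsureChannelFirstd",
    "AsChannelFirst": "EnsureChannelFirst",
    "AsChannelFirstd": "EnsureChannelFirstd",
}


def migrate(node):
    """Return a copy of a parsed config with deprecated channel transforms replaced."""
    if isinstance(node, list):
        return [migrate(item) for item in node]
    if not isinstance(node, dict):
        return node

    new = {key: migrate(value) for key, value in node.items()}
    target = new.get("_target_")
    if target in REPLACEMENTS:
        new["_target_"] = REPLACEMENTS[target]
        if target.startswith("AddChannel"):
            # The input had no channel axis, so ask for one to be added.
            new["channel_dim"] = "no_channel"
        else:
            # AsChannelFirst moved the last axis to the front by default.
            new.setdefault("channel_dim", -1)
    return new


# Illustrative config fragment (not copied from any bundle in the model zoo).
config = {
    "preprocessing": {
        "_target_": "Compose",
        "transforms": [
            {"_target_": "LoadImaged", "keys": "image"},
            {"_target_": "AddChanneld", "keys": "image"},
            {"_target_": "AsChannelFirstd", "keys": "image", "channel_dim": 2},
        ],
    }
}
migrated = migrate(config)
```

When the loader already provides channel metadata, passing `channel_dim` explicitly may be unnecessary; whether to rely on metadata or set it by hand is a per-bundle decision.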

ericspod commented 1 year ago

Hi @KumoLiu how much data did you need? I can save the output for the example (256, 256) 2D image if that's enough.

KumoLiu commented 1 year ago

> Hi @KumoLiu how much data did you need? I can save the output for the example (256, 256) 2D image if that's enough.

Hi @ericspod, yeah, that would be great. Just one test sample in the same format used in the bundle is enough. Thanks in advance!

ericspod commented 1 year ago

SC-N-2-3-0_seg.zip: This is the output of the network converted to uint8 labels. This follows the notebook in the docs directory and should be enough to test the output of the network.

SC-N-2-3-0_pred.zip: This is the raw output tensor from the network if you want to use this instead.

KumoLiu commented 12 months ago

I may leave the "ventricular_short_axis_3label" bundle as it is in this PR. It is still using the API from before v0.6. https://github.com/Project-MONAI/model-zoo/blob/a9acb18ad12af9f1829402a806300a5a5f917371/models/ventricular_short_axis_3label/configs/train.json#L102

This one is not easy to update: when I remove `as_tensor_output` and `AddChannel` and specify `data_type` in `EnsureType`, it throws the error below.

Update: fixed

Error message:

opt/pytorch/pytorch/aten/src/ATen/native/cuda/ScatterGatherKernel.cu:365: operator(): block: [64,0,0], thread: [63,0,0] Assertion `idx_dim >= 0 && idx_dim < index_size && "index out of bounds"` failed.
2023-10-11 03:38:01,405 - ignite.engine.engine.SupervisedTrainer - ERROR - Current run is terminating due to exception: CUDA error: device-side assert triggered
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
2023-10-11 03:38:01,430 - ignite.engine.engine.SupervisedTrainer - ERROR - Engine run is terminating due to exception: CUDA error: device-side assert triggered
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
2023-10-11 03:38:01,431 - ignite.engine.engine.SupervisedTrainer - INFO - Deleted previous saved final checkpoint: model_final_iteration=1.pt
Traceback (most recent call last):
  File "/usr/local/lib/python3.8/dist-packages/ignite/engine/engine.py", line 1068, in _run_once_on_dataset_as_gen
    self.state.output = self._process_function(self, self.state.batch)
  File "/workspace/Code/MONAI/monai/engines/trainer.py", line 230, in _iteration
    _compute_pred_loss()
  File "/workspace/Code/MONAI/monai/engines/trainer.py", line 216, in _compute_pred_loss
    engine.state.output[Keys.LOSS] = engine.loss_function(engine.state.output[Keys.PRED], targets).mean()
  File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/workspace/Code/MONAI/monai/losses/dice.py", line 176, in forward
    intersection = torch.sum(target * input, dim=reduce_axis)
  File "/workspace/Code/MONAI/monai/data/meta_tensor.py", line 282, in __torch_function__
    ret = super().__torch_function__(func, types, args, kwargs)
  File "/usr/local/lib/python3.8/dist-packages/torch/_tensor.py", line 1296, in __torch_function__
    ret = func(*args, **kwargs)
RuntimeError: CUDA error: device-side assert triggered
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.

ericspod commented 12 months ago

> I may leave the "ventricular_short_axis_3label" bundle as it is in this PR. It is still using the API from before v0.6.
>
> https://github.com/Project-MONAI/model-zoo/blob/a9acb18ad12af9f1829402a806300a5a5f917371/models/ventricular_short_axis_3label/configs/train.json#L102

Is this totally fixed now? I think the error you have comes from some issue on the platform side, or from the versions of PyTorch and/or CUDA. I don't remember what the `as_tensor_output` argument was meant to fix; it might have been MetaTensor related.
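For anyone hitting a similar device-side assert, a rough way to separate a data problem from a platform problem is sketched below. It makes two assumptions that this thread does not confirm: that the scatter "index out of bounds" assertion came from one-hot encoding of labels, which is a common cause, and that the expected number of classes is 4 (a placeholder value). The `out_of_range_labels` helper is hypothetical. The environment variable is the one the log itself recommends.

```python
import os

# Make CUDA kernel launches synchronous so the Python traceback points at the
# call that actually failed. Set this before CUDA is initialised, which in
# practice means before importing torch.
os.environ["CUDA_LAUNCH_BLOCKING"] = "1"


def out_of_range_labels(label_values, num_classes):
    """Return the sorted distinct label values that fall outside [0, num_classes)."""
    return sorted({int(v) for v in label_values if not 0 <= int(v) < num_classes})


# Illustrative flattened label values; 4 classes is an assumed placeholder.
bad = out_of_range_labels([0, 1, 2, 3, 4, 255], num_classes=4)
```

An empty result would suggest the labels are not the cause, and the platform or version explanation becomes more likely.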

KumoLiu commented 12 months ago

> > I may leave the "ventricular_short_axis_3label" bundle as it is in this PR. It is still using the API from before v0.6. https://github.com/Project-MONAI/model-zoo/blob/a9acb18ad12af9f1829402a806300a5a5f917371/models/ventricular_short_axis_3label/configs/train.json#L102
>
> Is this totally fixed now? I think the error you have comes from some issue on the platform side, or from the versions of PyTorch and/or CUDA. I don't remember what the `as_tensor_output` argument was meant to fix; it might have been MetaTensor related.

Yes, it works now. Thanks!

wyli commented 12 months ago

/build

yiheng-wang-nv commented 11 months ago

/build