MIC-DKFZ / nnUNet

Inference doesn't work on CPU #2193

Closed davidkflau closed 1 month ago

davidkflau commented 3 months ago

Hi, I tried to run the inference on the CPU but got the below error.

 File "\inference\predict_from_raw_data.py", line 494, in predict_logits_from_preprocessed_data
    prediction += self.predict_sliding_window_return_logits(data).to('cpu')
RuntimeError: Inplace update to inference tensor outside InferenceMode is not allowed.You can make a clone to get a normal tensor before doing inplace update.See https://github.com/pytorch/rfcs/pull/17 for more details.
mdeleeuw1 commented 3 months ago

Same happens to me, but only when trying to perform inference using more than 1 fold.

ykirchhoff commented 3 months ago

Hi @davidkflau, @mdeleeuw1,

CPU inference should usually work fine, but it might be that some recent changes broke something somewhere... Let me check if I can reproduce your issue and get back to you.

Best, Yannick

abhisuri97 commented 3 months ago

Also ran into the same error, using nnunetv2 ≥ 2.2.1, torch ≥ 2.0.0, and Python ≥ 3.9.

davidkflau commented 2 months ago

Hi @ykirchhoff ,

Are there any updates on the issue?

ykirchhoff commented 2 months ago

Hi @davidkflau,

yes, I actually have some updates, but no really good solution yet. The issue arises because nnUNet now uses inference_mode for the sliding window predictions. This, however, breaks the in-place operations here when we predict on CPU (prediction on GPU is not affected, as we then make a clone when transferring the result to CPU). You can fix that by adding .clone() in lines 492 and 494 of the prediction script, but that slows down prediction quite a bit when using a GPU, so it is no permanent solution for nnUNet. We will check how best to handle this.
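For reference, a minimal sketch of that workaround (this mirrors the accumulation in predict_logits_from_preprocessed_data as described above; it is not the final fix):

    # around lines 492-494 of predict_from_raw_data.py
    if prediction is None:
        prediction = self.predict_sliding_window_return_logits(data).to('cpu').clone()
    else:
        # .clone() gives a normal tensor outside InferenceMode, so the in-place
        # += works on CPU, at the cost of an extra copy when the GPU->CPU
        # transfer has already produced a fresh tensor
        prediction += self.predict_sliding_window_return_logits(data).to('cpu').clone()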

Best, Yannick

chris-rapson-formus commented 1 month ago

I think there are three relevant use cases here:

  1. inference on CPU
  2. inference on GPU, with perform_everything_on_device = True
  3. inference on GPU, with perform_everything_on_device = False

My use case is number 3. I am doing the inference for each patch on GPU, but combining them into the full image on CPU. I am working with large images, and my GPU VRAM isn't large enough to store the whole image.

Adding .clone() to the end of those lines works for use cases 1 and 3, but as you said, it's not a general solution because it slows down use case 2. Perhaps this could cover all three use cases?

            pred = self.predict_sliding_window_return_logits(data).to('cpu')
            if not self.perform_everything_on_device:
                # in-place update is not possible for a tensor outside InferenceMode. Clone the tensor first.
                pred = pred.clone()

            if prediction is None:
                prediction = pred
            else:
                prediction += pred
ykirchhoff commented 1 month ago

Hi @chris-rapson-formus,

there is one last setting you missed: if perform_everything_on_device fails, the prediction is rerun with do_on_device=False, but as far as I can see that does not change the value of self.perform_everything_on_device. Feel free to open a pull request if you feel you have caught all the edge cases, so you actually get the credit.
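For context, the fallback path looks roughly like this (a sketch from memory; names and details may not match the current code exactly):

    try:
        # try to keep everything on the GPU
        prediction = self._internal_predict_sliding_window_return_logits(
            data, slicers, do_on_device=self.perform_everything_on_device)
    except RuntimeError:
        # out of memory: retry with the results accumulated on CPU.
        # Note that self.perform_everything_on_device is NOT updated here,
        # so that attribute alone does not tell you where the result lives.
        prediction = self._internal_predict_sliding_window_return_logits(
            data, slicers, do_on_device=False)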

Best, Yannick

chris-rapson-formus commented 1 month ago

Good catch, that's one I know I've come across frequently myself. I'll have to think about whether there's a better way to do it. Maybe try/except? Or check whether the tensor is an inference tensor?

ykirchhoff commented 1 month ago

Maybe just check the device of pred before transferring it to CPU: if it's on CPU you need to clone, if it's on GPU you just transfer to CPU. Not sure if there are instances where it's already on CPU but not in InferenceMode, but that should be a minority of cases, if any, and cloning there wouldn't break anything.
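Something along these lines (just a sketch, untested):

    pred = self.predict_sliding_window_return_logits(data)
    if pred.device.type == 'cpu':
        # already on CPU: .to('cpu') would be a no-op, so clone explicitly
        # to get a normal tensor outside InferenceMode
        pred = pred.clone()
    else:
        # the GPU -> CPU transfer already produces a fresh tensor outside InferenceMode
        pred = pred.to('cpu')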

chris-rapson-formus commented 1 month ago

You're right, the device does play into it. If the tensor is on the GPU, then to('cpu') already performs a clone, and the result is not in InferenceMode. If the tensor is on the CPU, then to('cpu') is a no-op, and the result stays in InferenceMode. We want to make sure the tensor is cloned once, and only once. The slow-down with GPU comes from copying the tensor twice.

I think the most elegant solution is to check if the tensor is in InferenceMode after the to('cpu') step, and then clone if necessary. That's the pull request I've just submitted.

It turns out you only need to do this the first time prediction is defined. The change you originally proposed for line 494 wasn't necessary.
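For reference, the logic ends up looking roughly like this (the exact change is in the pull request):

    pred = self.predict_sliding_window_return_logits(data).to('cpu')
    if prediction is None:
        # only the accumulator that is updated in-place needs to be a normal tensor;
        # clone only if .to('cpu') was a no-op and pred is still an inference tensor
        prediction = pred.clone() if pred.is_inference() else pred
    else:
        prediction += pred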

I've attached a script I used to help me understand what was going on. (It's a Python file, but GitHub won't let me attach it unless I change the extension to .txt.) testing_torch_inference_mode.txt

ykirchhoff commented 1 month ago

That is a very clean solution, looks good to me. Fabi will take care of the pull request, but it should be merged soonish. Thanks for taking care of this!