Closed dariepetcu closed 1 month ago
The `KeyError: "Unable to synchronously open object (object 'gt' doesn't exist)"` occurs because the FULL-resolution pansharpening dataset has no ground truth (GT). That is also why we do not compute the reduced-resolution metrics (i.e., SAM, ERGAS) on it. To run `test_fn`, you should set the `full_res` argument to `True`, which means FULL resolution.
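To illustrate why the key is missing, here is a minimal sketch of how the dataset loader could guard the `"gt"` access. The function name `load_sample` and the plain-dict stand-in for an open h5 file are illustrative, not the repository's exact API:

```python
# Hedged sketch: only read "gt" on reduced-resolution splits, since the
# FULL-resolution files ship no ground truth. `d` mimics an open
# h5py.File (a mapping of dataset name -> array); names are illustrative.

def load_sample(d, full_res=False):
    sample = {
        "pan": d["pan"],
        "ms": d["ms"],
    }
    if not full_res:
        # Reduced-resolution splits contain a ground truth; the FULL
        # split does not, so this key only exists when full_res=False.
        sample["gt"] = d["gt"]
    return sample

# The FULL-resolution file has no "gt" dataset:
full_file = {"pan": [[0.1]], "ms": [[0.2]]}
print(sorted(load_sample(full_file, full_res=True)))  # ['ms', 'pan']
```

With `full_res=False` on the same file, the `d["gt"]` lookup would raise exactly the `KeyError` reported above.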
The CUDA OOM issue is not related to GroupNorm, and `n_groups > 1` may cause some color shifts in pansharpening (in my experiments). You can try making the network smaller so it fits in memory, or run it on a GPU with more memory.
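Two of the workarounds mentioned in this thread can be sketched as below. Setting `PYTORCH_CUDA_ALLOC_CONF` must happen before PyTorch initializes CUDA, and the channel-halving helper is purely illustrative (the repository's config names may differ):

```python
import os

# Workaround 1 (from the error message): enable expandable segments to
# reduce allocator fragmentation. Must be set before CUDA is initialized.
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "expandable_segments:True"

def shrink_channels(channels, factor=2):
    """Illustrative helper: divide each layer's channel width by `factor`
    to roughly halve activation memory when building a smaller network."""
    return [max(1, c // factor) for c in channels]

print(shrink_channels([64, 128, 256]))  # [32, 64, 128]
```

Note that neither change helps if other processes already occupy most of the card, as the quoted OOM message shows several gigabytes held by unrelated processes.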
Hello, I am trying to run your code on the test split of the wv3/qb/gf2 datasets, which I found in the PanCollection repository. First, when using the FullData h5 file, the code always raises a KeyError for `d["gt"]` in `pan_dataset.py`:
KeyError: "Unable to synchronously open object (object 'gt' doesn't exist)"
However, my main issue appears when running the reduced-resolution example h5 files, which do not trigger the error above. Instead, I get a CUDA out-of-memory error. The program first produces the following output, then prints the error below.
File "/home/petcu/miniconda3/envs/ddif/lib/python3.11/site-packages/torch/nn/functional.py", line 2561, in group_norm
    return torch.group_norm(input, num_groups, weight, bias, eps, torch.backends.cudnn.enabled)
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 80.00 MiB. GPU 0 has a total capacity of 23.64 GiB of which 79.38 MiB is free. Process 94447 has 1.05 GiB memory in use. Process 103551 has 2.92 GiB memory in use. Process 3279216 has 1012.00 MiB memory in use. Process 3294244 has 1012.00 MiB memory in use. Including non-PyTorch memory, this process has 17.53 GiB memory in use. Of the allocated memory 17.03 GiB is allocated by PyTorch, and 46.36 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation. See documentation for Memory Management (https://pytorch.org/docs/stable/notes/cuda.html#environment-variables)
I have tried lowering the group size of the GroupNorm, but that did not help; setting expandable segments to true, as the trace suggests, also did not help. The GPU I am using is an RTX 4090. The full error trace is below: