amazon-science / patchcore-inspection

Apache License 2.0

When I deploy patchcore on my own datasets, it reports: torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 44.75 GiB #69

Open Kissacat opened 1 year ago

Kissacat commented 1 year ago

Hello! I changed the data path to point to my own dataset, and it reports this error:

  File "src/run_OOD.py", line 435, in <module>
    main()
  File "/data/home/Tiantian_Liu/anaconda3/lib/python3.8/site-packages/click/core.py", line 1130, in __call__
    return self.main(*args, **kwargs)
  File "/data/home/Tiantian_Liu/anaconda3/lib/python3.8/site-packages/click/core.py", line 1055, in main
    rv = self.invoke(ctx)
  File "/data/home/Tiantian_Liu/anaconda3/lib/python3.8/site-packages/click/core.py", line 1689, in invoke
    return _process_result(rv)
  File "/data/home/Tiantian_Liu/anaconda3/lib/python3.8/site-packages/click/core.py", line 1626, in _process_result
    value = ctx.invoke(self._result_callback, value, **ctx.params)
  File "/data/home/Tiantian_Liu/anaconda3/lib/python3.8/site-packages/click/core.py", line 760, in invoke
    return __callback(*args, **kwargs)
  File "src/run_OOD.py", line 97, in run
    PatchCore.fit(dataloaders["training"])
  File "/data/home/Tiantian_Liu/OOD/src/patchcore/patchcore.py", line 153, in fit
    self._fill_memory_bank(training_data)
  File "/data/home/Tiantian_Liu/OOD/src/patchcore/patchcore.py", line 174, in _fill_memory_bank
    features = self.featuresampler.run(features)
  File "/data/home/Tiantian_Liu/OOD/src/patchcore/sampler.py", line 75, in run
    reduced_features = self._reduce_features(features)
  File "/data/home/Tiantian_Liu/OOD/src/patchcore/sampler.py", line 59, in _reduce_features
    features = features.to(self.device)
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 44.75 GiB (GPU 0; 10.75 GiB total capacity; 267.85 MiB already allocated; 10.27 GiB free; 300.00 MiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF

I am not sure what could be causing this. Any help would be appreciated.
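For readers triaging a similar report, the numbers in the message already say a lot: a single 44.75 GiB request on a card with 10.75 GiB total capacity cannot succeed, so the `max_split_size_mb` hint (which targets fragmentation) would not help here. The sketch below pulls those figures out of the message text. `parse_oom` is a hypothetical helper written for this thread, not part of PyTorch or patchcore, and it assumes the three sizes are reported in GiB as they are above.

```python
import re


def parse_oom(message):
    """Return (requested_gib, total_gib, free_gib) from a CUDA OOM message.

    Hypothetical helper: assumes all three values are printed in GiB,
    as in the message quoted in this issue.
    """
    requested = float(re.search(r"Tried to allocate ([\d.]+) GiB", message).group(1))
    total = float(re.search(r"([\d.]+) GiB total capacity", message).group(1))
    free = float(re.search(r"([\d.]+) GiB free", message).group(1))
    return requested, total, free


# Message text copied from the traceback in this issue.
msg = (
    "CUDA out of memory. Tried to allocate 44.75 GiB (GPU 0; 10.75 GiB total "
    "capacity; 267.85 MiB already allocated; 10.27 GiB free; 300.00 MiB "
    "reserved in total by PyTorch)"
)

requested, total, free = parse_oom(msg)

# True when one allocation is larger than the whole device, in which case
# allocator tuning cannot help and the tensor itself has to shrink.
exceeds_device = requested > total
print(f"requested={requested} GiB, total={total} GiB, exceeds_device={exceeds_device}")
```

When `exceeds_device` is true, the practical options are to make the tensor smaller or to keep it off the GPU.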

Kissacat commented 1 year ago

I fixed it by reducing the size of the training set.
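A rough way to see why the training set size matters: the traceback fails while moving the collected `features` tensor to the GPU, and that tensor presumably grows linearly with the number of training images. The sketch below estimates its size and the largest image count that fits a given budget. The function names and the example figures (patches per image, feature dimension, float32 storage, 8 GiB budget) are illustrative assumptions, not values read from this issue or from the patchcore code.

```python
def feature_bank_gib(num_images, patches_per_image, feature_dim, bytes_per_value=4):
    """Estimate the size in GiB of a dense (num_patches x feature_dim) tensor.

    bytes_per_value=4 assumes float32 storage.
    """
    total_bytes = num_images * patches_per_image * feature_dim * bytes_per_value
    return total_bytes / 2**30


def max_images_that_fit(budget_gib, patches_per_image, feature_dim, bytes_per_value=4):
    """Largest number of images whose features fit in budget_gib."""
    bytes_per_image = patches_per_image * feature_dim * bytes_per_value
    return int(budget_gib * 2**30 // bytes_per_image)


# Illustrative numbers only (assumptions, not measured from this issue):
# 784 patches per image, 1024-dimensional features, float32 values.
patches, dim = 784, 1024

for n in (1000, 5000, 15000):
    print(f"{n} images -> {feature_bank_gib(n, patches, dim):.2f} GiB")

# Leave headroom below the 10.75 GiB card; 8 GiB is an arbitrary budget.
print("images that fit in 8 GiB:", max_images_that_fit(8.0, patches, dim))
```

Plugging in the real patch count and feature dimension from your own configuration gives a quick check before starting a run.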

IESSTTJP commented 2 months ago

Have you encountered the error 'MVTecDataset' object has no attribute 'transform_std' while training?