M-3LAB / Real3D-AD

[NeurIPS 2023] Official code for <Real3D-AD: A Dataset of Point Cloud Anomaly Detection>. A 3D point cloud anomaly detection dataset and benchmark.

About faiss-gpu #8

MAOXIXI-LV closed this issue 3 months ago

MAOXIXI-LV commented 3 months ago

When I set faiss_on_gpu to True in main_reg3dad.py, the code runs for a while and then fails with the error below. I ran it on a single 3090 GPU. Did you set faiss_on_gpu to True in your experiments, and have you encountered this error?

Faiss assertion 'err == cudaSuccess' failed in virtual void faiss::gpu::StandardGpuResourcesImpl::deallocMemory(int, void*) at /project/faiss/faiss/gpu/StandardGpuResources.cpp:518; details: Failed to cudaFree pointer 0x7fd0a603da00 (error 3 initialization error)
Traceback (most recent call last):
  File "main_reg3dad.py", line 243, in <module>
    main()
  File "/home/l/anaconda3/envs/real3dad/lib/python3.7/site-packages/click/core.py", line 1130, in __call__
    return self.main(*args, **kwargs)
  File "/home/l/anaconda3/envs/real3dad/lib/python3.7/site-packages/click/core.py", line 1055, in main
    rv = self.invoke(ctx)
  File "/home/l/anaconda3/envs/real3dad/lib/python3.7/site-packages/click/core.py", line 1689, in invoke
    return _process_result(rv)
  File "/home/l/anaconda3/envs/real3dad/lib/python3.7/site-packages/click/core.py", line 1626, in _process_result
    value = ctx.invoke(self._result_callback, value, **ctx.params)
  File "/home/l/anaconda3/envs/real3dad/lib/python3.7/site-packages/click/core.py", line 760, in invoke
    return __callback(*args, **kwargs)
  File "main_reg3dad.py", line 136, in run
    test_loader
  File "/home/l/proj/Real3D-AD-main/patchcore/patchcore.py", line 560, in predict_pmae
    return self._predict_dataloader_pmae(data)
  File "/home/l/proj/Real3D-AD-main/patchcore/patchcore.py", line 574, in _predict_dataloader_pmae
    _scores, _masks = self._predict_pmae(input_pointcloud)
  File "/home/l/proj/Real3D-AD-main/patchcore/patchcore.py", line 582, in _predict_pmae
    features, sample_dix = self._embed_pointmae(input_pointcloud)
  File "/home/l/proj/Real3D-AD-main/patchcore/patchcore.py", line 156, in _embed_pointmae
    reg_data = get_registration_np(point_cloud.squeeze(0).cpu().numpy(),self.basic_template)
  File "/home/l/proj/Real3D-AD-main/feature_extractors/ransac_position.py", line 113, in get_registration_np
    voxel_size)
  File "/home/l/proj/Real3D-AD-main/feature_extractors/ransac_position.py", line 99, in execute_global_registration
    ], o3d.registration.RANSACConvergenceCriteria(100000, 1000))
  File "/home/l/anaconda3/envs/real3dad/lib/python3.7/site-packages/torch/utils/data/_utils/signal_handling.py", line 66, in handler
    _error_if_any_worker_fails()
RuntimeError: DataLoader worker (pid 4977) is killed by signal: Aborted.

shirowalker commented 3 months ago

Your code is correct. This error occurs intermittently, and more often when many processes are running in parallel.

shirowalker commented 3 months ago

I do not know how to avoid it, but if you run it again it may not occur.
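
If rerunning by hand gets tedious, the rerun could be automated with a small wrapper. This is only a sketch, not part of the Real3D-AD code: `run_with_retries` is a hypothetical helper, and it assumes the crash makes the script exit with a non-zero code, which is the usual result of an uncaught RuntimeError or an aborted process.

```python
import subprocess
import sys
import time


def run_with_retries(cmd, max_attempts=3, delay_s=0.0):
    """Run cmd until it exits with code 0 or max_attempts is reached.

    Hypothetical helper for illustration only.
    Returns (last_return_code, attempts_used).
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    return_code = None
    for attempt in range(1, max_attempts + 1):
        # A crashed run ends with a non-zero return code
        # (negative on POSIX when the process is killed by a signal).
        return_code = subprocess.run(cmd).returncode
        if return_code == 0:
            return return_code, attempt
        print(f"attempt {attempt} exited with code {return_code}", file=sys.stderr)
        if attempt < max_attempts:
            time.sleep(delay_s)
    return return_code, max_attempts


# Example call (assumes main_reg3dad.py is in the current directory):
# code, attempts = run_with_retries([sys.executable, "main_reg3dad.py"])
```

This only works around the problem by starting over; it does not address whatever triggers the Faiss assertion.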

MAOXIXI-LV commented 3 months ago

Thanks. Got it!