eg4000 / SKU110K_CVPR19

How to avoid SIGALRM (failed prediction)? #38

Closed wiamadaya closed 4 years ago

wiamadaya commented 5 years ago

What actually causes SIGALRM during inference? Am I running out of compute resources (CPU, memory, GPU), hence the timeout? I usually increase the threshold from 0.4 or 0.5 to avoid this problem.

eg4000 commented 5 years ago

In rare cases the input MoG is too complex and may cause a long computation time in one of the inference stages. It's not a matter of resources.

wiamadaya commented 4 years ago

I am confused about the collapse module in CollapsingMoG.py. When performing inference on Windows, the SIGALRM error is triggered. I then tried on Linux, but none of the messages in the collapse module are printed, and the detected object count is equal to 1.

Error on Windows (I understand that there is no SIGALRM on Windows):

    ~\SKU110K_CVPR19\object_detector_retinanet\keras_retinanet\utils\CollapsingMoG.py in collapse(original_detection_centers, k, offset, max_iter, epsilon)
         95 def collapse(original_detection_centers, k, offset, max_iter=100, epsilon=1e-100):
         96     try:
    ---> 97         with Timeout(3):
         98             n = original_detection_centers.shape[0]
         99             mu_x = original_detection_centers.x - offset[0]

    ~\SKU110K_CVPR19\object_detector_retinanet\keras_retinanet\utils\CollapsingMoG.py in __enter__(self)
         16
         17     def __enter__(self):
    ---> 18         signal.signal(signal.SIGALRM, self.raise_timeout)
         19         signal.setitimer(signal.ITIMER_REAL, self.sec)
         20
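
On Windows, Python's `signal` module does not define `SIGALRM` or `setitimer`, so the `__enter__` call in the traceback above fails before `collapse` does any work. One possible workaround, shown here only as a minimal sketch, is a timeout guard that uses the alarm where it exists and otherwise runs the block with no time limit. The names `PortableTimeout`, `TimeoutExpired`, `run_guarded` and `slow_task` are hypothetical and not part of the repository.

```python
import signal
import threading
import time


class TimeoutExpired(Exception):
    """Raised when the guarded block exceeds its time budget."""


class PortableTimeout:
    """Hypothetical stand-in for the repository's Timeout class.

    Uses SIGALRM where it is available (POSIX, main thread only).
    Elsewhere, e.g. on Windows, it is a no-op and the guarded block
    runs without any time limit.
    """

    def __init__(self, sec):
        self.sec = sec
        self.active = (
            hasattr(signal, "SIGALRM")
            and threading.current_thread() is threading.main_thread()
        )
        self._old_handler = None

    def __enter__(self):
        if self.active:
            self._old_handler = signal.signal(signal.SIGALRM, self._raise_timeout)
            signal.setitimer(signal.ITIMER_REAL, self.sec)
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        if self.active:
            signal.setitimer(signal.ITIMER_REAL, 0)  # cancel any pending alarm
            previous = self._old_handler
            if previous is None:  # handler was not installed from Python
                previous = signal.SIG_DFL
            signal.signal(signal.SIGALRM, previous)
        return False  # never swallow exceptions

    def _raise_timeout(self, signum, frame):
        raise TimeoutExpired()


def slow_task(duration):
    """Placeholder for a long computation; it only sleeps."""
    time.sleep(duration)
    return "finished"


def run_guarded(func, sec):
    """Return (result, timed_out) for func() under a budget of sec seconds."""
    try:
        with PortableTimeout(sec):
            return func(), False
    except TimeoutExpired:
        return None, True
```

With this sketch, `run_guarded(lambda: slow_task(0.5), 0.05)` reports a timeout on Linux or macOS, while on Windows the task simply runs to completion. The trade-off is that the Windows path gives no protection against a stuck computation.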

eg4000 commented 4 years ago

I'm not sure I understand your question. Can you provide more details: what network and image are you running, and what is the result you get?

wiamadaya commented 4 years ago

Sorry for the lack of clarity. What I mean is that, in rare cases, I encounter the SIGALRM error on Windows when running inference on an image. I now have a Linux machine, and running inference there on the same image (the one that triggered the SIGALRM error on Windows) runs the collapse function without error. But I am still confused about how this collapse function works.

eg4000 commented 4 years ago

Can you debug the code to see exactly what happens? There are several possible fallbacks to the collapse function, depending on where it fails/gets stuck.

Also, just a guess, see if the results make more sense if you change the end of the collapse function to:

    except Timeout.Timeout:
        print ("EM Timeout - using fallback")
        return beta_init, mu_prime_init, covariance_prime_init
    return beta, mu_prime, covariance_prime
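
The idea behind the suggested change is that when the iterative stage runs out of time, the function returns its initial estimates instead of failing. Below is a rough, self-contained sketch of that pattern. It is not the repository's code: it checks a deadline inside the loop rather than using the signal-based `Timeout` class, and `EMTimeout`, `refine_with_fallback`, `init` and `step` are hypothetical names used only for illustration.

```python
import time


class EMTimeout(Exception):
    """Raised when the iterative refinement exceeds its time budget."""


def refine_with_fallback(init, step, max_iter=100, budget_sec=3.0):
    """Apply step() repeatedly, starting from init.

    If the time budget runs out before max_iter iterations finish,
    return the initial estimate unchanged (the fallback).
    """
    deadline = time.monotonic() + budget_sec
    current = init
    try:
        for _ in range(max_iter):
            # Cooperative check: no signals needed, so this also works on Windows.
            if time.monotonic() > deadline:
                raise EMTimeout()
            current = step(current)
    except EMTimeout:
        print("EM Timeout - using fallback")
        return init
    return current
```

A deadline check like this can only interrupt between iterations, so a single very slow `step` call would still overrun the budget. The signal-based approach does not have that limitation, but it is unavailable on Windows.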