jenspete closed this issue 1 year ago
The offending code is here: https://github.com/Abe404/RootPainter3D/blob/f95f856274cbcbe5a690281d749732e2bb96dad7/trainer/im_utils.py#L252
The mechanism is that the sampler retries whenever it finds no foreground and force_fg is true. force_fg is set with a certain probability early in training, to avoid sampling too many background-only patches at that stage.
The problem (I believe) that is causing the server to crash is that max_retries was for some reason set to 2.
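A minimal sketch of that retry mechanism, not the actual code in im_utils.py. The names `sample_patch_with_retries`, `sample_fn` and `contains_fg` are hypothetical; only `force_fg` and `max_retries` come from the description above, and raising once the retries run out is an assumption about how the failure surfaces.

```python
def sample_patch_with_retries(sample_fn, contains_fg, force_fg, max_retries):
    """Draw patches until one is acceptable, or the retries run out.

    sample_fn   -- hypothetical callable returning one randomly sampled patch
    contains_fg -- hypothetical predicate, True if the patch has foreground
    """
    for _ in range(max_retries):
        patch = sample_fn()
        # Background-only patches are acceptable unless foreground is forced.
        if not force_fg or contains_fg(patch):
            return patch
    # Assumption: running out of retries is reported as an exception.
    raise RuntimeError(f"no foreground found in {max_retries} attempts")
```

With a small `max_retries`, a short run of background-only patches is enough to reach the `raise`.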
https://github.com/Abe404/RootPainter3D/commit/bedb9dc66d0c7d8cb9bfde2d0b4c2eedfd99cd6a
I've put it back up to 200. I consider it unlikely that you will have 200 randomly sampled images without foreground. If so then I can come up with a different solution.
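As a rough back-of-the-envelope check on the retry limit: if each sampled patch is background only with probability p, and the draws are assumed independent, the chance that every attempt fails is p raised to the number of retries. The value 0.9 below is purely illustrative and is not measured from any project, and `prob_all_retries_fail` is a hypothetical helper.

```python
def prob_all_retries_fail(p_background_only, max_retries):
    """Probability that max_retries independent draws are all background only."""
    return p_background_only ** max_retries


# Illustrative assumption: 90% of sampled patches contain no foreground.
for max_retries in (2, 200):
    print(max_retries, prob_all_retries_fail(0.9, max_retries))
```

Under that assumption, 2 retries fail about 81% of the time, while 200 retries fail with probability below one in a billion. If a project has no foreground annotated at all, p is 1 and no retry limit helps.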
In general I don't think this retry solution is optimal, but hopefully this is a temporary fix.
Can you please let me know if it is working OK for you now?
It seems to be training now, although the server output is somewhat cluttered with exceptions, which might confuse users. However, it is okay for me :).
Actually I agree these error messages aren't great. I will change the code so it is not an exception when foreground is not found.
Fixed so it retries without exceptions/warnings. It is still in another branch though. I will close this issue when it is merged into master.
https://github.com/Abe404/RootPainter3D/commit/dd8362cce25e5883a95e2c34e0ef1d014eaf24c7
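One way to retry without raising is to report "nothing found" as an ordinary return value and let the caller decide what to do. This is only a sketch of that general pattern. Whether the linked commit works this way is not stated here, and `try_sample_fg_patch`, `sample_fn` and `contains_fg` are hypothetical names.

```python
def try_sample_fg_patch(sample_fn, contains_fg, max_retries):
    """Return a patch with foreground, or None if none turns up in time."""
    for _ in range(max_retries):
        patch = sample_fn()
        if contains_fg(patch):
            return patch
    # Not an error: the caller can skip this step or accept a background patch.
    return None


# Usage: a sampler that only ever yields background patches.
result = try_sample_fg_patch(lambda: [0, 0, 0], any, max_retries=5)
if result is None:
    print("no foreground found, skipping")
```

Returning None keeps a run of background-only images from stopping the trainer or cluttering the server output.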
Training kills the server on some of my projects, because images often have only background annotated. It might be worth considering if this should be treated as severely as a load failure.