Hello,
I started training on 2x A100 SXM4 GPUs following your tutorial, using the pokemon.yaml file. My dataset contains 1743 images, which I load via Hugging Face. Training has been running for 13 hours and the first epoch still hasn't finished. No images have been generated from the validation texts, and no checkpoint has been saved in the log folder. The tutorial says your training takes 6 hours on 2x A6000; wouldn't you expect similar performance from the A100s?