Check the batch size in the config.
Hi @nijatmursali,
Just try decreasing the batch size in the .yml file:

```yaml
device:
  gpu_ids: [0]
  workers_per_gpu: 10
  batchsize_per_gpu: 4
  precision: 32  # set to 16 to use AMP training
```
You need to change `batchsize_per_gpu` to whatever suits your setup.
Thank you both for the suggestions. I played with the parameters, and it seems a batch size of 32 works on my system, but training still takes quite a lot of time per epoch.
My system is an RTX 3060 6GB with 40GB of RAM. I had to set the workers to 4 (my system has 12, but with more workers it gives a memory error).
Is there any trained model I can download (either weights or a checkpoint) trained for 300 epochs? I'm working on my thesis project, but I can't train the model locally.
@nijatmursali you can use `workers_per_gpu=12`. Just do a trial training run with `batch_size=1` and `epochs=50`.
If that works then you can train for all 300 epochs.
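For reference, here is a minimal sketch of what such a trial run could look like in the config, assuming the usual nanodet-style layout (the `schedule` keys and the `tools/train.py` path are assumptions about your setup, not taken from this thread):

```yaml
# Hypothetical trial-run settings in your .yml config
device:
  gpu_ids: [0]
  workers_per_gpu: 12
  batchsize_per_gpu: 1   # trial batch size suggested above
  precision: 32

schedule:
  total_epochs: 50       # short trial; raise to 300 once it runs without memory errors
  val_intervals: 10      # how often to run validation (adjust to taste)

# Launch with something like:
#   python tools/train.py path/to/your_config.yml
```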
Is there any trained model I can download (a checkpoint) trained for 300 epochs?
You can check the nanodet docs.
Is there any trained model I can download (a checkpoint) trained for 300 epochs?
Do you know how to save a checkpoint for every epoch, and which epoch model.best saves? Where can I check those?
Hello,
I have my COCO and VOC datasets locally, I installed this repository, and I was able to run the demo on my machine.
I have the COCO dataset laid out like this:
When I run the train file like
it gives:
Also, which folder does the checkpoint file go to once I train the model?
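A minimal sketch of the config fields usually involved here, assuming a nanodet-style .yml (the key names, folder, and paths below are illustrative assumptions, not taken from this thread):

```yaml
# Checkpoints and logs are written under save_dir (illustrative value)
save_dir: workspace/nanodet_m

# The data section points the trainer at your local COCO layout (illustrative paths)
data:
  train:
    name: CocoDataset
    img_path: /path/to/coco/train2017
    ann_path: /path/to/coco/annotations/instances_train2017.json
  val:
    name: CocoDataset
    img_path: /path/to/coco/val2017
    ann_path: /path/to/coco/annotations/instances_val2017.json
```

If your config has a `save_dir` entry like this, the saved checkpoints should show up in that folder after training.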