Closed Demirrr closed 3 months ago
We need a description tag for the progress bar: Is "Training progress" a suitable description for the progress bar?
Please train a model with the PL trainer to see the info displayed in its progress bar. Ideally, we should use the same description tag, but it is not a must.
As per the PL trainer, it does not seem to use a description tag. The way the progress bar is shown also differs between the two trainers: the PL trainer shows a single persistent progress bar, whereas TorchCPUTrainer prints the logs for each epoch and batch separately. The PL trainer also uses the TQDM progress bar, but they have customized it.
Alrighty. Perhaps for the time being we can settle for two different TQDM progress bars for the different trainers.
The major issue with TorchCPUTrainer is that the logs (epoch no., batch no., batch loss, etc.) are printed after every batch of every epoch, which forces the tqdm progress bar onto a new line. As far as I can tell, the PL Trainer doesn't print these logs at all; instead, it shows the epoch number and a loss item as prefix and postfix items within the progress bar itself. If we want to print the batch logs after each batch, the progress bar will be re-printed on a new line after each log.
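A minimal sketch of the two logging styles described above (this is illustrative, not the dicee code; the loop bounds and the dummy loss are made up). Plain `print()` after each batch pushes the bar to a new line, while the PL-like style keeps the per-batch info inside the bar via `set_postfix`; `tqdm.write()` is the tqdm-safe way to print a line without breaking the bar.

```python
from tqdm import tqdm

num_epochs, num_batches = 2, 5  # hypothetical sizes

losses = []
with tqdm(total=num_epochs * num_batches, desc="Training progress") as pbar:
    for epoch in range(num_epochs):
        for batch in range(num_batches):
            loss = 1.0 / (1 + epoch * num_batches + batch)  # dummy loss value
            losses.append(loss)
            # PL-like style: attach epoch/batch/loss to the bar instead of printing.
            pbar.set_postfix(epoch=epoch, batch=batch, loss=f"{loss:.4f}")
            # If a printed log line is really needed, tqdm.write keeps the bar intact:
            # tqdm.write(f"epoch={epoch} batch={batch} loss={loss:.4f}")
            pbar.update(1)
```

With `set_postfix` the bar stays on a single line, which matches the persistent-bar behavior the PL trainer shows.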
No worries. Could you please create a feature branch from the dev branch and push the changes, so that I can take a look at it? :)
Can you check https://github.com/dice-group/dice-embeddings/tree/tqdm-support ? For now I have just added the progress bar to the loop running over all the batches and epochs.
Completed with https://github.com/dice-group/dice-embeddings/pull/258
If --trainer torchCPUTrainer is passed (https://github.com/dice-group/dice-embeddings/blob/0849d2c46ea4bf06d07db6f3d0d2ad01fc230af5/dicee/scripts/run.py#L56), then TorchTrainer (https://github.com/dice-group/dice-embeddings/blob/0849d2c46ea4bf06d07db6f3d0d2ad01fc230af5/dicee/trainer/torch_trainer.py#L9) is initialized to train an embedding model. Yet, TorchTrainer does not use a progress bar. We should integrate https://github.com/tqdm/tqdm into the TorchTrainer that is being initialized. To this end, we should focus on this particular for loop (my guess): https://github.com/dice-group/dice-embeddings/blob/0849d2c46ea4bf06d07db6f3d0d2ad01fc230af5/dicee/trainer/torch_trainer.py#L75