dice-group / dice-embeddings

Hardware-agnostic Framework for Large-scale Knowledge Graph Embeddings
MIT License

Using tqdm in the cpu train setup #256

Closed by Demirrr 3 months ago

Demirrr commented 3 months ago

If --trainer torchCPUTrainer is passed (https://github.com/dice-group/dice-embeddings/blob/0849d2c46ea4bf06d07db6f3d0d2ad01fc230af5/dicee/scripts/run.py#L56), then TorchTrainer is initialized to train an embedding model (https://github.com/dice-group/dice-embeddings/blob/0849d2c46ea4bf06d07db6f3d0d2ad01fc230af5/dicee/trainer/torch_trainer.py#L9).

Yet, TorchTrainer does not use a progress bar. We should integrate tqdm (https://github.com/tqdm/tqdm) into the TorchTrainer that is being initialized. To this end, I guess we should focus on this particular for loop: https://github.com/dice-group/dice-embeddings/blob/0849d2c46ea4bf06d07db6f3d0d2ad01fc230af5/dicee/trainer/torch_trainer.py#L75
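A minimal sketch of the idea, using a toy loop rather than the actual TorchTrainer code: wrapping the epoch iterable in tqdm is enough to get a bar. The names `train` and `step_fn`, the toy data, and the "Training progress" description are all assumptions made for illustration, and the `ImportError` fallback is only there so the snippet runs where tqdm is not installed.

```python
try:
    from tqdm import tqdm
except ImportError:
    # Fallback so the sketch still runs without tqdm installed (no bar is shown).
    def tqdm(iterable, **kwargs):
        return iterable


def train(num_epochs, batches, step_fn):
    """Toy CPU training loop; `step_fn` is a hypothetical per-batch update returning a loss."""
    epoch_losses = []
    # Wrapping the epoch iterable is the only change needed to display a progress bar.
    for epoch in tqdm(range(num_epochs), desc="Training progress"):
        total = 0.0
        for batch in batches:
            total += step_fn(batch)
        epoch_losses.append(total / len(batches))
    return epoch_losses


# Toy data: two "batches" and a fake loss (the batch mean).
losses = train(3, [[1.0, 2.0], [3.0, 4.0]], lambda batch: sum(batch) / len(batch))
```

In the real trainer, the loop body would stay as it is and only the iterable passed to the for loop would change.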

sapkotaruz11 commented 3 months ago

We need a description tag for the progress bar. Is "Training progress" a suitable description?

Demirrr commented 3 months ago

Please train a model with the PL trainer to see the info displayed in the progress bar. Ideally, we should use the same description tag, but it is not a must.

sapkotaruz11 commented 3 months ago

The PL trainer does not seem to use a description tag. The two trainers would also display progress differently: the PL trainer shows a single persistent progress bar, whereas TorchCPUTrainer prints the logs for each epoch and batch separately. The PL trainer also uses a tqdm progress bar, but a customized one.

Demirrr commented 3 months ago

Alrighty. Perhaps, for the time being, we can settle for two different tqdm progress bars for the different trainers.

sapkotaruz11 commented 3 months ago

The major issue with TorchCPUTrainer is that the logs (epoch number, batch number, batch loss, etc.) are printed after each batch of each epoch, which forces the tqdm progress bar onto a new line. As far as I can tell, the PL trainer does not print logs; it only shows the epoch number and a loss value as prefix and postfix items within the progress bar. If we want to keep printing the batch logs after each batch, the progress bar will be redrawn on a new line after each batch log.
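To illustrate the alternative, here is a rough sketch of a single persistent bar that carries the epoch number as a prefix and the batch loss as a postfix, instead of printing per-batch logs. It is a toy loop, not the TorchTrainer or PL code; `train_single_bar`, `step_fn`, and the data are hypothetical. Messages that must still be printed can go through `tqdm.write`, which prints above the bar without breaking it. The no-op stand-in class only exists so the snippet runs where tqdm is not installed.

```python
try:
    from tqdm import tqdm
except ImportError:
    class tqdm:
        """No-op stand-in so the sketch runs without tqdm installed."""
        def __init__(self, *args, **kwargs): pass
        def __enter__(self): return self
        def __exit__(self, *exc): return False
        def set_description(self, *args, **kwargs): pass
        def set_postfix(self, *args, **kwargs): pass
        def update(self, n=1): pass
        @staticmethod
        def write(message): print(message)


def train_single_bar(num_epochs, batches, step_fn):
    """Toy loop: one bar over all batches of all epochs, with no print() per batch."""
    loss = None
    with tqdm(total=num_epochs * len(batches)) as bar:
        for epoch in range(1, num_epochs + 1):
            # Prefix: current epoch, shown to the left of the bar.
            bar.set_description(f"Epoch {epoch}/{num_epochs}")
            for batch in batches:
                loss = step_fn(batch)
                # Postfix: latest batch loss, shown to the right of the bar.
                bar.set_postfix(loss=f"{loss:.4f}")
                bar.update(1)
            # Occasional log lines go through tqdm.write so the bar stays intact.
            tqdm.write(f"epoch {epoch} done, last batch loss {loss:.4f}")
    return loss


# Toy data: two "batches" and a fake loss (the batch mean).
final_loss = train_single_bar(2, [[1.0, 2.0], [3.0, 4.0]], lambda batch: sum(batch) / len(batch))
```

The trade-off is that per-batch values are only visible transiently in the postfix; anything that needs to be kept would have to be written to a log file or emitted via `tqdm.write`.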

Demirrr commented 3 months ago

No worries. Could you please create a feature branch from the dev branch and push the changes, so that I can take a look at it? :)

sapkotaruz11 commented 3 months ago

Can you check the tqdm-support branch (https://github.com/dice-group/dice-embeddings/tree/tqdm-support)? For now, I have just added the progress bar to the loop that runs over all batches and epochs.

Demirrr commented 3 months ago

Completed with https://github.com/dice-group/dice-embeddings/pull/258