mahuahuahua opened 3 weeks ago
Hi, it takes about a week or two on a single RTX 2080 Ti. CPU is not recommended for this step, since training Transformer-type neural networks on CPU can be very slow. You'd be better off using a high-end GPU for this.
On Sun, Oct 27, 2024 at 23:24, mahuahuahua wrote:
Hello, I see that your device uses GPU acceleration (on a single RTX 2080 Ti), while I am currently using an AMD EPYC 7742 (CPU). I have tried to replicate your code, and I have completed the first step “python preprocess_illike.py,” which took about 10 hours. Now I am in the second step “python pre-train_IL_Transformer.py,” and it has been four days, yet the first epoch is still not completed. I would like to ask how long it took you to complete this step.
2024-10-24 15:24:07,486 - root - Use beam_size=4, alpha=0.6, K=5
2024-10-24 15:27:04,015 - root - [Epoch 0 Batch 100/365892] loss=2.1191, ppl=8.3238, gnorm=1.5763, throughput=13.80K wps, wc=2436.87K
....................................................................
2024-10-28 11:16:47,949 - root - [Epoch 0 Batch 199900/365892] loss=0.0359, ppl=1.0366, gnorm=0.2563, throughput=15.72K wps, wc=2647.59K
2024-10-28 11:19:31,867 - root - [Epoch 0 Batch 200000/365892] loss=0.0362, ppl=1.0369, gnorm=0.2723, throughput=15.83K wps, wc=2595.47K
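For a rough sense of scale, the two timestamps in that log can be extrapolated to a full epoch; a back-of-the-envelope Python sketch (the batch counts and times are taken from the log above, and it assumes the CPU throughput stays roughly constant):

```python
# Estimate per-epoch training time on the CPU from the logged timestamps.
from datetime import datetime

t_batch_100 = datetime(2024, 10, 24, 15, 27, 4)    # [Epoch 0 Batch 100/365892]
t_batch_200k = datetime(2024, 10, 28, 11, 19, 31)  # [Epoch 0 Batch 200000/365892]

batches_done = 200000 - 100
sec_per_batch = (t_batch_200k - t_batch_100).total_seconds() / batches_done
epoch_days = sec_per_batch * 365892 / 86400

print(f"~{sec_per_batch:.2f} s/batch, ~{epoch_days:.1f} days for one epoch")
# -> roughly 1.65 s/batch, i.e. about 7 days for a single epoch on this CPU,
#    which is why a GPU is recommended for this step.
```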
Okay. Thank you.
Hello, I switched to a new GPU for training, which took about two and a half days to complete. I am currently running a Jupyter notebook, but I'm encountering some version compatibility issues. I'm using NumPy version 1.19.5, but I'm not sure about the version of Pandas. Could you provide me with some specific version information? Thank you!
Hi, my environment uses numpy 1.22.4 and pandas 1.4.2.
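If it helps, a quick way to confirm what the notebook kernel is actually using and to match those versions (a generic check, not a script from this repo):

```python
# Print the versions the Jupyter kernel sees.
import numpy as np
import pandas as pd

print("numpy:", np.__version__)    # environment used here: 1.22.4
print("pandas:", pd.__version__)   # environment used here: 1.4.2

# To match this environment, pin both packages, e.g.:
#   pip install numpy==1.22.4 pandas==1.4.2
```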
Thanks to your reply, I managed to solve this problem.