Closed · zhoumumu closed this issue 3 years ago
Hello,
Did you modify the code to allow for multi-gpu training? If so, can you provide the modified code?
From a quick search, this issue seems to be related to PyTorch or your overall configuration, rather than to the code hosted in this repository (https://github.com/pytorch/pytorch/issues/33661).
Antonino
Hi, I've switched to another environment, with Python 3, PyTorch 1.5, and CUDA 10.1, and I did get rid of the "double free" error.
By the way, a small note: you may want to modify dataset.py for better I/O efficiency if you plan to run in DataParallel mode. Specifically, preload all features into memory rather than fetching them in `__getitem__()`; otherwise you may run into 0% GPU utilization.
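To make the idea concrete, here is a minimal sketch of the preloading pattern I mean; the class and constructor arguments are illustrative, not the actual dataset.py interface:

```python
# Hedged sketch: read every feature file once, up front, instead of lazily
# in __getitem__. Names (feature_arrays, labels) are hypothetical.
import torch
from torch.utils.data import Dataset

class PreloadedFeatureDataset(Dataset):
    def __init__(self, feature_arrays, labels):
        # feature_arrays: list of per-sample numpy arrays already loaded
        # from disk at construction time.
        self.features = [torch.as_tensor(f, dtype=torch.float32)
                         for f in feature_arrays]
        self.labels = torch.as_tensor(labels, dtype=torch.long)

    def __len__(self):
        return len(self.features)

    def __getitem__(self, idx):
        # No disk I/O here: everything already sits in RAM, so DataLoader
        # workers never stall the GPUs in DataParallel mode.
        return self.features[idx], self.labels[idx]
```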
Really appreciate your reply.
Hi, I’m glad you solved the issue and thank you for your suggestion about pre-loading features.
The reason the current implementation does not do that is to minimize the amount of RAM needed to run the training. Still, it would make sense to add a flag so that people can choose between the two schemes; I believe pre-loading features can be a significant speed-up when features are stored on slow disks.
If you end up modifying the code that way, feel free to send a pull request!
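Something as simple as the following would do; the flag name is hypothetical:

```python
# Hedged sketch of an opt-in flag: the default (no flag) keeps the
# current lazy-loading behavior.
import argparse

parser = argparse.ArgumentParser()
parser.add_argument("--preload-features", action="store_true",
                    help="load all features into RAM at startup "
                         "(faster I/O, more memory)")
args = parser.parse_args([])  # pass real sys.argv in the actual script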
Best, Antonino
Hi, I've sent the pull request.
I've also run into a new problem while trying to reproduce the result of one of the teams in this year's anticipation challenge: the class-balanced loss and DRW (deferred re-weighting) from the 2nd-place team. I've read the challenge report and the method seems easy to implement, and it does give an improvement on a simple LSTM model, but I can't make it work on top of RULSTM. I'm wondering whether I made a mistake somewhere. Would it be possible for you to give me their email? I'd like to ask for their advice and check the code details. That would help a lot.
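For reference, this is roughly what I implemented; it is my reading of the class-balanced "effective number of samples" weighting and DRW from the report, not the winning team's actual code:

```python
# Hedged sketch of class-balanced cross-entropy and deferred re-weighting
# (DRW). All names and hyperparameters here are illustrative.
import torch
import torch.nn.functional as F

def class_balanced_weights(samples_per_class, beta=0.9999):
    # Effective number of samples per class: (1 - beta^n) / (1 - beta);
    # weights are inversely proportional to it.
    n = torch.as_tensor(samples_per_class, dtype=torch.float32)
    effective_num = 1.0 - torch.pow(beta, n)
    weights = (1.0 - beta) / effective_num
    # Normalize so the weights sum to the number of classes.
    return weights / weights.sum() * len(weights)

def drw_weight(epoch, drw_start_epoch, samples_per_class, beta=0.9999):
    # DRW: train unweighted first, switch the class-balanced weights on
    # only after drw_start_epoch.
    if epoch < drw_start_epoch:
        return None
    return class_balanced_weights(samples_per_class, beta)

def cb_cross_entropy(logits, targets, samples_per_class, beta=0.9999):
    w = class_balanced_weights(samples_per_class, beta).to(logits.device)
    return F.cross_entropy(logits, targets, weight=w)
```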
Best, Zhoumumu
Many thanks for the pull request! It might take some time for me to review it and I'll get back in case I have any doubts.
Regarding the contacts of the 2nd place winners of this year's EK challenge, maybe you refer to the method described in the technical report (https://epic-kitchens.github.io/Reports/EPIC-KITCHENS-Challenges-2021-Report.pdf) at page 37? In that case, you can find all the email addresses in the report.
If you end up being able to replicate the results (or even just improve the current ones), feel free to get back in touch or create another pull request so we can update the code for the benefit of others - this is very much appreciated! Just keep in mind that it is best to add new functionality as "opt-in" parameters/flags, so that the standard (legacy) behavior is preserved by default.
Thanks again for your interest and support!
Oh, I see it now! What a newbie I am with email!
Thank you for your replies, and I'll keep in touch if I make progress.
Training in parallel mode runs into a "double free" bug like the one below.
Training on a single GPU is quite slow - where does this bug come from?
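For context, this is roughly the parallel setup I'm using; the model here is a placeholder LSTM, not the repository's actual training code:

```python
# Hedged sketch of the multi-GPU setup that triggers the error for me.
import torch
import torch.nn as nn

device = "cuda" if torch.cuda.is_available() else "cpu"
model = nn.LSTM(input_size=1024, hidden_size=1024, batch_first=True)
if torch.cuda.device_count() > 1:
    # Replicate the model across all visible GPUs; inputs are scattered
    # along the batch dimension.
    model = nn.DataParallel(model)
model = model.to(device)
```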