fastnlp / fitlog

fitlog是一款在深度学习训练中用于辅助用户记录日志和管理代码的工具
https://gitee.com/fastnlp/fitlog
Apache License 2.0
1.47k stars 128 forks source link

Question about git_id of fitlog. #62

Open Dingseewhole opened 1 year ago

Dingseewhole commented 1 year ago

I'm a big fan of fitlog and often recommend it to my colleagues. Thank you for creating such a great repository. I found that when viewing the log file on the webpage, there is a column called git_id, for example, _ID: log_20230221_075043 with gitid: 395daa36. I also use Github to manage the same set of code while using fitlog, I thought git_id would match the commit ID of Github repository. But I found that the digits of git_id of fitlog and commit ID of Github repository are inconsistent. For example, "git_id of fitlog = 395daa36" but "commit ID of Github repository = e9ca0d4". I'm curious, how can I use git_id of fitlog to identify the commit ID of Github repository?

ScarlettChan commented 1 year ago

您好,您的邮件已收到!

WillQvQ commented 1 year ago

I'm sorry that we were unable to reproduce your issue and cannot provide very suitable suggestions. However, we found that multiple commits may occur simultaneously on different processes in multi-card training. You may check whether your problem was caused by this situation.

If you encounter this situation, we recommend that you set the fitlog to debug mode on non-main processes by using fitlog.debug() . If your training framework is fastNLP , you can use os.environ.get(FASTNLP_GLOBAL_RANK, 0)==0 to determine if the program is running on the main process.


很抱歉,我们没能复现您的问题,无法给出非常准确的建议。但我们发现,在多卡训练的过程中多个进程上的 fitlog 可能会同时进行 commit 操作,您可以检查一下问题是否是由这种情况导致。

如果遇到了这种情况,我们建议您使用 fitlog.debug() 将非主进程上的 fitlog 设置为 debug 模式 。如果您的训练框架是 fastNLP ,可以用 os.environ.get(FASTNLP_GLOBAL_RANK, 0)==0 来判断程序是否运行在主进程上。