EvolvingLMMs-Lab / lmms-eval

Accelerating the development of large multimodal models (LMMs) with one-click evaluation module - lmms-eval.
https://lmms-lab.framer.ai/

[logging] The program is not responding during multi-gpu evaluation #222

Open ssmisya opened 2 months ago

ssmisya commented 2 months ago

The program freezes and stops responding during multi-GPU evaluation.

After checking the code, I noticed that multi-GPU evaluation creates a "rank0_metric_eval_done.txt" marker file under the result output dir (introduced in "[New Updates] LLaVA OneVision Release; MVBench, InternVL2, IXC2.5 Interleave-Bench integration", #182). However, because the ranks create their logging dirs at slightly different times (process 0 is often slower), two folders end up being created (screenshot: two output directories whose timestamps differ by one minute). Since the logging dirs are named with timestamps that fall in two different minutes, the marker "txt" files are saved into different directories. The program then enters a while loop that waits for the marker files from all GPUs to exist, which becomes an infinite loop, as sketched below.
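Not the exact lmms-eval code, but a minimal sketch of the synchronization pattern described above, assuming each rank writes a done-marker file into what it believes is the shared output dir; the function name, marker naming, and polling interval are illustrative only:

```python
import os
import time

def wait_for_all_ranks(output_dir: str, world_size: int, poll_secs: float = 5.0) -> None:
    """Block until every rank's done-marker file exists in output_dir.

    If each rank derives output_dir from its own minute-level timestamp,
    the markers can land in different directories and this loop never exits,
    which matches the hang described in this issue.
    """
    while True:
        markers = [
            os.path.join(output_dir, f"rank{r}_metric_eval_done.txt")
            for r in range(world_size)
        ]
        if all(os.path.exists(m) for m in markers):
            return
        time.sleep(poll_secs)
```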

FIX: my workaround is simple: I removed the minutes from the timestamp used when creating the logging dir (lmms-eval/lmms_eval/utils.py; screenshot of the change). There may be more appropriate fixes.
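For reference, a hedged sketch of what this workaround amounts to, assuming the logging directory name is built from a strftime-style timestamp; the function name and format strings below are illustrative and not the actual lmms_eval/utils.py code:

```python
import datetime

def get_log_dir_timestamp() -> str:
    """Timestamp used to name the per-run logging directory (illustrative)."""
    now = datetime.datetime.now()
    # Before (per the issue): minute-level precision such as "%Y%m%d_%H%M",
    # so ranks starting one minute apart end up in different directories.
    # Workaround: drop the minutes so all ranks resolve to the same name.
    return now.strftime("%Y%m%d_%H")
```

A more robust alternative might be for rank 0 to create the directory and broadcast the resolved path to the other ranks, so the name never depends on each rank's local clock.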

Luodian commented 2 months ago

That's interesting, thanks for your PR. Let me see if there's a better way to improve it.