QuyAnh2005 / neurips-llm-challenge

A winner of NeurIPS LLM 2023 Competition
https://github.com/knovel-eng/neurips-llm-2023
MIT License

why checkpoint-400? #2

Closed weiweiy closed 9 months ago

weiweiy commented 9 months ago

Question: why are you uploading the model at checkpoint-400 (https://github.com/QuyAnh2005/neurips-llm-challenge/blob/main/finetune-code/4090/train.py#L130) even though your max_steps is set to 450 (https://github.com/QuyAnh2005/neurips-llm-challenge/blob/main/finetune-code/4090/train.py#L108)?

And did the model actually train for 450 steps?

weiweiy commented 9 months ago

Same issue for A100. Let me know if you want me to use checkpoint_450 or keep 400 for the final eval.

QuyAnh2005 commented 9 months ago

Right, the model was trained for 450 steps. While submissions were open, I did not have the resources to run the HELM evaluation locally. I rented GPUs on runpod.io and usually set max_steps to 500 (more details at https://github.com/QuyAnh2005/neurips-llm-challenge/tree/main/notebooks/finetune). However, when training reached step 400, I uploaded that checkpoint to the Hugging Face repo, manually evaluated a few examples, and submitted it. So, the main reasons for your question:

Sorry about the confusion @weiweiy
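The mismatch discussed above (a hard-coded checkpoint-400 upload next to max_steps = 450) can be avoided by resolving the checkpoint from the output directory instead of naming it by hand. The following is a minimal sketch, not the repository's code: it assumes checkpoints are saved as `checkpoint-<step>` subdirectories (as the name "checkpoint-400" suggests), and the helper names `list_checkpoint_steps` and `pick_checkpoint` are hypothetical.

```python
import re
from pathlib import Path
from typing import List, Optional

# Assumption: checkpoints live in <output_dir>/checkpoint-<step>,
# matching the "checkpoint-400" naming mentioned in this thread.
_CHECKPOINT_RE = re.compile(r"^checkpoint-(\d+)$")


def list_checkpoint_steps(output_dir: str) -> List[int]:
    """Return the step numbers of all checkpoint directories, ascending."""
    root = Path(output_dir)
    if not root.is_dir():
        return []
    steps = []
    for child in root.iterdir():
        match = _CHECKPOINT_RE.match(child.name)
        if child.is_dir() and match:
            steps.append(int(match.group(1)))
    return sorted(steps)


def pick_checkpoint(output_dir: str, max_steps: int) -> Optional[Path]:
    """Pick the checkpoint at max_steps, or the latest one saved before it.

    Returns None when no checkpoint at or below max_steps exists.
    """
    candidates = [s for s in list_checkpoint_steps(output_dir) if s <= max_steps]
    if not candidates:
        return None
    return Path(output_dir) / f"checkpoint-{max(candidates)}"
```

With `max_steps=450` and directories for steps 400 and 450 on disk, `pick_checkpoint` returns the `checkpoint-450` path; if only step 400 had been saved, it would fall back to `checkpoint-400`, which makes the choice explicit rather than hard-coded.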

QuyAnh2005 commented 9 months ago

> Same issue for A100. Let me know if you want me to use checkpoint_450 or keep 400 for final eval

A100 - checkpoint 400
4090 - checkpoint 450

Is it okay? Because