kroggen opened this issue 2 weeks ago
I'm very sorry, I haven't tried inference with this code; I've only verified that eval works without issues. You're welcome to implement this interface on your end and contribute it to our codebase via a PR. :)
@kroggen I got the same error while trying eval. I did find a substitute for that specific function in the official NVIDIA Megatron-LM repo, here. But after substituting that function I ran into a similar error, and this time I wasn't able to find a replacement. Traceback:
Traceback (most recent call last):
File "eval.py", line 77, in <module>
main()
File "eval.py", line 35, in main
model, neox_args = setup_for_inference_or_eval(
File "/home/steadysurfdom/Research/TokenFormer/megatron/utils.py", line 455, in setup_for_inference_or_eval
initialize_megatron(neox_args)
File "/home/steadysurfdom/Research/TokenFormer/megatron/initialize.py", line 107, in initialize_megatron
from megatron.data.data_utils import compile_helper
ModuleNotFoundError: No module named 'megatron.data'
[2024-11-10 12:28:49,719] [INFO] [launch.py:315:sigkill_handler] Killing subprocess 85925
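For anyone debugging the same `ModuleNotFoundError`, a quick first step is to check whether the `megatron.data` package is importable at all, and if not, whether its directory even exists on disk. The helper below is a generic sketch (not part of the TokenFormer codebase):

```python
import importlib.util
import os


def diagnose_module(name: str) -> bool:
    """Report whether a dotted module name is importable; on failure,
    check whether the expected subdirectory exists under the parent package."""
    try:
        spec = importlib.util.find_spec(name)
    except ModuleNotFoundError:
        spec = None  # the parent package itself is missing
    if spec is not None:
        print(f"{name} found at {spec.origin}")
        return True
    # Fall back to a filesystem check via the parent package's location.
    parent, _, child = name.rpartition(".")
    if parent:
        try:
            parent_spec = importlib.util.find_spec(parent)
        except ModuleNotFoundError:
            parent_spec = None
        if parent_spec and parent_spec.submodule_search_locations:
            for loc in parent_spec.submodule_search_locations:
                candidate = os.path.join(loc, child)
                print(f"checked {candidate}: exists={os.path.isdir(candidate)}")
    print(f"{name} is NOT importable")
    return False


diagnose_module("megatron.data")
```

If the directory is missing entirely (rather than failing to import), the problem is the checkout, not the environment, which turned out to be the case here.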
Not so useful, but worth adding to the conversation, I guess.
Maybe @Haiyang-W can fix the eval code first, as this is required for anyone interested in checking the results from the paper.
My intention was to compare the output of my minimal implementation with the main model, to check whether it was done correctly. The output is not very good. Maybe that is because it was trained on few tokens? Because there are 1.5B-sized models that are pretty good.
Anyway, it would be good to have at least a working method for evaluation.
Sorry about that: because of the .gitignore, I missed the data dir in megatron. It's fixed now. If you have any questions, feel free to ask.
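For reference, the usual way to prevent a broad ignore rule from swallowing a source directory is an explicit negation in `.gitignore`. The patterns below are illustrative, not the repo's actual rules:

```gitignore
# Broad rule that can accidentally exclude a package directory anywhere in the tree
data/

# Re-include the source package (last matching rule wins)
!megatron/data/
```

Since git does not descend into an ignored directory, the directory itself must be re-included; once it is, the files inside it no longer match the `data/` rule.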
If anyone encounters problems with eval, please open an issue right away and I will reply as soon as possible (within one day).
Can you check it again? I have uploaded the data dir. Very sorry for missing that part because of the .gitignore file. :)
Yes, the eval works now. Thanks!
This is the result for the 150M model:
'results': {'arc_challenge': {'acc,none': 0.19880546075085323,
'acc_norm,none': 0.24573378839590443,
'acc_norm_stderr,none': 0.012581033453730107,
'acc_stderr,none': 0.011662850198175543},
'arc_easy': {'acc,none': 0.476010101010101,
'acc_norm,none': 0.4187710437710438,
'acc_norm_stderr,none': 0.01012348716016781,
'acc_stderr,none': 0.010247967392742684},
'hellaswag': {'acc,none': 0.30989842660824535,
'acc_norm,none': 0.35480979884485164,
'acc_norm_stderr,none': 0.004774778180345218,
'acc_stderr,none': 0.004615063817741879},
'lambada_openai': {'acc,none': 0.4506112943916165,
'acc_stderr,none': 0.006931910914621461,
'perplexity,none': 16.382835662797227,
'perplexity_stderr,none': 0.5351178238219353},
'piqa': {'acc,none': 0.6528835690968444,
'acc_norm,none': 0.6441784548422198,
'acc_norm_stderr,none': 0.011170294934656941,
'acc_stderr,none': 0.011107104993128088},
'winogrande': {'acc,none': 0.5043409629044988,
'acc_stderr,none': 0.014051956064076903}},
And for the 1.5B model:
'results': {'arc_challenge': {'acc,none': 0.3037542662116041,
'acc_norm,none': 0.3216723549488055,
'acc_norm_stderr,none': 0.013650488084494166,
'acc_stderr,none': 0.013438909184778766},
'arc_easy': {'acc,none': 0.648989898989899,
'acc_norm,none': 0.5976430976430976,
'acc_norm_stderr,none': 0.010062244711011532,
'acc_stderr,none': 0.009793703885101042},
'hellaswag': {'acc,none': 0.45339573790081655,
'acc_norm,none': 0.5986855208125871,
'acc_norm_stderr,none': 0.004891626718097012,
'acc_stderr,none': 0.004968058944472159},
'lambada_openai': {'acc,none': 0.646613623132156,
'acc_stderr,none': 0.006659772589635509,
'perplexity,none': 5.2395373300285355,
'perplexity_stderr,none': 0.12567420744987814},
'piqa': {'acc,none': 0.7453754080522307,
'acc_norm,none': 0.7377584330794341,
'acc_norm_stderr,none': 0.010262502565172449,
'acc_stderr,none': 0.01016443223706048},
'winogrande': {'acc,none': 0.5951065509076559,
'acc_stderr,none': 0.013795927003124943}},
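For side-by-side comparisons like the two runs above, a small helper can flatten the lm-eval-harness-style `results` dict into compact rows. This is just a sketch; the `sample` below is a hand-copied subset of the 150M numbers, used only as example input:

```python
# Sketch: flatten an lm-eval-harness style results dict into rows of
# (task, metric, value), dropping stderr entries for a compact summary.
def summarize(results: dict) -> list:
    rows = []
    for task, metrics in sorted(results.items()):
        for key, value in sorted(metrics.items()):
            metric = key.split(",")[0]          # e.g. "acc,none" -> "acc"
            if metric.endswith("_stderr"):
                continue                        # skip error bars here
            rows.append((task, metric, round(value, 4)))
    return rows


# Hand-copied subset of the 150M numbers above, as example input.
sample = {
    "winogrande": {"acc,none": 0.5043409629044988,
                   "acc_stderr,none": 0.014051956064076903},
    "lambada_openai": {"acc,none": 0.4506112943916165,
                       "perplexity,none": 16.382835662797227},
}

for task, metric, value in summarize(sample):
    print(f"{task:16s} {metric:12s} {value}")
```

Running the same helper over both result dicts makes the 150M vs. 1.5B deltas (e.g. LAMBADA perplexity 16.4 vs. 5.2) easy to scan.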
Thanks for checking! Nice.
I am using this command to try inference:
But it fails with:
There is no `data` folder inside the `megatron` folder. And I suspect that `build_train_valid_test_data_iterators` is not required for inference either... Please share working instructions for inference.
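If the dataset machinery really is unnecessary for inference, one generic way to keep the import error above from being fatal is to import the helper lazily and degrade gracefully. This is a general pattern, not the actual TokenFormer code; `maybe_compile_helpers` is a hypothetical wrapper:

```python
def maybe_compile_helpers() -> bool:
    """Try to build the optional dataset helpers; return False instead of
    crashing when the data package is absent (e.g. inference-only installs)."""
    try:
        # The import that currently fails inside initialize_megatron.
        from megatron.data.data_utils import compile_helper
    except ModuleNotFoundError:
        print("megatron.data not available; skipping dataset helper build")
        return False
    compile_helper()
    return True
```

Whether skipping this step is actually safe for the inference path is for the maintainers to confirm; it only addresses the import crash, not any later use of the data iterators.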