Closed: windprak closed this issue 1 week ago
same problem
I fixed it by adding `packages=setuptools.find_namespace_packages(include=["megatron.core", "megatron.core.*", "megatron.training"])` to setup.py.
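To see which package names those include patterns would pick up, here is a minimal sketch. It assumes setuptools filters candidate names with shell-style `fnmatch` patterns, and the candidate list is hypothetical (not read from the actual repo layout), so treat it as an illustration rather than the real packaging logic.

```python
from fnmatch import fnmatchcase

# Include patterns from the setup.py change above.
INCLUDE = ["megatron.core", "megatron.core.*", "megatron.training"]

# Hypothetical candidate package names, for illustration only.
CANDIDATES = [
    "megatron",
    "megatron.core",
    "megatron.core.models",
    "megatron.training",
    "megatron.training.tokenizer",
    "megatron.legacy",
]


def selected(names, patterns):
    """Return the names that match at least one include pattern."""
    return [n for n in names if any(fnmatchcase(n, p) for p in patterns)]


picked = selected(CANDIDATES, INCLUDE)
print(picked)
```

Under that assumption, `megatron.training` matches but a subpackage such as `megatron.training.tokenizer` does not, so adding `"megatron.training.*"` to the include list may also be needed if subpackages of `megatron.training` have to be installed.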
But guess what, the "simple" script still crashes: `rank1: File "/home/atuin/b216dc/b216dc10/software/private/conda/envs/megatron/lib/python3.10/site-packages/torch/distributed/checkpoint/default_planner.py", line 389, in create_default_global_save_plan rank1: assert item.index.fqn not in md`
I think if you run it as
PYTHONPATH=$PYTHONPATH:./megatron torchrun --nproc-per-node 2 examples/run_simple_mcore_train_loop.py
it should work (this is mentioned in QuickStart.md).
Describe the bug: Module megatron.training not found in the latest version of megatron_core (0.8.0rc0).
To Reproduce
Expected behavior: Import without errors.
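The report gives no reproduction steps. One quick way to check whether a given install exposes the module could look like the sketch below. The helper name `is_importable` is made up for this example, and it only checks whether the module can be located, not whether importing it succeeds.

```python
import importlib.util


def is_importable(module_name):
    """Return True if the module can be located in the current environment."""
    try:
        return importlib.util.find_spec(module_name) is not None
    except ModuleNotFoundError:
        # Raised when a parent package (e.g. `megatron`) is itself missing.
        return False


# Module names taken from the report; the result depends on what is installed.
for name in ("megatron.core", "megatron.training"):
    print(name, "found" if is_importable(name) else "NOT found")
```

On an affected install, the expectation from the report would be that `megatron.core` is found while `megatron.training` is not.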