Open yfeizhang opened 1 year ago
Hi @yfeizhang, may I ask if you were able to run the training script? And if so, what does your environment look like?
I am able to run it and the environment is the same as the docker file the repo provides.
When I use the docker file provided, I get a lot of dependency issues. Is it possible for you to share your environment requirements file so that I can replicate it? Also, please mention if you changed any packages.
I only changed one package version in the docker file, but that was a long time ago and I'm afraid I cannot recover all the details now. Other than that one inappropriate package version, the docker file is fine.
From: Abhishek Tyagi @.> Sent: Thursday, January 18, 2024 5:51 AM To: HazyResearch/fly @.> Cc: YIFEI ZHANG @.>; Mention @.> Subject: Re: [HazyResearch/fly] Error when running wiki103 gpt2-m and gpt2-l baseline pretraining experiments (Issue #14)
Okay. That is good to know.
Did you come across any errors such as the following:
hydra.errors.InstantiationException: Error in call to target 'src.datamodules.imagenet.ImagenetDataModule': TypeError("__init__() got an unexpected keyword argument 'train_transforms'")
This is what I get when I run the training example given in the repo.
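For context, this kind of Hydra `InstantiationException` usually means the config passes a keyword that the target class no longer accepts (newer PyTorch Lightning versions removed `train_transforms` from `LightningDataModule.__init__`). A minimal, hypothetical workaround, not code from this repo, is to filter the config keys against the target's signature before instantiating:

```python
import inspect

def instantiate_compat(cls, **cfg_kwargs):
    """Instantiate cls, dropping config keys its __init__ no longer
    accepts (e.g. the removed 'train_transforms' argument)."""
    accepted = inspect.signature(cls.__init__).parameters
    filtered = {k: v for k, v in cfg_kwargs.items() if k in accepted}
    return cls(**filtered)

# Stand-in for a datamodule whose __init__ lacks 'train_transforms'
class DummyDataModule:
    def __init__(self, batch_size=32):
        self.batch_size = batch_size

# The stale key is ignored instead of raising a TypeError
dm = instantiate_compat(DummyDataModule, batch_size=64, train_transforms=None)
print(dm.batch_size)  # 64
```

Alternatively, simply deleting the stale keys from the datamodule's yaml config has the same effect without touching code.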
We mainly worked on the gpt2-related scripts, not the ViT script. However, I did get the ViT code working last year; I remember I had to change a few lines of code to make it run smoothly.
Great. Is that code available in the public domain? I see that one of your repositories (https://github.com/jiaweizzhao/InRank) has a similar structure. Is it possible for you to share the changes you had to make?
Sorry, the code for ViT is not in the public domain; I changed it a long time ago and cannot recover it now. However, I can point you to a trace: compare https://github.com/jiaweizzhao/InRank/blob/master/src/tasks/seq.py against this HazyResearch/fly repo to see the necessary changes. The changes needed for ViT are similar.
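One way to follow that trace is to diff the InRank file against fly's version of the same file. A hedged sketch using `difflib` (the inline strings here are placeholders so the snippet runs anywhere; in practice, read the real file contents from local clones of the two repos):

```python
import difflib

# Placeholder stand-ins for the two versions of src/tasks/seq.py;
# replace with e.g. open("fly/src/tasks/seq.py").read().splitlines()
fly_lines = [
    "def setup(self):",
    "    dm = DataModule(train_transforms=t)",
]
inrank_lines = [
    "def setup(self):",
    "    dm = DataModule()",
]

diff = list(difflib.unified_diff(fly_lines, inrank_lines,
                                 "fly/src/tasks/seq.py",
                                 "InRank/src/tasks/seq.py",
                                 lineterm=""))
print("\n".join(diff))
```

Lines marked `-`/`+` in the output are the spots to port over when patching fly's ViT path.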
No problem. I appreciate your help with my queries!
You're welcome.
Hi, when running the wiki103 gpt2-m and gpt2-l baseline pretraining experiments,
python run.py experiment=wt103/gpt2m
and python run.py experiment=wt103/gpt2l
fail with a non-convergence error. The only fix we found is to change the default precision from 16 to 32. Is there any way to keep precision 16 and still converge? I am curious what precision you used to report the baseline results in the paper. We use machines with 8x Nvidia A100 80G.
Any help is appreciated, thanks!
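For what it's worth, non-convergence at precision 16 is often plain fp16 gradient underflow: values below fp16's smallest subnormal (about 6e-8) round to zero. A small numeric illustration (not the repo's code) of why loss scaling, or bf16 on A100s, can keep 16-bit training stable:

```python
import numpy as np

# A gradient that is meaningful in fp32 but below fp16's smallest
# representable subnormal (~5.96e-8), so a plain cast loses it.
grad = np.float32(1e-8)
naive = np.float16(grad)                # underflows to 0.0

# Loss scaling: multiply before the cast, divide after backward.
scale = np.float32(2.0 ** 16)
scaled = np.float16(grad * scale)       # now well inside fp16's range
recovered = np.float32(scaled) / scale  # ~1e-8 again

print(naive)      # 0.0
print(recovered)
```

bf16 keeps fp32's exponent range, so on A100s it avoids the underflow without any scaler; fp32, as noted above, also works but costs memory and throughput.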