Olivia-fsm / DoGE

Codebase for ICML submission "DOGE: Domain Reweighting with Generalization Estimation"
MIT License

Discrepancy between 6B and 627B version #1

Open punisher1k opened 2 months ago

punisher1k commented 2 months ago

Hi @Olivia-fsm, thank you for sharing your great work.

I am running your provided code on the 6B version of the SlimPajama dataset and obtained perplexities <= 7 across all domains. I am wondering if this is unexpected behavior (your reported perplexities for C4 and CC are >40).

Besides, could you please share your *.bin files for the 627B SlimPajama? The processing time for the dataset is too long on my machine, so it would be great if you could share them with me.

Thank you in advance.

[attached image: per-domain perplexity plots]
sowmaster commented 1 month ago

Hi, were you able to run this code as provided, or did you have to make some changes/do some debugging? Also, did you run it on a single GPU or multiple GPUs? Thanks!

sowmaster commented 1 month ago

BTW I think your plots show the log perplexities rather than the actual perplexities, which would explain why you have smaller values.
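If the plotted values are indeed mean log-perplexities (i.e. average cross-entropy in nats), converting them back to perplexity is a one-liner; a minimal sketch (the function name is illustrative, not from this repo):

```python
import math

def to_perplexity(log_ppl: float) -> float:
    """Convert a mean log-perplexity (average cross-entropy in nats) to perplexity."""
    return math.exp(log_ppl)

# A log-perplexity of ~3.7 corresponds to a perplexity of ~40,
# consistent with the >40 ppl reported for C4 and CC in the paper,
# while plotted values <= 7 would be far too small to be raw perplexities.
print(to_perplexity(3.7))
```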

punisher1k commented 1 month ago

Yes, I reproduced the experiment using the exact config, but my obtained GitHub domain weight is often large (>.5). Did you observe a similar pattern?

sowmaster commented 1 month ago

I had to do some debugging on the data processing but was finally able to run the code. I am just using the provided domain weights at the moment, so I haven't made that observation yet. Were you able to run it on multiple GPUs? Thanks!

punisher1k commented 1 month ago

Yes, I just ran the code as is and accelerate took care of that.
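For reference, multi-GPU runs with HuggingFace Accelerate are typically launched as below; the script name `train.py` and the process count are placeholders, not taken from this repo:

```shell
# One-time interactive setup (GPU count, mixed precision, etc.)
accelerate config

# Launch across 4 GPUs on a single node; train.py is a placeholder script name
accelerate launch --multi_gpu --num_processes 4 train.py
```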