GAIR-NLP / ProX

Official Repo for "Programming Every Example: Lifting Pre-training Data Quality Like Experts at Scale"
https://gair-nlp.github.io/ProX/
Apache License 2.0
194 stars, 15 forks

Will the code of this framework be open-sourced? #2

Closed: yucc-leon closed this issue 1 month ago

yucc-leon commented 2 months ago

Great and insightful work! I noticed that this repo provides the dataset generated by ProX, but not the pretraining corpus or the code of the ProX framework. Will you release these soon?

koalazf99 commented 2 months ago

Thank you! Yes, it is in our release plan, and we are cleaning and refactoring the code to make the framework more scalable. It will be shipped after several sanity checks.

yucc-leon commented 2 months ago

> Thank you! Yes, it is in our release plan, and we are cleaning and refactoring the code to make the framework more scalable. It will be shipped after several sanity checks.

Looking forward to your next release!

koalazf99 commented 1 month ago

Hi @yucc-leon, thank you for your patience. We've released the refining framework, together with the refining models on Hugging Face (see here).

koalazf99 commented 1 month ago

Supported in #6.