More consistent preprocessing

Yale-LILY / NLP4Code

Repository for the NLP4Code project at the LILY lab.

Apache License 2.0

More consistent preprocessing #26

Open: niansong1996 opened this issue 1 year ago

niansong1996 commented 1 year ago

The preprocessing scripts currently live in preprocessing/, but to fully reproduce the datasets we use, the process needs to be more consistent.

Ideally, this would be a Hugging Face dataset that handles everything from downloading and caching to preprocessing.
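To make the idea concrete, here is a minimal sketch of what that could look like, not a proposal for the final design. The field names `text` and `code`, the JSON-lines input format, and the helper names `preprocess_example` and `build_dataset` are all assumptions for illustration; the actual datasets and scripts in preprocessing/ may look different.

```python
from typing import Dict


def preprocess_example(example: Dict[str, str]) -> Dict[str, str]:
    """Normalize one raw record (hypothetical fields: 'text' and 'code')."""
    return {
        # Trim stray whitespace around the natural-language description.
        "text": example["text"].strip(),
        # Unify line endings so the same source always yields the same string.
        "code": example["code"].replace("\r\n", "\n").strip(),
    }


def build_dataset(data_files: str):
    """Load raw JSON-lines data and apply one shared preprocessing step."""
    # Imported lazily so preprocess_example can be used and tested
    # without the `datasets` package installed.
    from datasets import load_dataset

    # load_dataset handles downloading (for remote files) and caching.
    raw = load_dataset("json", data_files=data_files, split="train")
    # map() applies the function to every example; by default its results
    # are cached as well, so repeated runs reuse the processed data.
    return raw.map(preprocess_example)
```

With something like this, every consumer would call `build_dataset(...)` instead of running the scripts by hand, so the download, cache, and preprocessing steps stay in one place.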