allenai / pawls

Software that makes labeling PDFs easy.
https://pawls.apps.allenai.org
Apache License 2.0

Add CLI dataset management #145

Open jbarrow opened 2 years ago

jbarrow commented 2 years ago

I was thinking of a set of commands. To create a dataset:

pawls dataset create [DATASET NAME] [INITIAL PDFS]

To add PDFs to the dataset:

pawls dataset add [PDFS]
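Here is a rough sketch of how the two proposed sub-commands could be wired up. It uses Python's standard `argparse` and is not the existing PAWLS CLI. The proposal does not say how `add` chooses its target dataset, so the sketch assumes a hypothetical `--dataset` option.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser for the proposed `pawls dataset` commands."""
    parser = argparse.ArgumentParser(prog="pawls")
    commands = parser.add_subparsers(dest="command", required=True)

    # `pawls dataset ...` groups the dataset-management sub-commands.
    dataset = commands.add_parser("dataset", help="Manage datasets")
    actions = dataset.add_subparsers(dest="action", required=True)

    # pawls dataset create [DATASET NAME] [INITIAL PDFS]
    create = actions.add_parser("create", help="Create a new dataset")
    create.add_argument("name", help="Name of the dataset folder")
    create.add_argument("pdfs", nargs="*", help="Initial PDFs to copy in")

    # pawls dataset add [PDFS]
    add = actions.add_parser("add", help="Add PDFs to a dataset")
    add.add_argument("pdfs", nargs="+", help="PDFs to add")
    # Assumption: the proposal does not name the target dataset for `add`,
    # so this sketch takes it as a hypothetical --dataset option.
    add.add_argument("--dataset", required=True, help="Target dataset name")
    return parser
```

For example, `pawls dataset create papers a.pdf b.pdf` would parse to `action="create"`, `name="papers"` and `pdfs=["a.pdf", "b.pdf"]`.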

The CLI should also offer per-dataset configuration of the label set. There is some discussion of this in #144, with the proposal that datasets be top-level folders (within skiff_files). I think that's the simplest approach, and it would let you drop an overriding configuration file into each dataset folder.
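The override idea could look roughly like this. Each dataset is a top-level folder inside skiff_files, and an optional file in that folder replaces the top-level keys of the default label configuration. The filename `configuration.json` and the shallow-merge rule are assumptions for illustration only.

```python
import json
from pathlib import Path
from typing import Any, Dict, List

# Hypothetical name for the per-dataset override file.
OVERRIDE_FILENAME = "configuration.json"


def merge_config(default: Dict[str, Any], override: Dict[str, Any]) -> Dict[str, Any]:
    """Shallow merge: top-level keys in `override` replace those in `default`."""
    merged = dict(default)  # copy so the default config is not mutated
    merged.update(override)
    return merged


def load_dataset_config(skiff_files: Path, dataset: str, default: Dict[str, Any]) -> Dict[str, Any]:
    """Return the config for one dataset folder, applying its override file if present."""
    override_path = skiff_files / dataset / OVERRIDE_FILENAME
    if not override_path.is_file():
        return dict(default)
    with override_path.open() as f:
        return merge_config(default, json.load(f))


def list_datasets(skiff_files: Path) -> List[str]:
    """Under this layout, every top-level folder in skiff_files is a dataset."""
    return sorted(p.name for p in skiff_files.iterdir() if p.is_dir())
```

A shallow merge keeps the rule easy to explain: a dataset that defines its own `labels` list replaces the global one entirely rather than extending it.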

One last concern is where the datasets should reside. Would a user need to pass the relative path to the skiff_files folder to each sub-command, to make sure the PDFs are copied into the right place?
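One possible way to avoid passing the path on every call is to resolve the skiff_files location in a fixed order: an explicit option first, then an environment variable, then a `skiff_files` folder under the current directory. This is only a sketch. The `PAWLS_SKIFF_FILES` variable and the fallback order are hypothetical and are not existing PAWLS settings.

```python
import os
from pathlib import Path
from typing import Mapping, Optional

# Hypothetical environment variable; not an existing PAWLS setting.
ENV_VAR = "PAWLS_SKIFF_FILES"


def resolve_skiff_files(
    cli_value: Optional[str],
    env: Mapping[str, str] = os.environ,
    cwd: Optional[Path] = None,
) -> Path:
    """Pick the skiff_files directory: CLI option, then env var, then ./skiff_files."""
    if cli_value:
        return Path(cli_value).resolve()
    if env.get(ENV_VAR):
        return Path(env[ENV_VAR]).resolve()
    base = cwd if cwd is not None else Path.cwd()
    return (base / "skiff_files").resolve()
```

With this ordering, a user who always works from the repository root never has to pass the path, while scripts can still point at another location explicitly.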