allenai / amti

A Mechanical Turk Interface (amti) 🤖
Apache License 2.0

Add extract tabular #5

Closed nalourie-ai2 closed 6 years ago

nalourie-ai2 commented 6 years ago

This pull request adds a command to extract all the batch data in a tabular format.

The new command is:

$ amti extract tabular --help
Usage: amti extract tabular [OPTIONS] BATCH_DIR OUTPUT_PATH

  Extract data from BATCH_DIR to OUTPUT_PATH in a tabular format.

  Given a directory (BATCH_DIR) that represents a batch of HITs that have
  been reviewed and saved, extract the data to OUTPUT_PATH in a tabular
  format. Every row of the table is an assignment, where each form field has
  a column and also there are additional columns for assignment metadata. By
  default, the table will be saved as JSON Lines, but other formats may be
  specified with the --format option.

Options:
  -f, --format [csv|json|jsonl]  The desired output file format.
  -h, --help                     Show this message and exit.

The command pulls all of a batch's data into a single table, which is easier to work with than the directory tree of XML files we used previously.
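As a rough illustration of why the tabular form is easier to work with, here is a minimal sketch of loading the default JSON Lines output with only the Python standard library. The file name batch.jsonl and the column name WorkerId are assumptions for illustration; the actual column names depend on the form fields and the assignment metadata the command writes.

```python
import json
from pathlib import Path


def read_jsonl(path):
    """Read a JSON Lines file into a list of dicts, one per non-empty line."""
    rows = []
    with Path(path).open(encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:  # tolerate blank lines between records
                rows.append(json.loads(line))
    return rows


def group_by(rows, key):
    """Group rows by the value of `key`; rows missing the key go under None."""
    groups = {}
    for row in rows:
        groups.setdefault(row.get(key), []).append(row)
    return groups


if __name__ == "__main__":
    # Hypothetical path: the OUTPUT_PATH given to `amti extract tabular`.
    rows = read_jsonl("batch.jsonl")
    # "WorkerId" is an assumed column name, not confirmed by this PR.
    for worker, assignments in group_by(rows, "WorkerId").items():
        print(worker, len(assignments))
```

Since every row is one assignment, per-worker or per-HIT summaries reduce to a single grouping pass rather than a walk over the XML directory tree.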

Additionally, this pull request changes the command line interface: the old extract_xml command is now the xml command under the new extract command group, alongside the new tabular command.