edanalytics / lightbeam

CLI tool for validating and transmitting payloads from JSONL files into an Ed-Fi API.
Apache License 2.0

add flag to optionally generate structured output #8

Closed: tomreitz closed this issue 1 year ago

tomreitz commented 1 year ago

To use lightbeam in an automated data pipeline, it would be useful to have it generate structured output noting the number of records sent and skipped (by resource and status code), the run timestamp, and possibly other information. Open questions include:

  1. what should the flag be? perhaps --results-file output.json
  2. what should the output format be? (YAML, JSON, something else?)
  3. besides the items noted above, what else should be included in the results? perhaps the Ed-Fi API base_url? anything else?
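To make the questions above concrete, here is a minimal sketch of what a JSON results file could contain. Everything in it is an assumption for illustration: the function name `build_results`, the field names (`run_timestamp`, `base_url`, `resources`, `records_skipped`, `statuses`), and the choice of JSON are hypothetical and are not lightbeam's actual output format.

```python
import json
from collections import defaultdict
from datetime import datetime, timezone


def build_results(base_url, sent, skipped):
    """Summarize one run (hypothetical structure, not lightbeam's real format).

    sent: iterable of (resource, status_code) pairs, one per record sent.
    skipped: mapping of resource name to the number of records skipped.
    """
    resources = defaultdict(lambda: {"records_skipped": 0, "statuses": {}})

    # Count records sent, grouped by resource and HTTP status code.
    for resource, status in sent:
        statuses = resources[resource]["statuses"]
        key = str(status)  # JSON object keys must be strings
        statuses[key] = statuses.get(key, 0) + 1

    # Record how many records were skipped for each resource.
    for resource, count in skipped.items():
        resources[resource]["records_skipped"] += count

    return {
        "run_timestamp": datetime.now(timezone.utc).isoformat(),
        "base_url": base_url,
        "resources": dict(resources),
    }


if __name__ == "__main__":
    results = build_results(
        "https://example.org/api",  # placeholder URL
        sent=[("students", 201), ("students", 201), ("students", 400)],
        skipped={"schools": 3},
    )
    print(json.dumps(results, indent=2))
```

Grouping counts by resource and then by status code keeps the file easy to query from a pipeline step; YAML would work equally well if that format were preferred.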

Tagging @jayckaiser for thoughts.

tomreitz commented 1 year ago

Based on our conversation today, I'm proposing that

We also discussed the exit codes lightbeam should return in different circumstances, to trigger the Airflow BashOperator task to raise an AirflowSkipException or AirflowException (per the Airflow docs). Is it correct that

  1. if total_records_skipped == total_records_processed, lightbeam should return exit code 99? (so the Airflow task skips)
  2. if total_records_failed > 0, lightbeam should return exit code 1? (so the Airflow task fails)
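The two rules above could be sketched as follows. This is only an illustration of the proposal as worded in this comment: the function name `choose_exit_code` and the constant names are hypothetical, and checking failures before skips is an assumption, since the comment does not say which rule takes precedence.

```python
import sys

EXIT_OK = 0
EXIT_FAILED = 1    # non-zero exit: the Airflow task fails
EXIT_SKIPPED = 99  # exit code proposed above so the Airflow task skips


def choose_exit_code(total_records_processed, total_records_skipped,
                     total_records_failed):
    """Map run totals to a process exit code (hypothetical helper)."""
    # Assumption: any failure outranks the "everything skipped" case.
    if total_records_failed > 0:
        return EXIT_FAILED
    # Rule 1 as stated: every processed record was skipped.
    if total_records_skipped == total_records_processed:
        return EXIT_SKIPPED
    return EXIT_OK


if __name__ == "__main__":
    # Example totals: 10 processed, all 10 skipped, none failed.
    sys.exit(choose_exit_code(10, 10, 0))
```

Note that, taken literally, rule 1 also returns 99 for a run that processed zero records; whether that is desirable is a separate decision.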

One other issue has to do with reporting line numbers: lightbeam can process either a single JSONL file (source_dir/students.jsonl) or an entire directory of JSONL files (source_dir/students/*.jsonl). In the latter case, how would line numbers be determined/interpreted?

Finally, one tiny question: should line numbers be 0-based or 1-based?
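One possible answer to both line-number questions, sketched here purely as an option and not as what was decided, is to report each line number together with its file name and to count from 1, as most text editors do. The function name `iter_records` and the sorted file order are assumptions for illustration.

```python
from pathlib import Path


def iter_records(source):
    """Yield (file_name, line_number, raw_line) for each non-blank JSONL line.

    source may be a single .jsonl file or a directory of .jsonl files.
    Line numbers are 1-based and restart for every file.
    """
    path = Path(source)
    # Assumption: files in a directory are handled in sorted name order.
    files = sorted(path.glob("*.jsonl")) if path.is_dir() else [path]
    for file in files:
        with file.open(encoding="utf-8") as handle:
            for line_number, line in enumerate(handle, start=1):
                if line.strip():  # blank lines are skipped but still counted
                    yield file.name, line_number, line.rstrip("\n")
```

Pairing the file name with a per-file line number keeps each reported position unambiguous in the directory case, with no running offset across files.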

jayckaiser commented 1 year ago

Some thoughts on this one as well:

tomreitz commented 1 year ago

Closing this issue as it was implemented in this PR.