bryantrobbins / baseball

An upcoming web-based tool for sabermetrics.
Apache License 2.0

API: Add validation to SubmitJob API #64

Open bryantrobbins opened 7 years ago

bryantrobbins commented 7 years ago

Before writing the job to DynamoDB and placing a message on the queue, the SubmitJob API call should validate the parameters of the requested job.
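A minimal sketch of that ordering, assuming validation returns a list of error strings. The DynamoDB write and the queue send are passed in as plain callables (`put_item`, `send_message` are hypothetical wrappers, not a specific AWS SDK signature) so the sketch runs without AWS. The 400/202 status codes are also assumptions.

```python
from typing import Any, Callable, Dict, List

Config = Dict[str, Any]


def submit_job(
    config: Config,
    validate: Callable[[Config], List[str]],  # hypothetical: returns error strings
    put_item: Callable[[Config], str],         # hypothetical DynamoDB wrapper, returns a job id
    send_message: Callable[[str], None],       # hypothetical queue wrapper
) -> Dict[str, Any]:
    """Validate first; only a valid config reaches storage and the queue."""
    errors = validate(config)
    if errors:
        # Reject early: nothing is written to DynamoDB or placed on the queue.
        return {"status": 400, "errors": errors}
    job_id = put_item(config)
    send_message(job_id)
    return {"status": 202, "jobId": job_id}
```

Injecting the clients keeps the "validate before any side effect" rule testable with fakes.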

Here is a sample JSON configuration object for a job:

{
  "dataset": "Lahman_Batting",
  "transformations": [
    {
      "type": "columnSelect",
      "columns": [
        "HR",
        "lgID"
      ]
    },
    {
      "type": "rowSelect",
      "column": "yearID",
      "operator": ">=",
      "criteria": "2000"
    },
    {
      "type": "columnDefine",
      "column": "custom",
      "expression": "2*(HR)"
    },
    {
      "type": "rowSum",
      "columns": [
        "playerID",
        "yearID",
        "lgID"
      ]
    }
  ],
  "output": {
    "type": "leaderboard",
    "column": "HR",
    "direction": "desc"
  }
}
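A first pass could just check the overall shape of a config like the sample above before looking at individual transformations. This is a sketch; the required keys and types are inferred from the sample, and `check_shape` is a hypothetical name, not the project's ConfigValidator API.

```python
import json
from typing import Any, List

# Required top-level keys and the Python type json.loads produces for each
# (inferred from the sample config).
REQUIRED_KEYS = {"dataset": str, "transformations": list, "output": dict}


def check_shape(config: Any) -> List[str]:
    """Return structural errors for a parsed job config ([] means OK)."""
    if not isinstance(config, dict):
        return ["config must be a JSON object"]
    errors = []
    for key, expected in REQUIRED_KEYS.items():
        if key not in config:
            errors.append(f"missing key: {key}")
        elif not isinstance(config[key], expected):
            errors.append(f"{key} must be of type {expected.__name__}")
    steps = config.get("transformations")
    if isinstance(steps, list):
        # Each step needs a string "type" so it can be dispatched to a validator.
        for i, step in enumerate(steps):
            if not isinstance(step, dict) or not isinstance(step.get("type"), str):
                errors.append(f"transformations[{i}] needs a string 'type'")
    return errors


sample = json.loads("""
{"dataset": "Lahman_Batting",
 "transformations": [{"type": "rowSum", "columns": ["playerID"]}],
 "output": {"type": "leaderboard", "column": "HR", "direction": "desc"}}
""")
print(check_shape(sample))  # prints []
```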

Below is a list of required validations.

Dataset:

Output:

ColumnSelect and RowSum Transformation:

RowSelect Transformation:

ColumnDefine Transformation:
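The specific checks under each heading are not listed here, so the following is only a sketch of per-section rules inferred from the fields in the sample config. The operator set, output types, and directions are assumptions (the sample only shows `>=`, `leaderboard` and `desc`), and `dataset_columns` is a hypothetical mapping from dataset name to its columns. Columns created by `columnDefine` become available to later steps; narrowing by `columnSelect` is deliberately not modeled. The expression itself is only checked for being non-empty here.

```python
from typing import Any, Dict, List, Set

# Assumed value sets; the sample config only shows ">=", "leaderboard" and "desc".
OPERATORS = ("=", "!=", "<", "<=", ">", ">=")
OUTPUT_TYPES = ("leaderboard",)
DIRECTIONS = ("asc", "desc")


def _known(value: Any, columns: Set[str]) -> bool:
    # isinstance first, so unhashable values never reach the set lookup
    return isinstance(value, str) and value in columns


def validate_steps(config: Dict[str, Any],
                   dataset_columns: Dict[str, Set[str]]) -> List[str]:
    """Check dataset, transformations and output; return error strings."""
    dataset = config.get("dataset")
    if not _known(dataset, set(dataset_columns)):
        return [f"unknown dataset {dataset!r}"]
    columns = set(dataset_columns[dataset])
    errors: List[str] = []

    for i, step in enumerate(config.get("transformations", [])):
        where = f"transformations[{i}]"
        kind = step.get("type") if isinstance(step, dict) else None
        if kind in ("columnSelect", "rowSum"):
            cols = step.get("columns")
            if not isinstance(cols, list) or not cols:
                errors.append(f"{where}: 'columns' must be a non-empty list")
            else:
                errors += [f"{where}: unknown column {c!r}"
                           for c in cols if not _known(c, columns)]
        elif kind == "rowSelect":
            if not _known(step.get("column"), columns):
                errors.append(f"{where}: unknown column {step.get('column')!r}")
            if step.get("operator") not in OPERATORS:
                errors.append(f"{where}: unsupported operator {step.get('operator')!r}")
            if not isinstance(step.get("criteria"), str):
                errors.append(f"{where}: 'criteria' must be a string")
        elif kind == "columnDefine":
            name, expr = step.get("column"), step.get("expression")
            if not isinstance(name, str) or not name or name in columns:
                errors.append(f"{where}: 'column' must be a new, non-empty name")
            elif not isinstance(expr, str) or not expr.strip():
                errors.append(f"{where}: 'expression' must be a non-empty string")
            else:
                columns.add(name)  # later steps may reference the new column
        else:
            errors.append(f"{where}: unknown transformation type {kind!r}")

    output = config.get("output")
    if not isinstance(output, dict):
        errors.append("output must be an object")
    else:
        if output.get("type") not in OUTPUT_TYPES:
            errors.append(f"output: unsupported type {output.get('type')!r}")
        if not _known(output.get("column"), columns):
            errors.append(f"output: unknown column {output.get('column')!r}")
        if output.get("direction") not in DIRECTIONS:
            errors.append("output: direction must be 'asc' or 'desc'")
    return errors
```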

bryantrobbins commented 7 years ago

Checking the column definition expressions is the hardest part of this. I'm using the pyparsing module (http://pyparsing.wikispaces.com/) to write a Python class with the necessary logic.

Check out https://github.com/bryantrobbins/baseball/blob/master/shared/btr3baseball/ExpressionValidator.py
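For comparison, here is a dependency-free sketch of the same idea using Python's standard `ast` module instead of pyparsing. This is a swapped-in technique, not what ExpressionValidator.py does: it parses an expression like `2*(HR)` without evaluating it and whitelists arithmetic operators, numeric literals, and known column names. `expression_errors` is a hypothetical name.

```python
import ast
from typing import List, Set

# Whitelisted operators for column expressions such as "2*(HR)".
_BIN_OPS = (ast.Add, ast.Sub, ast.Mult, ast.Div)
_UNARY_OPS = (ast.UAdd, ast.USub)


def expression_errors(expr: str, columns: Set[str]) -> List[str]:
    """Return problems with an arithmetic column expression ([] if valid)."""
    try:
        tree = ast.parse(expr, mode="eval")  # parse only, never evaluated
    except SyntaxError as exc:
        return [f"syntax error: {exc.msg}"]
    errors: List[str] = []
    for node in ast.walk(tree.body):
        if isinstance(node, ast.BinOp):
            if not isinstance(node.op, _BIN_OPS):
                errors.append(f"operator not allowed: {type(node.op).__name__}")
        elif isinstance(node, ast.UnaryOp):
            if not isinstance(node.op, _UNARY_OPS):
                errors.append(f"operator not allowed: {type(node.op).__name__}")
        elif isinstance(node, ast.Constant):
            # bool is a subclass of int, so reject it explicitly
            if isinstance(node.value, bool) or not isinstance(node.value, (int, float)):
                errors.append(f"literal not allowed: {node.value!r}")
        elif isinstance(node, ast.Name):
            if node.id not in columns:
                errors.append(f"unknown column: {node.id}")
        elif isinstance(node, (ast.operator, ast.unaryop, ast.expr_context)):
            continue  # operator/context nodes were checked via their parent
        else:
            errors.append(f"construct not allowed: {type(node).__name__}")
    return errors
```

One limitation: column names must be valid Python identifiers, and Lahman's Batting table has columns such as `2B` and `3B`, which is a good argument for a custom grammar like the pyparsing one.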

bryantrobbins commented 7 years ago

The top-level configuration validator is here: https://github.com/bryantrobbins/baseball/blob/master/shared/btr3baseball/ConfigValidator.py

bryantrobbins commented 7 years ago

TODO: Add a list here of the exceptions the ConfigValidator can throw, for consumption by the UI and the Worker.
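One possible shape for such a list, sketched as an exception hierarchy. Every class name and `code` value here is hypothetical and does not reflect what ConfigValidator actually throws. The idea is that each exception carries a stable code and a path into the config, so the UI and the Worker can both handle it without parsing message text.

```python
from typing import Any, Dict


class ConfigValidationError(Exception):
    """Base class (hypothetical): one invalid field in a job config."""

    code = "INVALID_CONFIG"

    def __init__(self, path: str, message: str) -> None:
        super().__init__(f"{path}: {message}")
        self.path = path        # e.g. "transformations[0].columns[1]"
        self.message = message

    def to_dict(self) -> Dict[str, Any]:
        # JSON-friendly shape that both the UI and the Worker could consume.
        return {"code": self.code, "path": self.path, "message": self.message}


class UnknownDatasetError(ConfigValidationError):
    code = "UNKNOWN_DATASET"


class UnknownColumnError(ConfigValidationError):
    code = "UNKNOWN_COLUMN"


class UnsupportedOperatorError(ConfigValidationError):
    code = "UNSUPPORTED_OPERATOR"


class InvalidExpressionError(ConfigValidationError):
    code = "INVALID_EXPRESSION"
```

An API handler could catch `ConfigValidationError` and return `json.dumps(err.to_dict())`, while the Worker could switch on `err.code`.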