distributeaid / toolbox

DistributeAid Toolbox
http://distributeaid.org
GNU Affero General Public License v3.0
3 stars 1 forks source link

🚧 Verify shipment data collected in Google Spreadsheets using JSON schema #49

Closed coderbyheart closed 4 years ago

coderbyheart commented 4 years ago

This issue tracks work around this idea.

It's likely that there will be a need to validate the contents of a spreadsheet, by taking the data in and turning it into the domain model (create a new shipment). I would prefer if we could document the rules for how the data of a shipment spreadsheet looks like is documented using JSON schema. This way we have a data-oriented, language independent, definition of the data model for a shipment spreadsheet, which can be used to verify the data.

Input needed

User story

Feature: Validate a Shipment Data Spreadsheet

    As a user I have provided all necessary shipment data
    in the Google Spreadsheet and now I want to verify
    that it is in the right format

    Note: This feature itself is not directly user-facing,
    there will be some UI around it, 
    therefore this feature describes a lower-level API.

  Scenario: correct shipment data

    Given I have filled out the spreadsheet at "<insert correct Google Spreadsheet URL here>"
    And I verify the contents against the JSON schema "http://schemas.distributeaid.org/latest#shipmentIntake"
    Then there should be no JSON schema validation errors

  Scenario: incorrect shipment data

    Given I have filled out the spreadsheet at "<insert incorrect Google Spreadsheet URL here>"
    And I verify the contents against the JSON schema "http://schemas.distributeaid.org/latest#shipmentIntake"
    Then there should be JSON schema validation errors

Implementation

This does not necessarily need to be implement in toolbox. It could be implemented as a microservice which is capable of taking a Google Drive URL, extracting the data from Google Drive, and matching it against a JSON schema URL.

Input:

Output:

Parsing a spreadsheet

In order to allow the spreadsheet be flexible in formatting and allow for re-arranging of fields, I'd follow the sample from German Tax Forms:

image

Note the green numbers (Field Code) in front of the white boxes where data is supposed to be entered. These numbers are used to identify values and while allowing for different representations (on paper, electronically).

So, in our forms we can establish the rule that all fields need to have a Field Code in the column before the data:

Data required Description FC Please input your data here
Pick up location Name of location (e.g. Freddy's Clothing Co) 101 Paroh Ltd
Contact information Name 201 Alex Doe
Phone Number 202 +123456890
Email 203 alex@example.com

As long as there is a column labeled FC (Field Code) and it contains a number we can locate that value regardless of the spreadsheet layout.

For inputs that produce multiple entries (e.g. Cargo details) it would work the same. Every input field gets a number, and the parser will generate a list of items if it detects entries with the same code.

FCPackage number FCPackage type FCPackage dimensions FCPackage weight FCContents FCContents FCStackable FCDangerous goods
901 902 1 903 pallet x 17 904 120x200x100 905 232 tents 906 117 907 no 908 no
coderbyheart commented 4 years ago

This is no longer needed.

The forms mentioned above (example) are no longer used, instead a Google Form ("DA Quote Request") is used to collect the necessary information. Work to support this form has been started in #84.