📄🚀 – Relax constraints as much as possible

botanize commented 2 years ago

Describe the feature you want and how it meets your needs or solves a problem

As a producer of data, I want to easily provide my data in TIDES format so that I can take advantage of tools designed to work with the spec. I want to use the spec to tell vendors what the minimum data requirements are, and how they can validate their output. I am willing to forgo immediate utility of the data in favor of more tools that help me understand and enhance my validated data.

Describe the solution you'd like

I want to relax as many required constraints as possible, throughout the spec.

vehicle_locations
- [x] trip_id_performed, #71
- [ ] stop_sequence
- [ ] date
- [ ] maybe even location_ping_id, which could be replaced by vehicle_id, timestamp and, conditionally, device_id
fare_transactions
- [ ] fare_capped
passenger_events
- [ ] trip_id_performed
- [ ] stop_sequence
TODO: summary tables, supporting tables
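
To make the location_ping_id idea concrete, here is a minimal Python sketch of a composite key built from vehicle_id, timestamp and, when present, device_id. The helper name `ping_key` and the dict-per-record shape are assumptions for illustration only; nothing here is defined by the spec.

```python
def ping_key(record):
    """Build a composite key for a vehicle_locations record.

    Hypothetical helper: uses vehicle_id and timestamp, and appends
    device_id only when the record carries a non-empty value for it.
    """
    parts = [record["vehicle_id"], record["timestamp"]]
    device_id = record.get("device_id")
    if device_id not in (None, ""):
        parts.append(device_id)
    return tuple(parts)


def has_unique_pings(records):
    """Return True when no two records share the same composite key."""
    keys = [ping_key(record) for record in records]
    return len(keys) == len(set(keys))
```

Whether such a key is unique in practice depends on the feed; two devices on one vehicle reporting at the same timestamp would collide unless device_id is populated.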

Some enum fields might be better as strings to support agency specific values, or enum options should be expanded.

- [x] Convert route_type to a string, #85

Describe alternatives you've considered

TODO

Additional context and sample data

Here's my understanding of the discussion from the 2022-11-02 contributor meeting:

There are two key issues:

- how to define needs related to TIDES in contracts
- how to guarantee functionality, that is, the ability to do useful things with the TIDES data

The proposed resolution is to eliminate the required constraint from as many fields as possible, making the spec more flexible, but reducing the guarantees for what can be done directly with fully valid data.

By converting as many fields as possible to optional we lose some functionality guarantees, but don't need to worry about specifying multiple tiers of spec adherence (Gold, Silver, Bronze!), or predicting which fields would be required for which purposes.
The remaining required fields really do represent the minimum requirements, and so the spec can be used as-is for contracts, or agencies can upgrade individual fields to required. Either way, the resulting data will validate, and a vendor can verify they've met the contract terms by passing validation.
Data that pass the bare minimum spec may not be very useful on their own. Additional tools, not specified in the spec, and with their own data requirements may be needed to polish these rough datasets into analytic jewels.
The widespread use of a minimal spec will generate demand for such polishing tools and there are likely to be multiple options with different features and data requirements.
The use of a minimal spec also facilitates the development of QA/QC tools based on the spec, instead of stipulating that data in the spec already pass quality control checks.
The datapackage.json file that should accompany data files allows for arbitrary metadata, which producers can use to indicate the status of the data (raw, cleaned, whatever).
The datapackage.json file should not be used to describe the properties of the data, and if it is, consuming tools should not trust it. For example, do not claim that there are no duplicate keys, or that there are no null values. It is the responsibility of the consuming tool to verify that the data meet its needs and to raise the appropriate errors or warnings.
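
The split of responsibilities described above could look roughly like the Python sketch below. It assumes a Table Schema-style field list in which a field may carry a `constraints.required` flag. `MINIMAL_SCHEMA` and all function names are hypothetical, and the field list is a toy example, not the actual TIDES schema.

```python
import copy
from collections import Counter

# Hypothetical minimal schema: only the key is required.
MINIMAL_SCHEMA = {
    "fields": [
        {"name": "location_ping_id", "constraints": {"required": True}},
        {"name": "trip_id_performed"},
        {"name": "stop_sequence"},
    ]
}


def upgrade_to_required(schema, field_names):
    """Return a copy of the schema with the named fields marked required.

    Models an agency tightening the minimal spec for a contract.
    """
    wanted = set(field_names)
    known = {field["name"] for field in schema["fields"]}
    unknown = wanted - known
    if unknown:
        raise KeyError(f"fields not in schema: {sorted(unknown)}")
    upgraded = copy.deepcopy(schema)
    for field in upgraded["fields"]:
        if field["name"] in wanted:
            field.setdefault("constraints", {})["required"] = True
    return upgraded


def required_fields(schema):
    """List the names of fields flagged as required."""
    return [
        field["name"]
        for field in schema["fields"]
        if field.get("constraints", {}).get("required")
    ]


def missing_required(schema, rows):
    """Return (row_index, field_name) for every absent or empty required value."""
    names = required_fields(schema)
    return [
        (index, name)
        for index, row in enumerate(rows)
        for name in names
        if row.get(name) in (None, "")
    ]


def duplicate_keys(rows, key):
    """Consumer-side check: return key values that occur more than once."""
    counts = Counter(row.get(key) for row in rows)
    return sorted(k for k, n in counts.items() if n > 1 and k is not None)
```

A vendor would run `missing_required` against the contract schema before delivery, while a consuming tool would run checks such as `duplicate_keys` itself rather than trusting any claim in datapackage.json.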

jlstpaul commented 1 year ago

I'd like to move this issue forward so it can be resolved for V1:

vehicle_locations

Per Issue #112, agree that stop_sequence, renamed to scheduled_stop_sequence, should not be required. The new trip_stop_sequence should also not be required.
Agree that service_date should not be required
I think location_ping_id should be required. Otherwise, do we also create similar alternate unique keys for fare_transactions.transaction_id and passenger_events.passenger_event_id? (Shouldn't these be parallel?)

fare_transactions

Agree fare_capped should not be required. (Or is the rationale that a boolean value should be required? I'd prefer a default assumption if the value is not specified, in this case FALSE.)
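
A consuming tool could apply that default along these lines. This is only a sketch: the helper name `fare_capped_value` and the accepted text spellings of true are assumptions, not anything the spec defines.

```python
def fare_capped_value(record, default=False):
    """Read fare_capped from a record, falling back to a default when unspecified.

    Hypothetical helper: a missing, None or empty value yields the default;
    booleans pass through; text is treated as true only for "true" or "1".
    """
    value = record.get("fare_capped")
    if value is None or value == "":
        return default
    if isinstance(value, bool):
        return value
    return str(value).strip().lower() in {"true", "1"}
```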

passenger_events

Agree trip_id_performed should not be required (similar to vehicle_locations), but if the trips_performed table is populated it would be wise to link passenger_events via trip_id_performed!
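
One way a QA tool might check that optional link is sketched below, assuming both tables are loaded as lists of dicts. The function name `unlinked_events` is hypothetical.

```python
def unlinked_events(passenger_events, trips_performed):
    """Return passenger events whose trip_id_performed has no matching trip.

    Events without a trip_id_performed are allowed, since the field is
    optional. If trips_performed is empty, nothing is flagged.
    """
    if not trips_performed:
        return []
    known_trips = {trip["trip_id_performed"] for trip in trips_performed}
    return [
        event
        for event in passenger_events
        if event.get("trip_id_performed") not in (None, "")
        and event["trip_id_performed"] not in known_trips
    ]
```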

I don't see anything else on the Summary Tables or Support Tables.

e-lo commented 1 year ago

Unless there is disagreement, I will implement per @jlstpaul's comment on April 12th.

TIDES-transit / TIDES

📄🚀 – Relax constraints as much as possible #88