Closed botanize closed 1 year ago
I'd like to move this issue forward so it can be resolved for V1:

- `vehicle_locations`
  - `stop_sequence`, renamed to `scheduled_stop_sequence`, should not be required. The new `trip_stop_sequence` should also not be required.
  - `service_date` should not be required.
  - `location_ping_id` should be required, or do we also create similar alternate unique keys for `fare_transactions.transaction_id` and `passenger_events.passenger_event_id`? (Shouldn't these be parallel?)
- `fare_transactions`
  - `fare_capped` should not be required (or is the rationale that a boolean value should always be required? I'd prefer a default assumption if the value is not specified, in this case FALSE).
- `passenger_events`
  - `trip_id_performed` should not be required (similar to `vehicle_locations`), but if the `trips_performed` table is populated it would be wise to link `passenger_events` via the `trip_id_performed`!

I don't see anything else on the Summary Tables or Support Tables.
Unless there is disagreement, I will implement per @jlstpaul's comment on April 12th.
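
To make two of the points in this thread concrete, here is a minimal sketch of how a consumer might read relaxed records: falling back to the alternate key (`vehicle_id`, `timestamp`, and conditionally `device_id`) when `location_ping_id` is absent, and assuming FALSE when `fare_capped` is not specified. The function names `location_key` and `fare_capped` are hypothetical, rows are assumed to be plain dicts (for example from `csv.DictReader`), and none of this is part of the spec.

```python
def location_key(row: dict) -> tuple:
    """Return a unique key for a vehicle_locations row (hypothetical helper).

    Prefers location_ping_id; otherwise falls back to the alternate key
    (vehicle_id, timestamp, device_id) floated in this issue.
    """
    ping_id = row.get("location_ping_id")
    if ping_id not in (None, ""):
        return ("location_ping_id", ping_id)
    vehicle_id = row.get("vehicle_id")
    timestamp = row.get("timestamp")
    if vehicle_id in (None, "") or timestamp in (None, ""):
        raise ValueError("row has neither location_ping_id nor vehicle_id + timestamp")
    # device_id is only conditionally present, so a missing value becomes None.
    return ("alternate", vehicle_id, timestamp, row.get("device_id") or None)


def fare_capped(row: dict) -> bool:
    """Read fare_capped, assuming FALSE when the value is not specified."""
    value = row.get("fare_capped")
    if value in (None, ""):
        return False
    if isinstance(value, bool):
        return value
    # Assumption: text values such as "TRUE" or "1" mean true.
    return str(value).strip().lower() in ("true", "1")
```

Tagging the key with its origin (`"location_ping_id"` versus `"alternate"`) keeps the two key styles from colliding if a file mixes them.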
Describe the feature you want and how it meets your needs or solves a problem
As a producer of data, I want to easily provide my data in TIDES format so that I can take advantage of tools designed to work with the spec. I want to use the spec to tell vendors what the minimum data requirements are, and how they can validate their output. I am willing to forgo immediate utility of the data in favor of more tools that help me understand and enhance my validated data.
Describe the solution you'd like
I want to relax as many required constraints as possible, throughout the spec.
- `vehicle_locations`
  - `trip_id_performed`, #71
  - `stop_sequence`
  - `date`
  - `location_ping_id`, which could be replaced by `vehicle_id`, `timestamp` and conditionally, `device_id`.
- `fare_transactions`
  - `fare_capped`
- `passenger_events`
  - `trip_id_performed`
  - `stop_sequence`
- Some `enum` fields might be better as `string`s to support agency specific values, or enum options should be expanded.
  - `route_type` to a `string`, #85

Describe alternatives you've considered
TODO
Additional context and sample data
Here's my understanding of the discussion from the 2022-11-02 contributor meeting:
There are two key issues:
The proposed resolution is to eliminate the required constraint from as many fields as possible, making the spec more flexible, but reducing the guarantees for what can be done directly with fully valid data.
- The `datapackage.json` file that should accompany data files allows for arbitrary metadata that producers can use to indicate the status of the data (raw, cleaned, whatever).
- The `datapackage.json` file should not be used to describe the properties of the data, and if it is, consuming tools should not trust it. For example, do not claim that there are no duplicate keys, or that there are no null values. It is the responsibility of the consuming tool to verify that the data meet its needs and to raise the appropriate errors or warnings as needed.
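
As a rough illustration of that last point, a consuming tool could check the properties it depends on itself rather than trusting anything declared in `datapackage.json`. The sketch below is an assumption about how such a check might look, not part of TIDES or any existing tool; `check_rows` and its parameters are hypothetical names, and rows are again assumed to be plain dicts.

```python
import warnings


def check_rows(rows, key_fields, needed_fields):
    """Verify what this consumer relies on instead of trusting metadata.

    Raises ValueError on a duplicate key, warns when a field the consumer
    needs is empty, and returns the number of rows with missing values.
    """
    seen = set()
    rows_with_nulls = 0
    for index, row in enumerate(rows):
        key = tuple(row.get(field) for field in key_fields)
        if key in seen:
            raise ValueError(f"duplicate key {key} at row {index}")
        seen.add(key)
        # Treat both None and the empty string as "not provided".
        missing = [f for f in needed_fields if row.get(f) in (None, "")]
        if missing:
            rows_with_nulls += 1
            warnings.warn(f"row {index} is missing {missing}")
    return rows_with_nulls
```

For example, a tool that joins `passenger_events` to `trips_performed` might call `check_rows(rows, ["passenger_event_id"], ["trip_id_performed"])`, treating duplicates as an error and missing trip links as a warning. Which conditions are errors and which are warnings is the consumer's choice.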