MobilityData / gbfs-validator

The canonical GBFS validator. Maintained by the GBFS community, facilitated by MobilityData.
https://gbfs-validator.mobilitydata.org/
Apache License 2.0
18 stars 12 forks source link

Object dependency relationships validation #153

Open richfab opened 9 months ago

richfab commented 9 months ago

What is the issue and why is it an issue?

The validator cannot currently validate object dependency relationships. This can lead to errors in the feeds that are difficult to identify and fix by the producer.

In https://github.com/MobilityData/gbfs-validator/pull/106, @tdelmas did a remarquable job of validating these dependencies using imperative rules (in JavaScript). Which works very well! However, this validation pattern may be harder to maintain over time in a community based tool which already uses JSON schemas and schema patching patterns.

image

Please describe some potential solutions you have considered (even if they aren’t related to GBFS).

As pointed by @testower here, there is a benefit of using schemas and patched schemas only because the validation itself is done by ajv and all the results are an output of ajv.

Object dependency relationships can be validated by applying these 3 rules:

  1. Primary key validation Example: Validate that each station_id MUST be unique in station_information.json. Potential solution: Use ajv #uniqueItemProperties (IDs must be unique in Arrays of objects).

  2. Foreign key validation Example: Validate that each station_id in station_status.json MUST exist in station_information.json. Potential solution: Use JSON Schema patching #Enumerated values (the Foreign Key ID must exist among the Primary Key IDs).

  3. Cardinality validation Example: Validate that each station in station_information.json MUST have exactly one entry in station_status.json. Potential solution: Use JSON Schema patching #Length (the length of the Arrays is the same), combined with the 2 rules above.

Primary key fields ``` - gbfs_versions.json#versions.version - vehicle_types.json#vehicle_types.vehicle_type_id - station_information.json#stations.station_id - vehicle_status.json#vehicles.vehicle_id - system_regions.json#regions.region_id - system_pricing_plans.json#plans.plan_id - system_alerts.json#alerts.alert_id - system_information.json#languages ```
Foreign key fields ``` - manifest.json#datasets.system_id - vehicle_types.json#vehicle_types.default_pricing_plan_id - vehicle_types.json#vehicle_types.pricing_plan_ids - station_information.json#stations.region_id - station_information.json#stations.vehicle_types_capacity.vehicle_type_ids - station_information.json#stations.vehicle_docks_capacity.vehicle_type_ids - station_status.json#stations.station_id - station_status.json#stations.vehicle_types_available.vehicle_type_id - station_status.json#stations.vehicle_docks_available.vehicle_type_ids - vehicle_status.json#vehicles.vehicle_type_id - vehicle_status.json#vehicles.station_id - vehicle_status.json#vehicles.home_station_id - vehicle_status.json#vehicles.pricing_plan_id - system_alerts.json#alerts.station_ids - system_alerts.json#alerts.region_ids - geofencing_zones.json#geofencing_zones.features.properties.rules.vehicle_type_ids - every field of type Array ```
Fields with cardinality constraint ``` - station_status.json#stations.station_id - every field of type Array ```

Implementation

  1. Let's use this issue to align on the validation pattern.
  2. Anyone, feel free to assign yourself if you are interested in doing the implementation.
  3. If no one from the community has the bandwidth, it's ok! MobilityData will include this feature in its engineering prioritization.

Thank you all for your many great contributions already!

tdelmas commented 9 months ago

@tdelmas did a remarquable job of validating these dependencies using imperative rules (in JavaScript).

Thank you!

However, this validation pattern may harder to maintain over time in a community based tool which already uses JSON schemas and schema patching patterns.

I agree that a validation 100% based on schemas would be better if possible. There are a lot of benefits to that!

I must point out that using only "main" schemas and patching them to add the other rules is complex: if two rules need to patch the same path, it may be very hard to do it without one overriding the other, and if it works, any small edition in one of the rule could break the other, so exhaustive automatic tests must be in place if done.

Maybe a solution could be to use "main" schemas and generating some others instead of patching?

(I don't have the bandwidth to work on it, but I wanted to highlight that issue, which made me use JS validation instead.)

testower commented 9 months ago

I must chime in: thanks for the incredible effort you've put down @tdelmas!

I must point out that using only "main" schemas and patching them to add the other rules is complex: if two rules need to patch the same path, it may be very hard to do it without one overriding the other, and if it works, any small edition in one of the rule could break the other, so exhaustive automatic tests must be in place if done.

Yes I agree, the added complexity is a downside, but

Maybe a solution could be to use "main" schemas and generating some others instead of patching?

This is a great idea actually! Either via a builder or a template. The benefit of this is that we can give each added rule a unique schema id and description, so it will be easier to identify in the result as well. I wouldn't mind exploring this option a little further actually.

testower commented 9 months ago

Probably also relevant for #154