Closed tdelmas closed 11 months ago
Name | Link |
---|---|
Latest commit | eb20162eb6377a2b77bf9dccfdbecdc52b6dc87b |
Latest deploy log | https://app.netlify.com/sites/gbfs-validator/deploys/64f835537bdeb80007c92200 |
Deploy Preview | https://deploy-preview-106--gbfs-validator.netlify.app/ |
Preview on mobile | Toggle QR Code...Use your smartphone camera to open QR code link. |
To edit notification comments on pull requests, go to your Netlify site configuration.
Happy to see the wording nonSchema
in the code 👍
Due to limited resources, we will only be able to take a closer look in a few weeks.
Please let us know if we are blocking you in any way.
Thank you for your patience 😌
Hi @PierrickP and @tdelmas, Thank you very much for this very valuable contribution 🙇
I wonder if we could make the interface more consistent for users when there are both errors and warnings. What do you think about these UI suggestions? (the "After" screenshots are collages, not implemented)
danger
and warning
alert components for consistency and visibilitycc @josee-sabourin @isabelle-dr
State | Before | After (suggestion) |
---|---|---|
summary collapsed | ||
summary expanded |
For what it's worth, I am generally in favor of having the errors (schema and non-schema) in one place, because from the user perspective, it's a spec violation, regardless of if generated by the JSON Schema or the custom code.
Great extension. I just found this PR searching for a check on duplicate_vehicle_type_id
, which I encounter currently with this feed, and which could be a nice addition to the nonSchema tests.
@richfab @isabelle-dr i regrouped errors together and update the design
@hbruch I've added a new rule (duplicate_vehicle_type_id
) to check duplicated vehicle_type_id
on vehicle_types.json
Thank you all for your detailed review.
I've pushed a commit that apply most suggestions of your reviews, and marked them as "resolved" for readability.
Comments not marked as resolved are not fixed, but as they require more tests, I strongly feel they are more suited in a follow-up pull request.
Thank you @tdelmas for taking most suggestions into consideration. This is amazing!
I will review your changes as soon as possible.
@testower thank you for your review.
I'll try to answer your concerns.
None of the new rules are implemented using partials / schema patching, although after skimming through them, I think a lot of them could well have been (?) - is this a strategic decision?
Yes. Logic errors are much more difficult to write using schemas, and even where they are, it leads to some problems and bugs. For example, if two rules needs to patch the same part of a schema, the results will be unpredictable. Or if the original schema changes. Also, most of them are cross-files, which least to reading a file A, then creating the schema for B and vice versa.
What about the rules that are already implemented that way, are they going to be left like this or ported to the imperative style?
I feel like some should, in the future.
The benefit of using the "old" convention is that the output automatically conforms to the regular schema validation error outputs, whereas in the imperative style, they have to be constructed manually. Especially since this project has no type-checking, this is prone to errors.
I agree, this could be improved in a future PR.
There is a breaking change in the response format of the validator api - which prompts me to raise the question of standardizing this format for portability purposes. If validators respond the same way, frontends and validators can be used interchangeably, and validation reports can be consumed later by any "conformant" UI.
That's a good goal.
Last, the most serious issue for me, this PR introduces a number of warnings based on assumptions about a number of things that I don't think a generic / canonical GBFS validator should be concerned with. Business rules that depend on operational context should be left out of the validator.
Warning are just warnings: here to catch the attention, because it caught something that may be an error. I insist on the may.
I strongly feel they are important because they help to catch subtle mistakes, such as prices in cents instead of base units, or meters vs kilometers, because some feeds had those problems.
If this validator was a library, one could imagine a feature that allowed library users to provide their own additional rulesets specific to their operational context - but as it stands now, this is just speculative.
A separation could be done, it's an interesting idea for a future PR
Let's take a step back to look at first principles. The GBFS validator is, according to the README
A General Bikeshare Feed Specification dataset validator
furthermore
The Canonical GBFS Validator is a tool to check the conformity of a GBFS feed against the official specification.
The word "canonical" is instructive if taken to mean: "according to the canon", the canon being the "the standard, rule or primary source that is accepted as authoritative"[1]. The "canon" in this context is obviously the GBFS specification.
I will interpret this to mean that the validator is intended to be isomorphic[2] to the specification it validates against. Any departure from that principle should be seriously discussed.
Let me elaborate on why I think it's a bad idea to introduce validation rules that break this isomorphism. My argument is that any validation rule that doesn't have a corollary in the specification, is a business rule only relevant for a specific operational context.
So while it may be useful to get a warning if a car is reported to have room for more than 5 passengers, or a moped has a max range of more than 100km, or a car rental costs more than 50 USD/EUR for an hour (?) - just to name a few examples - these are useful in a specific operational context (perhaps yours?), but it's hard to argue that they are universal truisms about the domain over which the GBFS specification is meant to govern. If they were, they would have been encoded in the specification already.
Now I'm not saying that such rules don't have merit, quite the contrary. But I'm saying that they belong in the specific systems that operate in the specific contexts that give rise to their usefulness. Anywhere else, they create clutter and noise.
For the sake of argument, I shall entertain the idea that these rules should be in the validator. Now, what process or rationale should determine what these rules and their threshold values should be? Is it consensus-based, or democratic? The greatest common denominator of all producers? Who shall bear the burden of maintaining these over time?
As an exercise, imagine that ever more currencies are added to the "typical cost" structure. How is it possible to keep this structure updated and relevant as prices change and currencies inflate and deflate.
I believe that the best candidates for taking that responsibility are those who operate in the context where those rules, thresholds and values are useful and relevant. Offloading that to the canonical validator is a recipe for future decay and ultimately complete irrelevance of the validator as a general-purpose tool to validate against the specification.
If I were to pretend to be positive about accepting these new rules, I would still expect the following preconditions to be met:
Even with these preconditions met, I'm not enthusiastic about them, since they still introduce a maintenance burden. I.e. someone must take responsibility for keeping them up-to-date and relevant, and someone must make sure that the total set of such non-specification rules is kept at a reasonable level. I'm particularly skeptical about the "typical cost" check. For one because it equates USD with EUR which could quickly change - but more importantly because it opens the possibility to add checks for any number of currencies, which, needless to say, would very quickly grow to unmanageable scales. Barring additional currencies from being added to avoid this problem, just amounts to discrimination, so I just don't see a way to accept it in the first place.
Related to the point I made about the output from the validator being standardized, I'm not sure how this fits in. The taxonomy is very unclear and ambiguous about what constitutes a validation error based on the specification and what is a result of an opinionated "sanity check".
All in all, I think this marks a very significant shift in the goal and purpose of the validator that we should not take on lightly. I ask the community to think very carefully about accepting these kinds of changes.
[1] https://en.wikipedia.org/wiki/Canonical [2] Isomorphic in the set-theoretic sense: The set of validation rules should be identical to the set of rules in the specification.
So while it may be useful to get a warning if a car is reported to have room for more than 5 passengers, or a moped has a max range of more than 100km, or a car rental costs more than 50 USD/EUR for an hour (?) - just to name a few examples - these are useful in a specific operational context (perhaps yours?)
As I said, those rules are not here to detect "high" or "low" prices, but cases when the producer made a conversion error (between meters vs kilometers, $ vs cents, etc.). But yes, it's not a possible to detect that perfectly.
@testower Is there a list of rules we can remove from this PR, so it could be acceptable for you?
@tdelmas The issue is that there is a severe break in principle between a format-driven validation, and inclusion of subjective detection of possible user errors. It's not about what is acceptable for @testower personally, it's about having a whitelabel validator that works equally well for any user. That is the main point here.
Validation rules (the warnings), like these, do have a role to play but they would work better as a separate validator, perhaps as a template that each user can configure for local use. That way, each user can define their own limitations in weight, size, doors, and prices according to their own needs - but this still has nothing to do with the GBFS standard.
Our intention is not to be troublesome but to ensure that certain principles are maintained.
@testower Is there a list of rules we can remove from this PR, so it could be acceptable for you?
Yes, all the warnings.
Hello everyone,
Thank you all for sharing your views on the validator with such enthusiasm and quality arguments. It’s very exciting to see that this tool is getting the attention it deserves from the community.
Thank you @testower for reminding the purpose of the canonical GBFS validator, which is to validate the conformity of a GBFS feed with the specification. MobilityData will make sure to keep this at the forefront of our decisions.
To ensure isomorphism between the specification and the validator, MobilityData recommends continuing to include in the validator only the rules that are in the specification.
The data sanity checks proposed by @tdelmas is a very good idea and could have a place in a separate validator configurable by users, as suggested by @JohanEntur.
Here is MobilityData’s proposal:
Rule | Proposal | GBFS spec extract |
---|---|---|
last_updated_future |
Keep in this PR | "Indicates the last time data in the feed was updated." |
additional_properties |
Keep in this PR | "Field names of extensions SHOULD be prefixed with an underscore ( _ ) character” |
missing_translation |
Keep in this PR | "Translations MUST be provided for each supported language for all translatable fields of type Array |
unknown_language |
Keep in this PR | "Must match one of the values specified by the languages field in system_information.json.” |
missing_default_pricing_plan_id |
Keep in this PR | "REQUIRED if system_pricing_plans.json is defined.” |
missing_station_information |
Keep in this PR | "Any station that is represented in station_status.json MUST have a corresponding entry in station_information.json.” |
missing_station_status |
Keep in this PR | "Any station that is represented in station_information.json MUST have a corresponding entry in station_status.json.” |
duplicate_station_id |
Keep in this PR | "Array that contains one object per station as defined below.” |
default_reserve_time |
Keep in this PR | "REQUIRED if reservation_price_per_min or reservation_price_flat_rate are defined.” |
unclosed_polygon |
Keep in this PR | "The first and last positions are equivalent, and they MUST contain identical values (https://datatracker.ietf.org/doc/html/rfc7946#section-3.1.6)” |
duplicate_bike_id |
Keep in this PR | "Array that contains one object per vehicle” |
duplicate_vehicle_id |
Keep in this PR | "Array that contains one object per vehicle” |
invalid_vehicle_type_id |
Keep in this PR | "The vehicle_type_id of this vehicle, as described in vehicle_types.json.” |
duplicate_vehicle_type_id |
Keep in this PR | "Array that contains one object per vehicle type” |
unknown_plan_id , missing_system_pricing_plans and invalid_pricing_plan_id |
Combine the 3 alerts as they serve the same purpose | "A plan_id, as defined in system_pricing_plans.json” |
Rule | Proposal | GBFS spec extract |
---|---|---|
last_updated_outdated |
Return a warning instead of an error | "in no case should it be more than 5 minutes out-of-date” |
num_vehicles_available_incorrect |
Return a warning instead of an error | "The total number of vehicles from each of these objects SHOULD add up to match the value specified in the num_vehicles_available field.” |
num_docks_available_incorrect |
Return a warning instead of an error | "The total number of docks from each of these objects SHOULD add up to match the value specified in the num_docks_available field.” |
duplicate_translation
, duplicate_vehicle_accessory
, multiple_door_counts
, num_docks_available_high
, num_vehicles_available_high
, station_capacity_too_low
, unexpected_rider_capacity
, unexpected_cargo_volume_capacity
, unexpected_cargo_load_capacity
, unexpected_propulsion_type
, unexpectedly_low_range_meters
, unexpectedly_high_range_meters
, unexpected_vehicle_accessory
, unexpectedly_low_wheel_count
, unexpectedly_high_wheel_count
, low_cost
, high_cost
.
We suggest creating a separate issue to keep a list of potential errors/warnings that would be outside of the specification where others can add so that we have a good basis when/if we start to add data sanity checks to the validator.Please let us know what you all think. Thank you!
@richfab I think that's a very good solution and it answers my concerns 🥇 if these changes are made I think we can move on to the more technical aspects of the review
Hi @tdelmas, please let me know if you have any questions or concerns about the proposal. Thank you!
Thank you for the feedbacks and your ping.
Thank you @tdelmas and @testower. Since consensus was reached, I am planning to work on removing the code related to data sanity rules from this PR next week. The code will not be lost as it will remain in the commits.
The discussion about data sanity rules was moved to https://github.com/MobilityData/gbfs-validator/issues/138.
Thank you!
if these changes are made I think we can move on to the more technical aspects of the review
@testower @tdelmas A technical review would be very welcome in https://github.com/MobilityData/gbfs-validator/pull/141, if you have the bandwidth.
Thank you in advance for making the GBFS validator even more useful to the community 🙌
Add non-jsonSchema validations
Checked errors and warnings are listed in README.md
Display a condensed summary of errors and warnings grouped by types and paths
Fix #92
Fix #93 by displaying a warning
Exemple: https://deploy-preview-106--gbfs-validator.netlify.app/validator?url=https%3A%2F%2Fdubai.publicbikesystem.net%2Fcustomer%2Fgbfs%2Fv2%2Fgbfs.json