ad-freiburg / pfaedle

Precise map-matching for public transit feeds. Generates high-quality GTFS shapes from OSM data.
GNU General Public License v3.0
208 stars 29 forks source link

Doesn't like Non-negative Integers greater than 65535 #36

Closed stupidpupil closed 2 years ago

stupidpupil commented 2 years ago

If an input GTFS has a field with a Non-negative Integer greater than 65535, then pfaedle will stop with a parsing error like:

data-raw/filed40c09ec85/frequencies.txt:11566: in field 'headway_secs', expected integer in range [0,65535]

(In this case, the headway_secs was 73800, or 20hrs30min.)

This is valid according to the GTFS reference which places no upper limit on Non-negative Integer fields:

Non-negative Integer - A integer greater than or equal to 0.

But, in practice, it seems unreasonable to expect tools to deal with arbitrarily large numbers, so I'm not sure I can really fault pfaedle here! (I will raise an issue with the tool that generated this GTFS as well.)

For completeness, there are two places where this field-type is used where I think you might, sort of, expect to find very large values:

stupidpupil commented 2 years ago

Ah, it looks as though the tool that I used to generate this GTFS has just added a flag to limit _headwaysecs, so I'll try using that: https://github.com/patrickbr/gtfstidy/commit/d22ca560952b1883725123dbe3e0c4e6256cd4f1

(And might suggest reducing the default to 65535 seconds, from 24 hours.)

derhuerst commented 2 years ago

If an input GTFS has a field with a Non-negative Integer greater than 65535, then pfaedle will stop with a parsing error […] (In this case, the _headwaysecs was 73800, or 20hrs30min.) […] But, in practice, it seems unreasonable to expect tools to deal with arbitrarily large numbers, so I'm not sure I can really fault pfaedle here! (I will raise an issue with the tool that generated this GTFS as well.)

Fair enough, but I'd argue that pfaedle's choice of numeric types shouldn't limit what I'm able to express in GTFS, as long as I follow the GTFS spec.

patrickbr commented 2 years ago

I mostly agree, but of course it's impossible to exactly implement the GTFS spec (i.e. unlimited integer ranges) on a machine with finite memory. In the underlying library (cppgtfs) we usually chose variable types in such a way that they allow for all realistic values. We decided to use a type for the headway seconds that allows for values corresponding to at least 12 hours (unsigned 16 bit integer). Remember that the headway seconds are given relative to a specific date, so if you have a trip that is repeated every N > 12 hours, it is probably not a good idea to model this using a frequency beginning on a specific date. There are two situations I think in which this would occur:

1) There is a single trip per day, and its schedule is modelled with a frequency that only has a single occurance per day. In this case, it would be better to bake the time into stop_times.txt explicitly.

2) You actually have a trip that is repeated every 20 hours, over a number of M days. If you are using frequencies.txt to model this, you are assuming a single day spanning M * 24 hours, which isn't very natural. It would be better to give the explicit stop_times.txt for each trip.

Nevertheless, I agree that the standard allows for arbitrary values, and there might by a useful scenario missing from my list. We increased the max number to 2^32 in https://github.com/ad-freiburg/cppgtfs/commit/aa2714065e1ee8a4773f25aec598d781985086b4, which is now used in https://github.com/ad-freiburg/pfaedle/commit/b4b08baecabe5b48a631d6167cec9c457c497c81