ad-freiburg / pfaedle

Precise map-matching for public transit feeds. Generates high-quality GTFS shapes from OSM data.
GNU General Public License v3.0

Stop_ids seem to be treated as numbers #51

Closed: extinctPencil closed this issue 10 months ago

extinctPencil commented 10 months ago

I got an error when the stop_id was 49000849800:

stop_times.txt:36: in field 'stop_id', no stop with id '490008498E2' defined in stops.txt, cannot reference here.

(This was genuine, clean CSV; it is not an artefact of corruption by Excel or the like.)

I resolved the issue by processing stops.txt and stop_times.txt and quoting the values. I would prefer not to have to pre-process the files. Adding this here in case others encounter it.

patrickbr commented 10 months ago

Hm, I just tried to reproduce this without success; a stop ID 49000849800 was parsed fine. The stop_id is never treated or stored as an integer anywhere in the code. It is always a string, as specified by the standard (see e.g. https://github.com/ad-freiburg/cppgtfs/blob/master/src/ad/cppgtfs/Parser.tpp#L1437). I have also never personally encountered this problem. Could you share the input feed on which the error occurred?

Quoting the values should not make any difference. The quotes are removed by the CSV parser before the value is passed, as a raw string, to the GTFS parser (see https://github.com/ad-freiburg/cppgtfs/blob/master/src/ad/util/CsvParser.cpp).
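To illustrate the point about quoting, here is a minimal sketch that uses Python's standard csv module as a stand-in. It is not the cppgtfs CsvParser, and the helper name read_stop_ids and the stop name "Example Stop" are made up for the example. It only shows that a typical CSV reader hands back the same string whether or not the field was quoted.

```python
import csv
import io

def read_stop_ids(csv_text):
    """Return the stop_id column of a CSV document as a list of strings.

    Hypothetical helper for illustration; not part of pfaedle or cppgtfs.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    return [row["stop_id"] for row in reader]

# The same (made-up) stops.txt row, once unquoted and once quoted.
unquoted = "stop_id,stop_name\n49000849800,Example Stop\n"
quoted = 'stop_id,stop_name\n"49000849800","Example Stop"\n'

ids_unquoted = read_stop_ids(unquoted)
ids_quoted = read_stop_ids(quoted)

# The reader strips the quotes, so both forms yield identical strings.
print(ids_unquoted == ids_quoted)
```

Under this assumption, quoting changes only the bytes on disk, not the value a string-based GTFS parser receives.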

extinctPencil commented 10 months ago

Thanks. Yes, I agree it's odd. I think all the stop_ids are high-order integers.
I wrote code to cross-check and remove any IDs in stop_times.txt that weren't in stops.txt, but that didn't find any or help; quoting immediately did.
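A cross-check of the kind described above could look roughly like the following sketch. It is not the code that was actually run; the function name missing_stop_ids and the sample rows are hypothetical, and it assumes both files have a stop_id column and no fields spanning multiple lines. IDs are compared as plain strings, never as numbers.

```python
import csv
import io

def missing_stop_ids(stops_file, stop_times_file):
    """Return (line_number, stop_id) pairs from stop_times with no match in stops.

    Hypothetical helper for illustration; both arguments are open text files.
    """
    known = {row["stop_id"] for row in csv.DictReader(stops_file)}
    missing = []
    # start=2 because line 1 of the file is the header row.
    for line_no, row in enumerate(csv.DictReader(stop_times_file), start=2):
        if row["stop_id"] not in known:
            missing.append((line_no, row["stop_id"]))
    return missing

# Tiny made-up feed: stop C3 is referenced but never defined.
stops = io.StringIO("stop_id,stop_name\nA1,First\nB2,Second\n")
stop_times = io.StringIO("trip_id,stop_id,stop_sequence\nt1,A1,1\nt1,C3,2\n")

result = missing_stop_ids(stops, stop_times)
print(result)
```

With real files, the two arguments would be opened with open(path, newline="") as the csv module documentation recommends.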

There was one genuine error: a stops.txt parent_id not found in stops.txt, probably my bad, so I just null the parent_id for now.

I will attach the GTFS file (a subset of London Buses). The OSM data covers the whole of central London and is too big to supply. Perhaps you can get your own from Geofabrik; if not, let me know and I will find a way. I hope you can reproduce it, as my C++ debugging skills are a bit rusty. Apologies if I am having a senior moment.

LDN.bus.gtfs.zip

patrickbr commented 10 months ago

Thanks! I manually nulled the handful of incorrect parent stations in your file LDN.bus.gtfs.zip. Without them, pfaedle parses the feed just fine.

Checking the feed, I am now also a bit confused. In your initial post, you wrote that the error occurred when the stop id was 49000849800. But no such stop id is present in the GTFS feed you uploaded.

extinctPencil commented 10 months ago

Thanks again. I will go back round the loop and double-check, and hopefully close with apologies. From the original post, '490008498E2' was being reported by pfaedle on line 36 of stop_times.txt. I think my confusion may have come from looking at this and searching in Excel: the value was interpreted as an exponential and, when the column width was expanded, incorrectly rendered as 49000849800. That may explain my confusion.
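The explanation suggested above can be checked with a short sketch. It does not reproduce Excel's actual type-guessing rules; the function as_spreadsheet_number is a hypothetical stand-in that simply treats any text parseable as a number as a number, which is enough to show how the string 490008498E2 can turn into 49000849800 (490008498 times 10 to the power 2).

```python
def as_spreadsheet_number(text):
    """Parse numeric-looking text as a number, otherwise keep the text.

    Hypothetical stand-in for a spreadsheet's guess; not Excel's real logic.
    """
    try:
        value = float(text)
    except ValueError:
        return text
    # Show whole numbers without a trailing ".0", as a wide column would.
    return int(value) if value.is_integer() else value

reported_id = "490008498E2"  # the ID reported in pfaedle's error message
displayed = as_spreadsheet_number(reported_id)

print(displayed)  # 49000849800
```

A tool that keeps the column as text, as a GTFS parser should, would leave the ID untouched.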