Closed derhuerst closed 6 years ago
Turns out this is an encoding issue. The stops.txt file contains three leading bytes which do not belong to the CSV:
xxd -l 32 stops.txt
00000000: efbb bf22 7374 6f70 5f69 6422 2c22 7374 ..."stop_id","st
00000010: 6f70 5f63 6f64 6522 2c22 7374 6f70 5f6e op_code","stop_n
Removing these bytes fixes it temporarily.
dd if=stops.broken.txt of=stops.txt bs=3 skip=1
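The dd call above drops the first three bytes unconditionally, so it would damage a file that has no such prefix. A minimal sketch of a conditional variant follows. The function name strip_bom is hypothetical, and the sketch assumes head -c, od and tail -c are available (they are on GNU and BSD systems):

```shell
# Hypothetical helper: copy $1 to $2, dropping the leading bytes EF BB BF if present.
strip_bom() {
  # Hex-dump the first three bytes and compare them with the UTF-8 BOM.
  if [ "$(head -c 3 "$1" | od -An -tx1 | tr -d ' \n')" = "efbbbf" ]; then
    # tail -c +4 starts output at the fourth byte, i.e. right after the BOM.
    tail -c +4 "$1" > "$2"
  else
    # No BOM found: copy the file unchanged.
    cp "$1" "$2"
  fi
}

# Usage with the file names from this thread:
#   strip_bom stops.broken.txt stops.txt
```

Because the check runs first, the helper can be applied to every file of a dataset, whether or not it starts with those bytes.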
Just stumbled upon this. The three bytes are the UTF-8 Byte Order Mark and are acceptable in GTFS files as per https://developers.google.com/transit/gtfs/reference/#file_requirements
Thanks, I adapted my build scripts.
The October dataset's stop_times.txt references a lot of stations that do not exist in stops.txt.
Parts of the build log of vbb-stations, taken every 100k lines:
Find the full log here: errlog.gz