OneBusAway / onebusaway-csv-entities

A Java library for reading and writing Java objects from comma-separated-values files.
https://github.com/OneBusAway/onebusaway-csv-entities/wiki

Odd error importing massive file #4

Open carlospuk opened 11 years ago

carlospuk commented 11 years ago

We're currently trying to import some very large CSV files as part of OpenTripPlanner, and have run into a strange error message from the CSV entity reader somewhere between the 60 and 70 millionth record:

2013-04-17 09:58:42,317 DEBUG [GtfsGraphBuilderImpl.java:316] : loading StopTime: 60000000
2013-04-17 09:58:51,159 WARN  [IndividualCsvEntityReader.java:110] : expected 
and actual number of csv fields differ: type=org.onebusaway.gtfs.model.StopTime 
line #1 60973114 expected=8 actual=6

Exception in thread "main" org.onebusaway.csv_entities.exceptions.CsvEntityIOException: 
io error:       entityType=org.onebusaway.gtfs.model.StopTime   
path=java.io.InputStreamReader@5232c9dd lineNumber=60973114
    at    org.onebusaway.csv_entities.CsvEntityReader.readEntities(CsvEntityReader.java:161)

I've checked the source file (stop_times.txt) at the specified line and it seems valid; there are 8 comma-delimited entities:

trip_id,arrival_time,departure_time,stop_id,stop_sequence,stop_headsign,pickup_type,drop_off_type
1,00:03:00,00:03:00,9400ZZLUEAC1,0,,0,0
...
1483950,10:38:00,10:38:00,2500LAA07503,25,,0,0
1483950,10:39:00,10:39:00,2500LAA07504,26,,0,0
1483950,10:40:00,10:40:00,2500LAA07505,27,,0,0

Anybody able to shed any light on this? Is it perhaps symptomatic of a different underlying error? I'm wondering if the stream reader was cut off prematurely for some reason, hence the parsing of this line as only 6 fields.

We're building on an Amazon EC2 High Memory instance (64 GB RAM), using Ubuntu 12.04 and Java 7.

bdferris commented 11 years ago

I'm not sure what's going on exactly, but I'd point out that the "expected and actual number of csv fields differ" message is just a warning and was not the actual cause of the exception. The actual cause should be listed at the end of the Java exception chain. Any chance you could include the full exception chain + stack trace?

Regardless, I'll admit that I've never tried processing such a large file. I don't know of anything specific that might be a problem, but who knows? If you can build from source, you may need to instrument with some additional debug information in the exception throwing.
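For anyone who cannot easily capture the full console output, the root cause can also be extracted in code. The following is a minimal sketch using only the standard `Throwable.getCause()` API; the class name `CauseChain` and its helper methods are hypothetical and not part of onebusaway-csv-entities. The exception built in `main` is a stand-in that mimics the shape of the error reported in this thread.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of onebusaway-csv-entities.
public class CauseChain {

    // Collects the exception and each nested cause, outermost first.
    static List<Throwable> causeChain(Throwable t) {
        List<Throwable> chain = new ArrayList<>();
        Throwable current = t;
        // Stop at the end of the chain, or if a cause ever repeats.
        while (current != null && !chain.contains(current)) {
            chain.add(current);
            current = current.getCause();
        }
        return chain;
    }

    // Returns the innermost cause, which is usually the real failure.
    static Throwable rootCause(Throwable t) {
        List<Throwable> chain = causeChain(t);
        return chain.isEmpty() ? null : chain.get(chain.size() - 1);
    }

    public static void main(String[] args) {
        // Stand-in for a wrapped reader failure (assumed shape, for illustration).
        Throwable inner = new IndexOutOfBoundsException("Index: 12, Size: 12");
        Throwable outer = new RuntimeException("io error: lineNumber=573", inner);

        for (Throwable t : causeChain(outer)) {
            System.err.println(t.getClass().getName() + ": " + t.getMessage());
        }
        System.err.println("root cause: " + rootCause(outer));
    }
}
```

In practice the call that triggers the import would be wrapped in a `try`/`catch`, and the caught exception passed to `causeChain` before rethrowing.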

carlospuk commented 11 years ago

Thanks Brian, I see what you mean about it only being a WARN. Annoyingly, I've not logged the rest of the error anywhere, so I'll have to repeat the test today. That said, the start of the error message repeats the same line number (60973114), so it may well be related to the warning?

Let me run it again today and I'll see if it throws it up again and, if so, what the actual exception is.

bdferris commented 11 years ago

I agree it's probably related, but the full exception will help with debugging.

bdferris commented 9 years ago

Could you post the header line (aka the first line) of your stops.txt file?

On Thu Dec 25 2014 at 7:33:16 PM Richard Law notifications@github.com wrote:

I'm running into a similar error, albeit it doesn't like my stops.txt rather than stop_times.txt. I'm using OpenTripPlanner, too. The full exception in my console is:

16:17:08.708 WARN (IndividualCsvEntityReader.java:110) expected and actual number of csv fields differ: type=org.onebusaway.gtfs.model.Stop line # 573 expected=12 actual=14
Exception in thread "main" org.onebusaway.csv_entities.exceptions.CsvEntityIOException: io error: entityType=org.onebusaway.gtfs.model.Stop path=stops.txt lineNumber=573
    at org.onebusaway.csv_entities.CsvEntityReader.readEntities(CsvEntityReader.java:161)
    at org.onebusaway.csv_entities.CsvEntityReader.readEntities(CsvEntityReader.java:120)
    at org.onebusaway.csv_entities.CsvEntityReader.readEntities(CsvEntityReader.java:115)
    at org.onebusaway.csv_entities.CsvEntityReader.readEntities(CsvEntityReader.java:108)
    at org.opentripplanner.graph_builder.impl.GtfsGraphBuilderImpl.loadBundle(GtfsGraphBuilderImpl.java:240)
    at org.opentripplanner.graph_builder.impl.GtfsGraphBuilderImpl.buildGraph(GtfsGraphBuilderImpl.java:170)
    at org.opentripplanner.graph_builder.GraphBuilderTask.run(GraphBuilderTask.java:141)
    at org.opentripplanner.standalone.OTPMain.main(OTPMain.java:61)
Caused by: java.lang.IndexOutOfBoundsException: Index: 12, Size: 12
    at java.util.ArrayList.rangeCheck(ArrayList.java:635)
    at java.util.ArrayList.get(ArrayList.java:411)
    at org.onebusaway.csv_entities.IndividualCsvEntityReader.readEntity(IndividualCsvEntityReader.java:123)
    at org.onebusaway.csv_entities.IndividualCsvEntityReader.handleLine(IndividualCsvEntityReader.java:96)
    at org.onebusaway.csv_entities.CsvEntityReader.readEntities(CsvEntityReader.java:157)
    ... 7 more

My source file is a (fake, based on an archived real-time information feed) file, and lines 572–574 are:

11336,9585,FergussonDr at 105,"FergussonDr@105 ",-41.11868083,175.084223,,,0,,,
11431,2752,JamesCook at 35,"JamesCook@35",-41.11374433,174.9016455,,,0,,,

I can't see any issue here; and indeed other feeds I've made using the same stops have had no problems using them.

Any tips for resolving this?


alpha-beta-soup commented 9 years ago

Apologies, there was actually a stop with commas in the stop_name field. The line reference was unreliable (possibly using the compressed stops.txt rather than the original?), so I was looking for the problematic stop in the wrong place.
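Since the reported line number turned out to be unreliable here, one way to locate such a row is to scan the file independently and compare each row's field count against the header. The sketch below is a standalone check, not the parser used by onebusaway-csv-entities; the class name `FieldCountCheck` and its methods are hypothetical. It assumes one record per line (no line breaks inside quoted fields) and treats commas inside double quotes as part of the field, so an unquoted comma in a field such as stop_name shows up as extra fields.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Hypothetical standalone check, not part of onebusaway-csv-entities.
public class FieldCountCheck {

    // Counts comma-separated fields, ignoring commas inside double quotes.
    // An escaped quote ("") toggles the flag twice, so it has no net effect.
    static int countFields(String line) {
        int fields = 1;
        boolean inQuotes = false;
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (c == '"') {
                inQuotes = !inQuotes;
            } else if (c == ',' && !inQuotes) {
                fields++;
            }
        }
        return fields;
    }

    // Returns 1-based line numbers whose field count differs from the header's.
    static List<Integer> findMismatches(List<String> lines) {
        List<Integer> mismatches = new ArrayList<>();
        if (lines.isEmpty()) {
            return mismatches;
        }
        int expected = countFields(lines.get(0));
        for (int i = 1; i < lines.size(); i++) {
            String line = lines.get(i);
            if (line.isEmpty()) {
                continue; // skip blank lines
            }
            if (countFields(line) != expected) {
                mismatches.add(i + 1);
            }
        }
        return mismatches;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java FieldCountCheck <csv file>");
            return;
        }
        List<String> lines = Files.readAllLines(Paths.get(args[0]), StandardCharsets.UTF_8);
        for (int lineNumber : findMismatches(lines)) {
            System.out.println("line " + lineNumber + ": " + lines.get(lineNumber - 1));
        }
    }
}
```

`Files.readAllLines` loads the whole file into memory, which is fine for a stops.txt of this size; for a file as large as the stop_times.txt discussed at the top of the thread, reading line by line with a `BufferedReader` would be the safer choice.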