MobilityData / gtfs-validator

Canonical GTFS Validator project for schedule (static) files.
https://gtfs-validator.mobilitydata.org/
Apache License 2.0
288 stars 101 forks source link

"otherCalendar" is null #397

Closed mcplanner-zz closed 3 years ago

mcplanner-zz commented 4 years ago

Describe the bug The validator was unable to process a feed and gave an error.

To Reproduce Run the validator on the following dataset: http://transitfeeds.com/p/sonoma-marin-area-rail-transit/1050/latest

Expected behavior No notice should be generated in this case.

Witnessed behavior

Exception in thread "main" java.lang.NullPointerException: Cannot invoke "org.mobilitydata.gtfsvalidator.domain.entity.gtfs.Calendar.getStartDate()" because "otherCalendar" is null
    at org.mobilitydata.gtfsvalidator.domain.entity.gtfs.Calendar.isOverlapping(Calendar.java:500)
    at org.mobilitydata.gtfsvalidator.usecase.ValidateNoOverlappingStopTimeInTripBlock.checkOverlappingTripsDifferentServiceIdCalendarProvided(ValidateNoOverlappingStopTimeInTripBlock.java:260)
    at org.mobilitydata.gtfsvalidator.usecase.ValidateNoOverlappingStopTimeInTripBlock.lambda$execute$0(ValidateNoOverlappingStopTimeInTripBlock.java:128)
    at java.base/java.util.ArrayList.forEach(ArrayList.java:1511)
    at org.mobilitydata.gtfsvalidator.usecase.ValidateNoOverlappingStopTimeInTripBlock.lambda$execute$1(ValidateNoOverlappingStopTimeInTripBlock.java:84)
    at java.base/java.util.ArrayList.forEach(ArrayList.java:1511)
    at org.mobilitydata.gtfsvalidator.usecase.ValidateNoOverlappingStopTimeInTripBlock.lambda$execute$2(ValidateNoOverlappingStopTimeInTripBlock.java:69)
    at java.base/java.util.HashMap.forEach(HashMap.java:1425)
    at java.base/java.util.Collections$UnmodifiableMap.forEach(Collections.java:1521)
    at org.mobilitydata.gtfsvalidator.usecase.ValidateNoOverlappingStopTimeInTripBlock.execute(ValidateNoOverlappingStopTimeInTripBlock.java:67)
    at org.mobilitydata.gtfsvalidator.Main.main(Main.java:216)

Environment used

Additional context The calendar.txt file doesn't use the service_name field, but all other required fields appear to be present.

barbeau commented 4 years ago

@mcplanner This GTFS dataset is a first for me - it's the first time I've seen a mix between the two different calendar models in the same GTFS dataset for the same agency.

I've always viewed the GTFS spec as supporting two calendar models - a normalized Model A that the best practices advocate for (everything in calendar.txt with exceptions/holidays called out in calendar_dates.txt, but all trips in trips.txt referencing calendar.txt), and a denormalized Model B (where all date instances for service are defined in calendar_dates.txt and therefore all trips in trips.txt reference calendar_dates.txt).

It looks like this dataset has a hybrid of both approaches, with service_id 73343 following the Model A normalized approach (with a few exceptions to 73343 in calendar_dates.txt), and service_id 73344 following the denormalized approach of adding service outside of any pattern defined in calendar.txt:

calendar.txt

service_id,monday,tuesday,wednesday,thursday,friday,saturday,sunday,start_date,end_date
73343,1,1,1,1,1,0,0,20200318,20210101

calendar_dates.txt

service_id,date,exception_type
73343,20201225,2
73343,20201127,2
73343,20201126,2
73343,20200907,2
73344,20201225,1
73344,20201127,1
73344,20201126,1
73344,20200907,1

Talking to a few others, though, it seems like this practice is generally accepted as valid GTFS?

The spec isn't clear about this - it would be nice if it was clarified to say this was allowed. It seems odd to me, in that it breaks traditional data modeling practices. In the hybrid scenario, for the table calendar_dates.txt for some records we treat service_id as the primary key, and for others we treat service_id as a foreign key (to calendar.txt). Typically we'd define the role of a field for primary/foreign keys at the table level, not at the record level.

mcplanner-zz commented 4 years ago

Trillium built the feed. I wonder if they can comment on why it was constructed that way, and how common this construction is?

antrim commented 4 years ago

That is Trillium's general practice. These exceptions occur on weekdays where regular weekday service is replaced with weekend/holiday service:

Agency website: https://www.sonomamarintrain.org/schedules-fares

This data is accepted as valid by the 3rd party apps that receive Trillium data. I think this is a good way of representing exceptions, because it corresponds to how the agency is representing exceptions to customers (service is replaced on a holiday).

Here is another example of this practice: https://transitfeeds.com/p/arcata-mad-river-transit-system/148/latest/file/calendar_dates.txt

An aside: The publisher of the SMART (Sonoma Marin Area Rail Transit) feed is MTC/511.org but we assume the original data source is indeed Trillium. We last pushed data to 511.org on 08/24/2020.

barbeau commented 4 years ago

Thanks @antrim for that explanation. Other datasets that I've seen have an entry in calendar.txt for the weekend service, and then have an entry in calendar_dates.txt to add this service on the holiday. But the above makes sense to me if you have one-off service on holidays that don't fit a pattern (i.e., isn't the same as normal weekday service).

barbeau commented 3 years ago

This was fixed by https://github.com/MobilityData/gtfs-validator/pull/423