ITSLeeds / UK2GTFS

Convert UK transport data (TransXchange / ATOC CIF) to GTFS format in R
https://itsleeds.github.io/UK2GTFS/
GNU General Public License v3.0

Processing Network Rail CIF daily updates fails #63

Open danielchick opened 7 months ago

danielchick commented 7 months ago

I am attempting to use the UK2GTFS package to generate GTFS files from the Network Rail daily update files. Processing the full weekly update works well. However, if I concatenate the daily update data into the full CIF file, or try to process the daily update files individually (using the nr2gtfs function), I encounter the following error:

2024-02-06 14:15:41.77603 Building calendar and calendar_dates
2024-02-06 14:15:41.778934 Constructing calendar and calendar_dates
Error in checkForRemoteErrors(val) :
  one node produced an error: missing value where TRUE/FALSE needed
Calls: gtfs_write ... clusterApplyLB -> dynamicClusterApply -> checkForRemoteErrors
Execution halted

I'm struggling to debug this issue and would appreciate any assistance! Alternatively, is there a better way of processing the daily updates by modifying the code, for example by being able to specify multiple input files to the nr2gtfs function to be processed sequentially?

e.g.

nr2gtfs(
  path_in = c("/var/NetworkRailCIFToGTFS/data/toc-full.CIF.gz",
              "/var/NetworkRailCIFToGTFS/data/toc-update-1.CIF.gz"),
  silent = FALSE,
  ncores = 2,
  full_import = TRUE
)

Attached is an example of the CIF update file I have been trying to process.

toc-full.CIF.gz

Clearly, it would be helpful if UK2GTFS were able to process daily updates, particularly with all the volatility in the timetable at the moment caused by the industrial action.

Many thanks

Daniel Chick, Zipabout

danielchick commented 5 months ago

Can anybody help with this? My R skills aren't great - so any assistance on how to resolve this issue would be appreciated!

mem48 commented 4 months ago

I found the error:

Error in if (all(calendar.sub.day$STP == "C")) { : 
  missing value where TRUE/FALSE needed

> calendar_split[[14]]
       UID start_date   end_date    Days STP rowID Headcode ATOC Code Retail Train ID Train Status duration
481 H03943 2024-04-24       <NA>    <NA>   O  5717     <NA>      <NA>            <NA>         <NA>  NA days
482 H03943 2024-04-24 2024-04-25 0011000   C  5718     <NA>      <NA>            <NA>         <NA>   2 days
483 H03943 2024-04-25       <NA>    <NA>   O  5719     <NA>      <NA>            <NA>         <NA>  NA days

mem48 commented 4 months ago

The problem is that the end_date and Days values are missing (NA) for some records in the data, which causes the code to crash.
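
For anyone hitting the same message, here is a minimal base R sketch of how missing values of this kind can lead to "missing value where TRUE/FALSE needed". It uses a toy copy of the three rows printed above. The subsetting step and the names `day`, `sub`, `calendar_ok` and `calendar_skip` are assumptions made for illustration, not the package's actual code, and the thread does not establish whether setting the incomplete rows aside is the right fix.

```r
# Toy copy of the three rows printed above (only the relevant columns).
calendar <- data.frame(
  UID        = c("H03943", "H03943", "H03943"),
  start_date = as.Date(c("2024-04-24", "2024-04-24", "2024-04-25")),
  end_date   = as.Date(c(NA, "2024-04-25", NA)),
  Days       = c(NA, "0011000", NA),
  STP        = c("O", "C", "O"),
  stringsAsFactors = FALSE
)

# Hypothetical subsetting step (an assumption, not taken from UK2GTFS):
# a logical index that contains NA returns all-NA rows for those positions.
day <- as.Date("2024-04-24")
sub <- calendar[calendar$end_date >= day, ]

# all() gives NA when TRUE values are mixed with NA, and if() cannot branch on NA.
stp_check <- all(sub$STP == "C")
fails <- inherits(try(if (stp_check) "cancelled", silent = TRUE), "try-error")

# One possible guard: set incomplete records aside before building the calendar.
incomplete    <- is.na(calendar$end_date) | is.na(calendar$Days)
calendar_ok   <- calendar[!incomplete, ]
calendar_skip <- calendar[incomplete, ]

print(stp_check)          # NA
print(fails)              # TRUE
print(nrow(calendar_ok))  # 1
```

Keeping the skipped rows in `calendar_skip` rather than discarding them makes it possible to inspect what the update file was trying to express for those UIDs.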