bliksemlabs / rrrr

RRRR rapid real-time routing
BSD 2-Clause "Simplified" License

Errors while parsing GTFS on tdata branch. Difference between branches? #203

Open mr-tm opened 1 year ago

mr-tm commented 1 year ago

Hi! First of all, thank you for creating this library; from the description it truly seems to be one of a kind! I couldn't believe it's possible to compress all Dutch transit info from a 246 MB GTFS zip file to just 17 MB!

I'm getting an error while parsing the GTFS file (http://gtfs.ovapi.nl/gtfs-nl.zip) on the tdata branches (tdata4, tdata-cherokee):

saving transfer stops (footpaths) at position 9177772 in output [8.75 MB]
saving transfer times (footpaths) at position 9263792 in output [8.83 MB]
Traceback (most recent call last):
  File "gtfs2rrrr.py", line 154, in main()
  File "gtfs2rrrr.py", line 151, in main
    exporter.timetable4.export(tdata)
  File "C:\Projects\rrrr\rrrr-tdata4\rrtimetable\rrtimetable\exporter\timetable4.py", line 662, in export
    export_transfers(tdata,index,out)
  File "C:\Projects\rrrr\rrrr-tdata4\rrtimetable\rrtimetable\exporter\timetable4.py", line 328, in export_transfers
    writeshort(out,(int(transfer_time) >> 2))
TypeError: int() argument must be a string or a number, not 'NoneType'

I also noticed that processing with gtfs2rrrr.py takes much longer (up to the point of the error) than on master. Before the error hit, gtfs2rrrr.py was also using about 11 GB of RAM.

With master it parsed just fine (gtfsdb.py -> transfers.py -> timetable.py).

Which leads to the question - why are there so many different branches and which one would be the best for mobile deployment?

Thanks!

skinkie commented 1 year ago

@mr-tm there are still many branches because we had a lot of experiments, including some that would change all data structures to check performance (column stores, different types of memory management). I would say try tdata-cherokee.

We have mostly generated the timetable via a script that directly accesses a database (and already has timedemandtypes). I am not surprised that it eats so much memory on raw GTFS files, and to be fair, I don't think GTFS is the best input to achieve good routing, mainly because some preprocessing on stops and their relations to each other is required (for example, the transfer times between stops within a station).

With respect to mobile deployment, ten years ago we did it with JNI ;) If you have more questions feel free to ask them.

mr-tm commented 1 year ago

@skinkie thank you for your insight! Glad to hear it worked with JNI! I managed to compile tdata-cherokee and test the router through ./cli on a Mac - so far, so good. I was wondering if there is an option to specify how many itineraries I'd like to receive? For example, get the 4 best itineraries.

Also, is it possible to get this library working with multiple timetable.dat files? For example, I have one timetable.dat for city transport and another timetable.dat for intercity buses, and I'd like to use both files to create an itinerary.

skinkie commented 1 year ago

> @skinkie thank you for your insight! Glad to hear it worked with JNI! I managed to compile tdata-cherokee and test the router through ./cli on a Mac - so far, so good. I was wondering if there is an option to specify how many itineraries I'd like to receive? For example, get the 4 best itineraries.

The algorithm (RAPTOR) allows you to retrieve the fastest arrival time for a given number N of transfers. Hence the only way to receive multiple itineraries in one go is when a faster option actually becomes available with more transfers.
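The point about transfers can be sketched in a few lines. This is only an illustration and not code from rrrr: it assumes a hypothetical array holding the earliest arrival time found in each round (round k meaning k transfers), and the names `kUnreached`, `Itinerary` and `pareto_itineraries` are made up for the example. A round only contributes an itinerary when it arrives strictly earlier than every round with fewer transfers.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical sentinel meaning "target not reached in this round".
constexpr uint32_t kUnreached = UINT32_MAX;

struct Itinerary {
    std::size_t transfers;  // number of transfers used
    uint32_t arrival;       // arrival time, seconds since midnight
};

// arrival_by_round[k] is the earliest arrival found with k transfers,
// or kUnreached. Only rounds that strictly improve on all rounds with
// fewer transfers yield an itinerary (departure-time search assumed).
std::vector<Itinerary> pareto_itineraries(const std::vector<uint32_t>& arrival_by_round) {
    std::vector<Itinerary> result;
    uint32_t best = kUnreached;
    for (std::size_t k = 0; k < arrival_by_round.size(); ++k) {
        const uint32_t t = arrival_by_round[k];
        if (t != kUnreached && t < best) {
            best = t;
            result.push_back({k, t});
        }
    }
    return result;
}
```

With per-round arrivals of 10:00:00, 10:00:00 and 09:50:00, the sketch returns two itineraries (0 transfers and 2 transfers); the 1-transfer round is dropped because it does not arrive any earlier.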

I am currently working on rRAPTOR (range raptor) which allows at virtually no cost to get a range of itineraries in a certain time range. That will be available in this branch.

> Also, is it possible to get this library working with multiple timetable.dat files? For example, I have timetable.dat for city transports and timetable.dat for intercity buses and I'd like to use both files to create itinerary.

No, that will not - by design - work. It is easier to merge the input (for example GTFS, see "onebusaway-gtfs-transformer"). Theoretically you can merge timetable.dat files too, but it would need some serious management.

mr-tm commented 1 year ago

@skinkie

> I am currently working on rRAPTOR (range raptor) which allows at virtually no cost to get a range of itineraries in a certain time range. That will be available in this branch.

Can't wait! :D

I'm currently trying to get this working on Android. Sometimes it works fine, but sometimes I get an art_sigsegv_fault error somewhere in the plan_render_otp method. I can't work out why it's failing there. I tried debugging, and it fails at random places in that function (for the same parameters).

I tried studying the cli.c file to mimic its function calls; maybe I have missed something?

This is for opening timetable4.dat

OPResult openTimetable(string &path)
{
  LOGI("[openTimetable] Opening timetable: %s", path.c_str());
  /* initialise the structs so we can always trust NULL values */
  memset (&tdata,    0, sizeof(tdata_t));
  memset (&router,   0, sizeof(router_t));

  if (!tdata_load(&tdata, const_cast<char*>(path.c_str()))) {
    return OPResult{
      .type = Error,
      .errorMessage = "Could not load tdata!"
    };
  }

  if (! tdata_hashgrid_setup(&tdata)) {
    return OPResult{
      .type = Error,
      .errorMessage = "Could not setup hashgrid!"
    };
  }
  if (!router_setup(&router, &tdata)) {
    return OPResult{
      .type = Error,
      .errorMessage = "Could not setup router!"
    };
  }

  return OPResult{
    .type = Ok,
    .errorMessage = "Router initialized!"
  };
}

This is for getting a plan (built with MMAP, but DYNAMIC seems to fail too)

#define OUTPUT_LEN 32000
OPResult getPlan(const double fromLat, const double fromLong, const double toLat, const double toLong, time_t time, bool arriveBy, string &resultOtpJson){
  router_request_initialize(&req);
  plan_init(&plan);
  router_request_from_epoch(&req, &tdata, time);
  req.arrive_by = arriveBy;
  if (req.arrive_by) {
    req.time_cutoff = 0;
  } else {
    req.time_cutoff = UNREACHED;
  }
  //std::time_t ms = std::time(nullptr);
  req.from_latlon.lat = (float)fromLat;
  req.from_latlon.lon = (float)fromLong;

  req.to_latlon.lat = (float)toLat;
  req.to_latlon.lon = (float)toLong;

  req.intermediatestops = true;
  LOGI("[getPlan] Navigating from: (%f, %f) to (%f, %f). arriveBy(%s), time(%ld)", req.from_latlon.lat, req.from_latlon.lon, req.to_latlon.lat, req.to_latlon.lon, req.arrive_by ? "true" : "false", time);

  if (req.time_rounded && ! (req.arrive_by)) {
    req.time++;
  }
  req.time_rounded = false;

  if (!router_route_full_reversal(&router, &req, &plan)) {
    return OPResult{
      .type = Error,
      .errorMessage = "Could not navigate!"
    };
  }

  char result_buf[OUTPUT_LEN];
  plan.req = req;
  plan_render_otp(&plan, &tdata, result_buf, OUTPUT_LEN);
  resultOtpJson = string(result_buf);
  return OPResult{
    .type = Ok
  };
}

And this is for closing timetable

void closeTimetable(){
  LOGI("[closeTimetable] Closing timetable...");
  /* Deallocate the scratchspace of the router */
  router_teardown(&router);

  /* Deallocate the hashgrid coordinates */
  tdata_hashgrid_teardown(&tdata);

  /* Unmap the memory and/or deallocate the memory on the heap */
  tdata_close(&tdata);
}

[image] The last crash happened here. The stop_index value was 65535, which is out of bounds for the stop_point_coords array, so the coords were random values. [image]

This is the request:

from_stop_area:  NONE [65535]
from_stop_point:  NONE [65535]
from_latlon:  56.988089,23.879145
to_stop_area:    NONE [65535]
to_stop_point:    NONE [65535]
to_latlon:  56.957226,23.627903
date:  2023-05-20
time:  21:32:28 [40987]
speed: 1.500000 m/sec
arrive-by: true
max xfers: 1
max time:  20:25:56
mode: transit

I also noticed that the OTP JSON seems to be faulty for coordinate-to-coordinate requests even when it does not crash, e.g.:

        "from": {
            "name": "NONE",
            "stopId": {
                "agencyId": "NL",
                "id": "Z:218"
            },
            "stopCode": null,
            "platformCode": null,
            "lat": 0,
            "lon": 0,
            "wheelchairBoarding": null,
            "visualAccessible": null,
            "arrival": null,
            "departure": null
        },
        "to": {
            "name": "NONE",
            "stopId": {
                "agencyId": "NL",
                "id": "Z:218"
            },
            "stopCode": null,
            "platformCode": null,
            "lat": 0,
            "lon": 0,
            "wheelchairBoarding": null,
            "visualAccessible": null,
            "arrival": null,
            "departure": null
        },

Sometimes lat, lon are random values.

mr-tm commented 1 year ago

Full OTP json example. full_otp.zip

skinkie commented 1 year ago

I don't know if I am able to 'support' this kind of debugging request, but I'll see what I can do.

mr-tm commented 1 year ago

@skinkie No problem. I think the issue here is that NONE stops are not handled correctly in plan_render_otp.c. I added a quick and dirty fix to some methods so they don't access memory they shouldn't (when the stop is 'NONE'), and it looks like the crash no longer happens. :)
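For readers hitting the same crash, the kind of guard described here might look roughly like the sketch below. It is not the actual patch and does not use rrrr's real types: `kStopNone`, `LatLon` and `stop_coords` are hypothetical names, and the only thing taken from the thread is that a NONE stop shows up as index 65535. The idea is simply to check the index before using it to read the coordinate table.

```cpp
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

// Assumption: 65535 (the largest uint16_t) marks "no stop", matching the
// "NONE [65535]" entries in the request dump above.
constexpr uint16_t kStopNone = UINT16_MAX;

struct LatLon {
    float lat;
    float lon;
};

// Returns the coordinates for stop_index, or std::nullopt when the index
// is the NONE sentinel or lies outside the coordinate table.
std::optional<LatLon> stop_coords(const std::vector<LatLon>& coords, uint16_t stop_index) {
    if (stop_index == kStopNone || stop_index >= coords.size()) {
        return std::nullopt;
    }
    return coords[stop_index];
}
```

A renderer built this way could fall back to the coordinates given in the request (from_latlon / to_latlon) whenever no stop is attached, instead of reading past the end of the array.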

skinkie commented 1 year ago

@mr-tm I am accepting pull requests if you have some fixes ;)