cyclestreets / cyclestreets-r

An R interface to cyclestreets.net APIs
https://rpackage.cyclestreets.net/
GNU General Public License v3.0
27 stars 7 forks source link

Cannot upload geojsons with more than ~15k desire lines via web API #51

Closed Robinlovelace closed 1 year ago

Robinlovelace commented 1 year ago

Reproducible example: hope to add one soon. In the meantime any ideas @mvl22? We're using POST requests for this.

Robinlovelace commented 1 year ago

Reprex:

library(tidyverse)
library(cyclestreets)
# devtools::load_all()
# od = readRDS("~/nptscot/npt/inputdata/od_commute_jittered.Rds")
od_raw = pct::get_od()
#> No region provided. Returning national OD data.
#> Rows: 2402201 Columns: 14
#> ── Column specification ────────────────────────────────────────────────────────
#> Delimiter: ","
#> chr  (2): Area of residence, Area of workplace
#> dbl (12): All categories: Method of travel to work, Work mainly at or from h...
#> 
#> ℹ Use `spec()` to retrieve the full column specification for this data.
#> ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
#> Rows: 7201 Columns: 6
#> ── Column specification ────────────────────────────────────────────────────────
#> Delimiter: ","
#> chr (2): MSOA11CD, MSOA11NM
#> dbl (4): BNGEAST, BNGNORTH, LONGITUDE, LATITUDE
#> 
#> ℹ Use `spec()` to retrieve the full column specification for this data.
#> ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
lsoas = pct::get_pct(layer = "z", national = TRUE, geography = "msoa")
#> Loading required package: sp
#> Warning in CPL_crs_from_input(x): GDAL Message 1: +init=epsg:XXXX syntax is
#> deprecated. It might return a CRS with a non-EPSG compliant axis order.
od = od_raw %>%
  slice(seq(20000))
summary(od$geo_code1 %in% lsoas)
#>    Mode   FALSE 
#> logical   20000
od = od::od_to_sf(x = od, z = lsoas)
#> 0 origins with no match in zone ids
#> 184 destinations with no match in zone ids
#>  points not in od data removed.
nrow(od) # 19k
#> [1] 19816
od_100 = od %>%
  slice(seq(100))
od_10k = od %>%
  slice(seq(10000))
od_15k = od %>%
  slice(seq(15000))

batch(desire_lines = od_100, wait = FALSE, silent = FALSE, username = "robinlovelace")
#> POSTing the request to create and start the job
#> Posting to: https://api.cyclestreets.net/v2/batchroutes.createjob?key=xxx
#> Job id: 3886
#> Sending data, wait...
#> Routing not complete
#> Routing job sent, check back in around 1 minutes if you've just sent this
#> Check at www.cyclestreets.net/journey/batch/ for route id: 3886
#> [1] 3886
batch(desire_lines = od_10k, wait = FALSE, silent = FALSE, username = "robinlovelace")
#> POSTing the request to create and start the job
#> Posting to: https://api.cyclestreets.net/v2/batchroutes.createjob?key=xxx
#> Job id: 3887
#> Sending data, wait...
#> Routing not complete
#> Routing job sent, check back in around 6 minutes if you've just sent this
#> Check at www.cyclestreets.net/journey/batch/ for route id: 3887
#> [1] 3887
batch(desire_lines = od_15k, wait = FALSE, silent = FALSE, username = "robinlovelace")
#> POSTing the request to create and start the job
#> Posting to: https://api.cyclestreets.net/v2/batchroutes.createjob?key=xxx
#> Job id:
#> Error in batch(desire_lines = od_15k, wait = FALSE, silent = FALSE, username = "robinlovelace"): Check your credentials, try again, and maybe contact CycleStreets

Created on 2023-02-25 with reprex v2.0.2

Robinlovelace commented 1 year ago

It seems that the POST request silently fails:

  res = httr::POST(url = batch_url, body = body, httr::timeout(60))
mvl22 commented 1 year ago

Could you run this and let us know an exact time it was run so we can trace this in the logs?

I suspect the API is hitting a POST size limit, which is easy to fix. The original batch API was designed for smaller amounts of data, but since the addition of the A-B mode I guess this will probably be hitting a limit.

If it's possible to supply a data file of what is posted that would help.

PS I recommend adding error handling logging in your code so that res tells you back what HTTP status code or error message you are getting.

Robinlovelace commented 1 year ago

Sure. Here is the error I'm seeing, seems it's CycleStreets side:

    "error": "The number of route calculations that would be required (112,492,500) is...
Robinlovelace commented 1 year ago

Full details in debug mode. I'll aim to send a file with the body of the request sent:

Browse[2]>   res = httr::POST(url = batch_url, body = body, httr::timeout(60))
Browse[2]> res
Response [https://api.cyclestreets.net/v2/batchroutes.createjob?key=xxx]
  Date: 2023-02-25 14:56
  Status: 200
  Content-Type: application/json; charset=UTF-8
  Size: 241 B
{
    "error": "The number of route calculations that would be required (112,492,500) is...
Robinlovelace commented 1 year ago

Heads-up @mvl22 I have emailed you a link to the JSON uploaded. It seems that the checking mechanism is calculating the number of route calculations incorrectly on your side.

mvl22 commented 1 year ago

Ah, it thinks you are sending 112 million calculations, which looks to me like 15k squared in one direction only...

You are intending 15k pre-defined lines, yes?

Robinlovelace commented 1 year ago

Yes, seems it does not realise that these are OD pairs, not points for which every combo is needed. Should be a quick fix.

Robinlovelace commented 1 year ago

You are intending 15k pre-defined lines, yes?

Correct. Same data structure and POST request works for 10k lines as shown in the reproducible example above.

Robinlovelace commented 1 year ago

In the meantime, I'll try to pull out and print the error message if there is one. Good shout on that Martin, would have made the root cause of this quicker to identify.

mvl22 commented 1 year ago

Could you also e-mail the 10k lines version?

Robinlovelace commented 1 year ago

Sure.

mvl22 commented 1 year ago

Thanks.

Sorry you've ran into this problem. We did some server upgrades last week. It's possible that a deployment default that we were silently relying on has changed.

Robinlovelace commented 1 year ago

See the GeoJSONs here: https://github.com/cyclestreets/cyclestreets-r/releases/tag/v0.6.0

mvl22 commented 1 year ago

See the GeoJSONs here: https://github.com/cyclestreets/cyclestreets-r/releases/tag/v0.6.0

The 15k example at least seems to be very long distance lines, e.g. from London to Newcastle. Just checking these are the right files? That seems to be different to the gist you sent.

Please ensure that the specific dataset you give is actually failing. Otherwise I don't have a reliable reproduce-case.

mvl22 commented 1 year ago

I've confirmed we are not seeing POST size maxouts in the logs, i.e. the sent data is being correctly received, as indeed the error you are getting implies.

So there is something about the data or the processing code that is making the system this is a set of points rather than linestrings and therefore triggering combinations mode.

Robinlovelace commented 1 year ago

See the GeoJSONs here: https://github.com/cyclestreets/cyclestreets-r/releases/tag/v0.6.0

The 15k example at least seems to be very long distance lines, e.g. from London to Newcastle. Just checking these are the right files? That seems to be different to the gist you sent.

Please ensure that the specific dataset you give is actually failing. Otherwise I don't have a reliable reproduce-case.

That was just an example dataset to show the error. Happy to provide a more realistic one.

Robinlovelace commented 1 year ago

See updated GeoJSON here: https://github.com/cyclestreets/cyclestreets-r/releases/download/v0.6.0/od_15k.geojson

mvl22 commented 1 year ago

This should now be working.

I found a bug, whereby the number of calculations in LineString mode was not done correctly. The 10k file was doing the incorrect calculation but was not triggering the calculation limit. This is now fixed.

Robinlovelace commented 1 year ago

Great to hear @mvl22, thanks for quickly identifying and fixing this issue your side.