conveyal / r5

Developed to power Conveyal's web-based interface for scenario planning and land-use/transport accessibility analysis, R5 is our routing engine for multimodal (transit/bike/walk/car) networks with a particular focus on public transit
https://conveyal.com/learn
MIT License

Options to download multiple regional analyses #610

Open ansoncfit opened 3 years ago

ansoncfit commented 3 years ago

After running a single regional analysis with multiple destination pointsets, cutoffs, and percentiles, some users may want to download a .zip file containing all of the corresponding .geotiffs.

Separate, but related: an option to download the raster showing a comparison between two analyses, which would free users from having to perform raster subtraction as a post-processing step. We expect users to be comfortable loading/styling a raster if they want custom cartography, but some may not be familiar with raster operations. And in any case, the option to download the difference between two analyses straight away would be a time saver for us (especially before #conveyal/analysis-ui#472 is addressed).
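
For readers unfamiliar with raster operations, the subtraction mentioned above amounts to a cell-by-cell difference between two grids of identical dimensions. A minimal sketch, assuming the rasters have already been read into nested lists (in practice a GIS tool or raster library would handle the file I/O); the `subtract_grids` name and the use of `None` for no-data cells are illustrative assumptions, not part of R5:

```python
from typing import List, Optional

# Hypothetical in-memory grid: rows of cell values, None marks a no-data cell.
Grid = List[List[Optional[float]]]


def subtract_grids(scenario: Grid, baseline: Grid) -> Grid:
    """Return scenario minus baseline, cell by cell.

    A cell that is no-data in either input stays no-data in the result.
    """
    same_shape = len(scenario) == len(baseline) and all(
        len(a) == len(b) for a, b in zip(scenario, baseline)
    )
    if not same_shape:
        raise ValueError("grids must have identical dimensions")
    return [
        [None if a is None or b is None else a - b for a, b in zip(row_a, row_b)]
        for row_a, row_b in zip(scenario, baseline)
    ]


if __name__ == "__main__":
    baseline = [[100.0, 250.0], [None, 400.0]]
    scenario = [[120.0, 250.0], [300.0, 380.0]]
    print(subtract_grids(scenario, baseline))
```

Positive cells indicate more accessibility in the scenario than in the baseline; negative cells indicate less.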

abyrd commented 2 years ago

This issue is of particular importance to people working on multi-variable accessibility. Although doing this within the web interface is possible, it represents a whole new, complex feature (#705). Batch downloads are an expedient way to facilitate computing multi-variable accessibility in external tools like QGIS.

Currently evaluating whether it's realistic to include this option in the next release (low disruptive potential).

ansoncfit commented 1 year ago

A script that may be useful until this issue is addressed: https://github.com/conveyal/batch-isochrones/blob/master/batch-result-download.ipynb

abyrd commented 6 months ago

I have been looking into this again; here are some thoughts.

In order to download the entire batch, we’d need to either: A) enumerate all the types of results associated with this regional analysis and generate a name for each one; or B) consult some kind of existing catalog of which results exist for each analysis.

We already have a partial form of B stored in RegionalAnalysis.resultStorage, but only for CSV results. An alternative way to derive something like B would be to look at all pre-existing files in the “results” storage category whose names begin with the specified regional analysis ID. However, there are two problems with this. First, our storage interface doesn’t have a method to list existing files using wildcards. This could be added, but providing open-ended directory listings deviates from the existing model, and the absence of such listings can be considered something of an extra security layer. Second, as in many other places in R5, we derive geotiff rasters on demand, only when requested. Only the geotiffs that have been individually requested and generated will exist in storage, and getting a ZIP containing only things you’ve already downloaded is not compatible with the main use case (downloading everything immediately after an analysis completes).
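
To make the second objection concrete, here is a minimal sketch of what a prefix-based listing would return. It assumes storage can be modeled as a plain collection of file names; the function names and the sample file names are hypothetical and do not correspond to anything in the R5 storage interface:

```python
from typing import Iterable, List, Set


def list_results_for_analysis(stored_names: Iterable[str], analysis_id: str) -> List[str]:
    """Return stored file names that begin with the analysis ID, sorted."""
    prefix = analysis_id + "_"
    return sorted(name for name in stored_names if name.startswith(prefix))


def missing_results(expected: Set[str], stored_names: Iterable[str]) -> List[str]:
    """Return expected file names that are not in storage yet, sorted."""
    return sorted(expected - set(stored_names))


if __name__ == "__main__":
    # Only one raster was requested so far, so only one exists in storage.
    stored = ["abc_dest_P50_C30.tif", "xyz_dest_P50_C30.tif"]
    expected = {"abc_dest_P50_C30.tif", "abc_dest_P50_C45.tif"}
    print(list_results_for_analysis(stored, "abc"))
    print(missing_results(expected, stored))
```

A listing alone finds only the rasters that were already generated, so the rest would still have to be enumerated and derived some other way.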

The problem with A is that all existing endpoints are designed to receive cutoff and destination parameters for one single raster, with layers of logic to derive the filenames accumulated as formats shifted historically. It is probably possible to factor out the various name-generating logic and call it from a loop over all the known parameters for a particular analysis. This could then be followed by a step that zips all the accumulated results into one. This final ZIP could be retained and returned as a cloud storage direct download URL, but it then creates one more place subject to the problem in #673.
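
The loop-then-zip idea in A can be sketched as follows. This is an illustration under stated assumptions rather than the R5 implementation: the `{analysis}_{destination}_P{percentile}_C{cutoff}.tif` naming pattern, the function names, and the `make_raster` callback (standing in for whatever derives a single geotiff on demand) are all hypothetical.

```python
import io
import zipfile
from itertools import product
from typing import Callable, Iterable, List, Sequence


def result_names(
    analysis_id: str,
    destination_ids: Sequence[str],
    percentiles: Sequence[int],
    cutoffs: Sequence[int],
) -> List[str]:
    """Generate one file name per destination/cutoff/percentile combination."""
    return [
        f"{analysis_id}_{dest}_P{p}_C{c}.tif"
        for dest, c, p in product(destination_ids, cutoffs, percentiles)
    ]


def build_zip(names: Iterable[str], make_raster: Callable[[str], bytes]) -> bytes:
    """Derive each raster via the callback and collect all of them in one ZIP."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        for name in names:
            archive.writestr(name, make_raster(name))
    return buffer.getvalue()


if __name__ == "__main__":
    names = result_names("analysis", ["dest"], [5, 25, 50, 75, 95], [20, 30, 45, 60])
    # Placeholder bytes instead of real geotiff content.
    data = build_zip(names, lambda name: b"placeholder for " + name.encode())
    with zipfile.ZipFile(io.BytesIO(data)) as archive:
        print(len(archive.namelist()), "files")
```

Factoring the name generation into one function means the per-raster endpoints and the batch endpoint cannot drift apart in how they name files.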

Considering all the above, A would probably be used for the geotiff rasters, possibly together with B to include any CSV results in the zip.

abyrd commented 6 months ago

I have a working prototype providing the core functionality:

$ curl -v "http://localhost:7070/api/regional/65b3de3c0b0b886ba9e54126/all"
*   Trying 127.0.0.1:7070...
* Connected to localhost (127.0.0.1) port 7070 (#0)
> GET /api/regional/65b3de3c0b0b886ba9e54126/all HTTP/1.1
> Host: localhost:7070
> User-Agent: curl/8.1.2
> Accept: */*
> 
< HTTP/1.1 200 OK
< Date: Fri, 26 Jan 2024 17:27:40 GMT
< Access-Control-Allow-Origin: http://localhost:3000
< Vary: Origin
< Content-Type: application/json
< Content-Disposition: attachment; filename="REGIONAL-MULTI.zip"
< Transfer-Encoding: chunked
< Server: Jetty(9.4.8.v20171121)
< 
* Connection #0 to host localhost left intact
{"url":"http://localhost:7070/files/results/65b3de3c0b0b886ba9e54126_ALL.zip","name":"REGIONAL-MULTI.zip"}

$ wget "http://localhost:7070/files/results/65b3de3c0b0b886ba9e54126_ALL.zip"
--2024-01-27 01:28:03--  http://localhost:7070/files/results/65b3de3c0b0b886ba9e54126_ALL.zip
Resolving localhost (localhost)... 127.0.0.1
Connecting to localhost (localhost)|127.0.0.1|:7070... connected.
HTTP request sent, awaiting response... 200 OK
Length: unspecified [application/zip]
Saving to: ‘65b3de3c0b0b886ba9e54126_ALL.zip’
2024-01-27 01:28:03 (766 MB/s) - ‘65b3de3c0b0b886ba9e54126_ALL.zip’ saved [77149]

$ unzip -l 65b3de3c0b0b886ba9e54126_ALL.zip 
Archive:  65b3de3c0b0b886ba9e54126_ALL.zip
  Length      Date    Time    Name
---------  ---------- -----   ----
     3591  01-27-2024 01:27   65b3de3c0b0b886ba9e54126_655d8388a990e8133f0c7579_P5_C20.tif
     3477  01-27-2024 01:27   65b3de3c0b0b886ba9e54126_655d8388a990e8133f0c7579_P25_C20.tif
     3405  01-27-2024 01:27   65b3de3c0b0b886ba9e54126_655d8388a990e8133f0c7579_P50_C20.tif
     3357  01-27-2024 01:27   65b3de3c0b0b886ba9e54126_655d8388a990e8133f0c7579_P75_C20.tif
     3331  01-27-2024 01:27   65b3de3c0b0b886ba9e54126_655d8388a990e8133f0c7579_P95_C20.tif
     3990  01-27-2024 01:27   65b3de3c0b0b886ba9e54126_655d8388a990e8133f0c7579_P5_C30.tif
     3829  01-27-2024 01:27   65b3de3c0b0b886ba9e54126_655d8388a990e8133f0c7579_P25_C30.tif
     3765  01-27-2024 01:27   65b3de3c0b0b886ba9e54126_655d8388a990e8133f0c7579_P50_C30.tif
     3658  01-27-2024 01:27   65b3de3c0b0b886ba9e54126_655d8388a990e8133f0c7579_P75_C30.tif
     3534  01-27-2024 01:27   65b3de3c0b0b886ba9e54126_655d8388a990e8133f0c7579_P95_C30.tif
     4167  01-27-2024 01:27   65b3de3c0b0b886ba9e54126_655d8388a990e8133f0c7579_P5_C45.tif
     4150  01-27-2024 01:27   65b3de3c0b0b886ba9e54126_655d8388a990e8133f0c7579_P25_C45.tif
     4119  01-27-2024 01:27   65b3de3c0b0b886ba9e54126_655d8388a990e8133f0c7579_P50_C45.tif
     4032  01-27-2024 01:27   65b3de3c0b0b886ba9e54126_655d8388a990e8133f0c7579_P75_C45.tif
     3978  01-27-2024 01:27   65b3de3c0b0b886ba9e54126_655d8388a990e8133f0c7579_P95_C45.tif
     4258  01-27-2024 01:27   65b3de3c0b0b886ba9e54126_655d8388a990e8133f0c7579_P5_C60.tif
     4210  01-27-2024 01:27   65b3de3c0b0b886ba9e54126_655d8388a990e8133f0c7579_P25_C60.tif
     4225  01-27-2024 01:27   65b3de3c0b0b886ba9e54126_655d8388a990e8133f0c7579_P50_C60.tif
     4186  01-27-2024 01:27   65b3de3c0b0b886ba9e54126_655d8388a990e8133f0c7579_P75_C60.tif
     4176  01-27-2024 01:27   65b3de3c0b0b886ba9e54126_655d8388a990e8133f0c7579_P95_C60.tif
---------                     -------
    77438                     20 files
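
The same two-step flow (ask the endpoint for a download URL, then fetch the ZIP) could be scripted. A minimal sketch using only the Python standard library, assuming the JSON response has exactly the `url` and `name` fields shown above and that no authentication is needed, as on this local test server; the function names are hypothetical:

```python
import json
import os
import shutil
import urllib.request
from typing import Tuple


def parse_bundle_response(body: str) -> Tuple[str, str]:
    """Extract the download URL and suggested file name from the JSON body."""
    payload = json.loads(body)
    return payload["url"], payload["name"]


def download_bundle(api_base: str, analysis_id: str) -> str:
    """Request the bundle URL, download the ZIP, and return the local file name."""
    endpoint = f"{api_base}/api/regional/{analysis_id}/all"
    with urllib.request.urlopen(endpoint) as response:
        url, name = parse_bundle_response(response.read().decode("utf-8"))
    # Keep only the final path component so the server-supplied name
    # cannot write outside the current directory.
    local_name = os.path.basename(name)
    with urllib.request.urlopen(url) as archive, open(local_name, "wb") as out:
        shutil.copyfileobj(archive, out)
    return local_name
```

Against the server above, this would be called as `download_bundle("http://localhost:7070", "65b3de3c0b0b886ba9e54126")`.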

abyrd commented 6 months ago

Updated to replace analysis and destination pointset UUIDs with human-readable names. The output below is from a regional analysis named "Name!! &&\\/\\/..//with bad(()()(chars", which has three destination sets (two of which have the same name), four percentiles, and three cutoffs:

$ curl "http://localhost:7070/api/regional/65b5260868ae26587877e881/all"
{"url":"http://localhost:7070/files/results/65b5260868ae26587877e881_ALL.zip","name":"Name_with_bad_chars.zip"}

$ wget http://localhost:7070/files/results/65b5260868ae26587877e881_ALL.zip -O "Name_with_bad_chars.zip"
2024-01-27 23:52:24 (707 MB/s) - ‘Name_with_bad_chars.zip’ saved [45242]

$ unzip -l Name_with_bad_chars.zip 
Archive:  Name_with_bad_chars.zip
  Length      Date    Time    Name
---------  ---------- -----   ----
     1496  01-27-2024 23:51   Name_with_bad_chars_D0_whole_P20_C22.tif
     1466  01-27-2024 23:51   Name_with_bad_chars_D0_whole_P40_C22.tif
     1438  01-27-2024 23:51   Name_with_bad_chars_D0_whole_P60_C22.tif
     1405  01-27-2024 23:51   Name_with_bad_chars_D0_whole_P80_C22.tif
     1660  01-27-2024 23:51   Name_with_bad_chars_D0_whole_P20_C33.tif
     1632  01-27-2024 23:51   Name_with_bad_chars_D0_whole_P40_C33.tif
     1596  01-27-2024 23:51   Name_with_bad_chars_D0_whole_P60_C33.tif
     1578  01-27-2024 23:51   Name_with_bad_chars_D0_whole_P80_C33.tif
     1651  01-27-2024 23:51   Name_with_bad_chars_D0_whole_P20_C44.tif
     1660  01-27-2024 23:51   Name_with_bad_chars_D0_whole_P40_C44.tif
     1672  01-27-2024 23:51   Name_with_bad_chars_D0_whole_P60_C44.tif
     1684  01-27-2024 23:51   Name_with_bad_chars_D0_whole_P80_C44.tif
      629  01-27-2024 23:51   Name_with_bad_chars_D1_whole_P20_C22.tif
      591  01-27-2024 23:51   Name_with_bad_chars_D1_whole_P40_C22.tif
      572  01-27-2024 23:51   Name_with_bad_chars_D1_whole_P60_C22.tif
      558  01-27-2024 23:51   Name_with_bad_chars_D1_whole_P80_C22.tif
      688  01-27-2024 23:51   Name_with_bad_chars_D1_whole_P20_C33.tif
      659  01-27-2024 23:51   Name_with_bad_chars_D1_whole_P40_C33.tif
      651  01-27-2024 23:51   Name_with_bad_chars_D1_whole_P60_C33.tif
      637  01-27-2024 23:51   Name_with_bad_chars_D1_whole_P80_C33.tif
      712  01-27-2024 23:51   Name_with_bad_chars_D1_whole_P20_C44.tif
      698  01-27-2024 23:51   Name_with_bad_chars_D1_whole_P40_C44.tif
      689  01-27-2024 23:51   Name_with_bad_chars_D1_whole_P60_C44.tif
      664  01-27-2024 23:51   Name_with_bad_chars_D1_whole_P80_C44.tif
     1442  01-27-2024 23:51   Name_with_bad_chars_D2_VAC10_P20_C22.tif
     1435  01-27-2024 23:51   Name_with_bad_chars_D2_VAC10_P40_C22.tif
     1429  01-27-2024 23:51   Name_with_bad_chars_D2_VAC10_P60_C22.tif
     1413  01-27-2024 23:51   Name_with_bad_chars_D2_VAC10_P80_C22.tif
     1479  01-27-2024 23:51   Name_with_bad_chars_D2_VAC10_P20_C33.tif
     1470  01-27-2024 23:51   Name_with_bad_chars_D2_VAC10_P40_C33.tif
     1457  01-27-2024 23:51   Name_with_bad_chars_D2_VAC10_P60_C33.tif
     1470  01-27-2024 23:51   Name_with_bad_chars_D2_VAC10_P80_C33.tif
     1537  01-27-2024 23:51   Name_with_bad_chars_D2_VAC10_P20_C44.tif
     1529  01-27-2024 23:51   Name_with_bad_chars_D2_VAC10_P40_C44.tif
     1526  01-27-2024 23:51   Name_with_bad_chars_D2_VAC10_P60_C44.tif
     1522  01-27-2024 23:51   Name_with_bad_chars_D2_VAC10_P80_C44.tif
---------                     -------
    44395                     36 files
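
One way to produce names like these is sketched below. The rule used here (collapse every run of non-alphanumeric characters into a single underscore, then prefix each destination set with its index so that sets sharing a name stay distinct) is an assumption that happens to reproduce the listing above; the prototype's actual logic may differ, and the function names are hypothetical.

```python
import re
from itertools import product
from typing import List, Sequence


def sanitize(name: str) -> str:
    """Collapse each run of non-alphanumeric characters into one underscore."""
    return re.sub(r"[^A-Za-z0-9]+", "_", name).strip("_")


def readable_names(
    analysis_name: str,
    destination_names: Sequence[str],
    percentiles: Sequence[int],
    cutoffs: Sequence[int],
) -> List[str]:
    """Build one file name per destination/cutoff/percentile combination.

    The D<index> component keeps names unique when two destination sets
    share the same human-readable name.
    """
    base = sanitize(analysis_name)
    indexed = list(enumerate(destination_names))
    return [
        f"{base}_D{i}_{sanitize(dest)}_P{p}_C{c}.tif"
        for (i, dest), c, p in product(indexed, cutoffs, percentiles)
    ]


if __name__ == "__main__":
    names = readable_names(
        r"Name!! &&\\/\\/..//with bad(()()(chars",
        ["whole", "whole", "VAC10"],
        [20, 40, 60, 80],
        [22, 33, 44],
    )
    print(len(names), names[0], names[-1])
```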

ansoncfit commented 6 months ago

Thanks, these multi-grid-download changes look promising.

If preparing multiple single-cutoff grids takes substantial time, should we consider tracking this operation as an activity?

There's no need to implement this for CSV downloads (which are only available for freeform origins, not grid origins).