Open ansoncfit opened 3 years ago
This issue is of particular importance to people working on multi-variable accessibility. Although doing this within the web interface is possible, it represents a whole new complex feature (#705). Batch downloads are an expedient way to facilitate computing multi-variable accessibility in external tools like QGIS.
Currently evaluating whether it's realistic to include this option in the next release (low disruptive potential).
A script that may be useful until this issue is addressed: https://github.com/conveyal/batch-isochrones/blob/master/batch-result-download.ipynb
I have been looking into this again, here are some thoughts.
In order to download the entire batch, we’d need to either: A) enumerate all the types of results associated with this regional analysis and generate a name for each one; or B) consult some kind of existing catalog of which results exist for this analysis.
We already have a partial form of B stored in RegionalAnalysis.resultStorage, but only for CSV results. An alternative way to derive something like B would be to look at all pre-existing files in the “results” storage category whose names begin with the specified regional analysis ID. However, our storage interface has no method to list existing files using wildcards. One could be added, but providing open-ended directory listings deviates from the existing model, and their absence can be considered something of an extra security layer. Secondly, as in many other places in R5, we derive GeoTIFF rasters on demand, only when requested. Only the GeoTIFFs that have been individually requested and generated will exist in storage, and getting a ZIP containing only the things you’ve already downloaded is not compatible with the main use case (download everything immediately after an analysis completes).
The problem with A is that all existing endpoints are designed to receive cutoff and destination parameters for one single raster, with layers of filename-derivation logic that accumulated as formats shifted over time. It should be possible to factor out the various name-generating logic and call it from a loop over all the known parameters for a particular analysis, followed by a step that zips all the accumulated results into one archive. This final ZIP could be retained and returned as a cloud-storage direct-download URL, but that creates one more place subject to the problem in #673.
Considering all the above, A would probably be used for the GeoTIFF rasters, possibly together with B to include any CSV results in the ZIP.
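For the GeoTIFF side, approach A amounts to a nested loop over the known dimensions of the analysis (destination pointsets, percentiles, cutoffs). A rough Python sketch of the enumeration, using the filename pattern visible in the prototype output below; function and parameter names are illustrative, not the actual R5 code:

```python
from itertools import product

def enumerate_result_names(analysis_id, destination_ids, percentiles, cutoffs):
    """Generate the expected filename for every (destination, percentile, cutoff)
    combination of a regional analysis: enumerate all results up front (approach A)
    rather than listing what happens to exist in storage (approach B)."""
    for dest, p, c in product(destination_ids, percentiles, cutoffs):
        yield f"{analysis_id}_{dest}_P{p}_C{c}.tif"

# Example: 1 destination set, 5 percentiles, 4 cutoffs -> 20 filenames,
# the same combinations as the 20-entry ZIP listing in the prototype run.
names = list(enumerate_result_names(
    "65b3de3c0b0b886ba9e54126",
    ["655d8388a990e8133f0c7579"],
    [5, 25, 50, 75, 95],
    [20, 30, 45, 60],
))
```

Each generated name would then be passed to the existing on-demand raster generation before the zipping step, so the ZIP is complete regardless of what has been downloaded before.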
I have a working prototype providing the core functionality:
$ curl -v "http://localhost:7070/api/regional/65b3de3c0b0b886ba9e54126/all"
* Trying 127.0.0.1:7070...
* Connected to localhost (127.0.0.1) port 7070 (#0)
> GET /api/regional/65b3de3c0b0b886ba9e54126/all HTTP/1.1
> Host: localhost:7070
> User-Agent: curl/8.1.2
> Accept: */*
>
< HTTP/1.1 200 OK
< Date: Fri, 26 Jan 2024 17:27:40 GMT
< Access-Control-Allow-Origin: http://localhost:3000
< Vary: Origin
< Content-Type: application/json
< Content-Disposition: attachment; filename="REGIONAL-MULTI.zip"
< Transfer-Encoding: chunked
< Server: Jetty(9.4.8.v20171121)
<
* Connection #0 to host localhost left intact
{"url":"http://localhost:7070/files/results/65b3de3c0b0b886ba9e54126_ALL.zip","name":"REGIONAL-MULTI.zip"}%
$ wget "http://localhost:7070/files/results/65b3de3c0b0b886ba9e54126_ALL.zip"
--2024-01-27 01:28:03-- http://localhost:7070/files/results/65b3de3c0b0b886ba9e54126_ALL.zip
Resolving localhost (localhost)... 127.0.0.1
Connecting to localhost (localhost)|127.0.0.1|:7070... connected.
HTTP request sent, awaiting response... 200 OK
Length: unspecified [application/zip]
Saving to: ‘65b3de3c0b0b886ba9e54126_ALL.zip’
2024-01-27 01:28:03 (766 MB/s) - ‘65b3de3c0b0b886ba9e54126_ALL.zip’ saved [77149]
$ unzip -l 65b3de3c0b0b886ba9e54126_ALL.zip
Archive: 65b3de3c0b0b886ba9e54126_ALL.zip
Length Date Time Name
--------- ---------- ----- ----
3591 01-27-2024 01:27 65b3de3c0b0b886ba9e54126_655d8388a990e8133f0c7579_P5_C20.tif
3477 01-27-2024 01:27 65b3de3c0b0b886ba9e54126_655d8388a990e8133f0c7579_P25_C20.tif
3405 01-27-2024 01:27 65b3de3c0b0b886ba9e54126_655d8388a990e8133f0c7579_P50_C20.tif
3357 01-27-2024 01:27 65b3de3c0b0b886ba9e54126_655d8388a990e8133f0c7579_P75_C20.tif
3331 01-27-2024 01:27 65b3de3c0b0b886ba9e54126_655d8388a990e8133f0c7579_P95_C20.tif
3990 01-27-2024 01:27 65b3de3c0b0b886ba9e54126_655d8388a990e8133f0c7579_P5_C30.tif
3829 01-27-2024 01:27 65b3de3c0b0b886ba9e54126_655d8388a990e8133f0c7579_P25_C30.tif
3765 01-27-2024 01:27 65b3de3c0b0b886ba9e54126_655d8388a990e8133f0c7579_P50_C30.tif
3658 01-27-2024 01:27 65b3de3c0b0b886ba9e54126_655d8388a990e8133f0c7579_P75_C30.tif
3534 01-27-2024 01:27 65b3de3c0b0b886ba9e54126_655d8388a990e8133f0c7579_P95_C30.tif
4167 01-27-2024 01:27 65b3de3c0b0b886ba9e54126_655d8388a990e8133f0c7579_P5_C45.tif
4150 01-27-2024 01:27 65b3de3c0b0b886ba9e54126_655d8388a990e8133f0c7579_P25_C45.tif
4119 01-27-2024 01:27 65b3de3c0b0b886ba9e54126_655d8388a990e8133f0c7579_P50_C45.tif
4032 01-27-2024 01:27 65b3de3c0b0b886ba9e54126_655d8388a990e8133f0c7579_P75_C45.tif
3978 01-27-2024 01:27 65b3de3c0b0b886ba9e54126_655d8388a990e8133f0c7579_P95_C45.tif
4258 01-27-2024 01:27 65b3de3c0b0b886ba9e54126_655d8388a990e8133f0c7579_P5_C60.tif
4210 01-27-2024 01:27 65b3de3c0b0b886ba9e54126_655d8388a990e8133f0c7579_P25_C60.tif
4225 01-27-2024 01:27 65b3de3c0b0b886ba9e54126_655d8388a990e8133f0c7579_P50_C60.tif
4186 01-27-2024 01:27 65b3de3c0b0b886ba9e54126_655d8388a990e8133f0c7579_P75_C60.tif
4176 01-27-2024 01:27 65b3de3c0b0b886ba9e54126_655d8388a990e8133f0c7579_P95_C60.tif
--------- -------
77438 20 files
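The transcript shows the two-step pattern for clients: the `/all` endpoint returns a small JSON object, and the actual ZIP is fetched from the `url` field. A minimal stdlib-only sketch of that client flow (the response text is taken from the transcript; passing an output path triggers the actual download):

```python
import json
from urllib.request import urlopen

# JSON body returned by the /all endpoint in the transcript above.
RESPONSE = ('{"url":"http://localhost:7070/files/results/'
            '65b3de3c0b0b886ba9e54126_ALL.zip","name":"REGIONAL-MULTI.zip"}')

def download_from_response(response_text, out_path=None):
    """Parse the endpoint's JSON response; if out_path is given,
    fetch the ZIP from the returned URL and save it there."""
    info = json.loads(response_text)
    if out_path is not None:
        with urlopen(info["url"]) as resp, open(out_path, "wb") as f:
            f.write(resp.read())
    return info["url"], info["name"]

url, name = download_from_response(RESPONSE)  # parse only, no download
```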
Updated to replace analysis and destination pointset UUIDs with human-readable names. After running a regional analysis called Name!! &&\\/\\/..//with bad(()()(chars
with three destination sets (two of which have the same name), four percentiles, and three cutoffs:
$ curl "http://localhost:7070/api/regional/65b5260868ae26587877e881/all"
{"url":"http://localhost:7070/files/results/65b5260868ae26587877e881_ALL.zip","name":"Name_with_bad_chars.zip"}%
$ wget http://localhost:7070/files/results/65b5260868ae26587877e881_ALL.zip -O "Name_with_bad_chars.zip"
2024-01-27 23:52:24 (707 MB/s) - ‘Name_with_bad_chars.zip’ saved [45242]
$ unzip -l Name_with_bad_chars.zip
Archive: Name_with_bad_chars.zip
Length Date Time Name
--------- ---------- ----- ----
1496 01-27-2024 23:51 Name_with_bad_chars_D0_whole_P20_C22.tif
1466 01-27-2024 23:51 Name_with_bad_chars_D0_whole_P40_C22.tif
1438 01-27-2024 23:51 Name_with_bad_chars_D0_whole_P60_C22.tif
1405 01-27-2024 23:51 Name_with_bad_chars_D0_whole_P80_C22.tif
1660 01-27-2024 23:51 Name_with_bad_chars_D0_whole_P20_C33.tif
1632 01-27-2024 23:51 Name_with_bad_chars_D0_whole_P40_C33.tif
1596 01-27-2024 23:51 Name_with_bad_chars_D0_whole_P60_C33.tif
1578 01-27-2024 23:51 Name_with_bad_chars_D0_whole_P80_C33.tif
1651 01-27-2024 23:51 Name_with_bad_chars_D0_whole_P20_C44.tif
1660 01-27-2024 23:51 Name_with_bad_chars_D0_whole_P40_C44.tif
1672 01-27-2024 23:51 Name_with_bad_chars_D0_whole_P60_C44.tif
1684 01-27-2024 23:51 Name_with_bad_chars_D0_whole_P80_C44.tif
629 01-27-2024 23:51 Name_with_bad_chars_D1_whole_P20_C22.tif
591 01-27-2024 23:51 Name_with_bad_chars_D1_whole_P40_C22.tif
572 01-27-2024 23:51 Name_with_bad_chars_D1_whole_P60_C22.tif
558 01-27-2024 23:51 Name_with_bad_chars_D1_whole_P80_C22.tif
688 01-27-2024 23:51 Name_with_bad_chars_D1_whole_P20_C33.tif
659 01-27-2024 23:51 Name_with_bad_chars_D1_whole_P40_C33.tif
651 01-27-2024 23:51 Name_with_bad_chars_D1_whole_P60_C33.tif
637 01-27-2024 23:51 Name_with_bad_chars_D1_whole_P80_C33.tif
712 01-27-2024 23:51 Name_with_bad_chars_D1_whole_P20_C44.tif
698 01-27-2024 23:51 Name_with_bad_chars_D1_whole_P40_C44.tif
689 01-27-2024 23:51 Name_with_bad_chars_D1_whole_P60_C44.tif
664 01-27-2024 23:51 Name_with_bad_chars_D1_whole_P80_C44.tif
1442 01-27-2024 23:51 Name_with_bad_chars_D2_VAC10_P20_C22.tif
1435 01-27-2024 23:51 Name_with_bad_chars_D2_VAC10_P40_C22.tif
1429 01-27-2024 23:51 Name_with_bad_chars_D2_VAC10_P60_C22.tif
1413 01-27-2024 23:51 Name_with_bad_chars_D2_VAC10_P80_C22.tif
1479 01-27-2024 23:51 Name_with_bad_chars_D2_VAC10_P20_C33.tif
1470 01-27-2024 23:51 Name_with_bad_chars_D2_VAC10_P40_C33.tif
1457 01-27-2024 23:51 Name_with_bad_chars_D2_VAC10_P60_C33.tif
1470 01-27-2024 23:51 Name_with_bad_chars_D2_VAC10_P80_C33.tif
1537 01-27-2024 23:51 Name_with_bad_chars_D2_VAC10_P20_C44.tif
1529 01-27-2024 23:51 Name_with_bad_chars_D2_VAC10_P40_C44.tif
1526 01-27-2024 23:51 Name_with_bad_chars_D2_VAC10_P60_C44.tif
1522 01-27-2024 23:51 Name_with_bad_chars_D2_VAC10_P80_C44.tif
--------- -------
44395 36 files
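The sanitization visible above (the user-supplied name collapsing to `Name_with_bad_chars`) could be sketched as collapsing every run of non-alphanumeric characters to a single underscore. This is an illustrative guess at the rule, not the backend's actual implementation:

```python
import re

def filename_safe(name):
    """Collapse any run of characters outside [A-Za-z0-9] into one
    underscore and trim leading/trailing underscores, so user-supplied
    analysis names are safe to use in filenames.
    (Illustrative character class; the backend's rule may differ.)"""
    return re.sub(r"[^A-Za-z0-9]+", "_", name).strip("_")

filename_safe("Name!! &&\\/\\/..//with bad(()()(chars")  # -> 'Name_with_bad_chars'
```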
Thanks, these multi-grid-download changes look promising.
If preparing multiple single cutoff grids takes substantial time, should we consider tracking this operation as an activity?
There's no need to implement this for CSV downloads (which are only available for freeform origins, not grid origins).
After running a single regional analysis with multiple destination pointsets, cutoffs, and percentiles, some users may want to download a .zip file containing all of the corresponding GeoTIFFs.
Separate, but related: an option to download a raster showing the comparison between two analyses, which would free users from having to perform raster subtraction as a post-processing step. We expect users to be comfortable loading/styling a raster if they want custom cartography, but some may not be familiar with raster operations. And in any case, the option to download the difference between two analyses straight away would be a time saver for us (especially before conveyal/analysis-ui#472 is addressed).
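Until a built-in comparison download exists, the post-processing step is a cell-wise subtraction of two accessibility rasters covering the same extent and resolution (matched percentile and cutoff). Once the grids are loaded as arrays, the arithmetic is the easy half; a minimal sketch, assuming the caller handles GeoTIFF I/O and georeferencing separately:

```python
import numpy as np

def accessibility_difference(grid_a, grid_b, nodata=None):
    """Cell-wise difference between two accessibility grids of the same
    shape (e.g. scenario minus baseline). Cells that are nodata in
    either input are kept as nodata in the output."""
    a = np.asarray(grid_a, dtype=float)
    b = np.asarray(grid_b, dtype=float)
    diff = a - b
    if nodata is not None:
        diff[(a == nodata) | (b == nodata)] = nodata
    return diff
```

Reading and writing the GeoTIFFs while preserving the georeferencing (e.g. with GDAL or QGIS's raster calculator) is the part less familiar users would stumble over, which is why a ready-made comparison download would help.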