actinia-org / actinia-core

Actinia Core is an open source REST API for scalable, distributed, high performance processing of geographical data that uses mainly GRASS GIS for computational tasks (DOI: https://doi.org/10.5281/zenodo.5879231) | Tutorial: https://actinia-org.github.io/actinia-core/ | Docker: https://hub.docker.com/r/mundialis/actinia-core
https://actinia.mundialis.de/
GNU General Public License v3.0
78 stars 23 forks source link

Enable import via vsicurl #482

Closed mmacata closed 9 months ago

mmacata commented 11 months ago

This PR enables the user to import online raster resources via vsicurl via importer. In comparison to the current raster import it skips the wget step and passes the URL directly to r.import. The vsicurl URL can directly be specified in the process chain:

    {
      "id": "importer_1",
      "module": "importer",
      "inputs": [
        {
          "import_descr": {
            "source": "/vsicurl/https://github.com/giswqs/data/raw/main/raster/srtm90.tif",
            "type": "raster",
            "resample": "bilinear_f",
            "resolution": "value",
            "resolution_value": "100"
          },
          "param": "map",
          "value": "mangkawuk_srtmgl1_v003_30m"
        }
      ]
    },

Output before without vsicurl URL:

"process_log": [
    {
      "executable": "/usr/bin/wget",
      "id": "importer_wget_myraster.tiff",
      "mapset_size": 421,
      "parameter": [
        "-t5",
        "-c",
        "-q",
        "-O",
        "/actinia_core/workspace/temp_db/gisdbase_aab038cc892a4eb199b3e3251ea51b9a/.tmp/myraster.tiff",
        "https://raw.githubusercontent.com/mmacata/pagestest/gh-pages/myraster.tiff"
      ],
      "return_code": 0,
      "run_time": 4.188900709152222,
      "stderr": [
        ""
      ],
      "stdout": ""
    },
    {
      "executable": "r.import",
      "id": "r_import_myraster.tiff",
      "mapset_size": 421,
      "parameter": [
        "input=/actinia_core/workspace/temp_db/gisdbase_aab038cc892a4eb199b3e3251ea51b9a/.tmp/myraster.tiff",
        "output=myraster",
        "--q",
        "resample=bilinear_f",
        "resolution=value",
        "resolution_value=100"
      ],
      "return_code": 1,
      "run_time": 1.5571587085723877,
      "stderr": [
        "WARNING: Invalid east longitude -246.534",
        "WARNING: Unable to determine area of interest for '+proj=lcc +lat_1=36.16666666666666 +lat_2=34.33333333333334 +lat_0=33.75 +lon_0=-79 +x_0=609601.22 +y_0=0 +no_defs +a=6378137 +rf=298.257222101 +towgs84=0.000,0.000,0.000 +type=crs '",
        "ERROR: Unable to initialize coordinate transformation",
        "ERROR: Unable to reproject to source location",
        ""
      ],
      "stdout": ""
    }
  ],

This is still possible. But now with vsicurl:

"process_log": [
    {
      "executable": "r.import",
      "id": "r_import_myraster.tif",
      "mapset_size": 24187896,
      "parameter": [
        "input=/vsicurl/https://github.com/giswqs/data/raw/main/raster/srtm90.tif",
        "output=myraster",
        "--q",
        "resample=bilinear_f",
        "resolution=value",
        "resolution_value=100"
      ],
      "return_code": 0,
      "run_time": 45.551167488098145,
      "stderr": [
        ""
      ],
      "stdout": ""
    }
  ],

These changes were tested with superadmin and normal user. The users limits were decreased to show the effect. For only 1 allowed pixel, v.import runs through but then the next step fails with "message": "AsyncProcessError: Region too large, set a coarser resolution to minimum nsres: 343448.918425 ewres: 343448.918425 [num_cells: 11781980]",. This might be critical as a large image could do harm to a running actinia installation? What do you think @anikaweinmann @neteler ? With only one second allowed for calculation, actinia behaves as it should: "message": "AsyncProcessTimeLimit: Time (1 seconds) exceeded to run executable r.import",.

anikaweinmann commented 11 months ago

Is it possible to set something by the importer so that extent=region is used? So that only parts of large data sets will be imported.

mmacata commented 11 months ago

I guess it would be easy to add it here. But then it would be a breaking change for actinia, no? If the importer behaves differently. Either way we decide I think it would be good to have the same behaviour for imports via wget and for those with vsicurl.

ninsbl commented 11 months ago

As a heads-up: /vsicurl/ uses (or at least may use) userfaultfd syscalls that will fail in a docker container, unless that capability is added to the container /service with seccomp / capadd. Ran into that recently myself and have not properly solved that yet, but currently looking for a good solution for our stack deployment...

mmacata commented 11 months ago

As a heads-up: /vsicurl/ uses (or at least may use) userfaultfd syscalls that will fail in a docker container, unless that capability is added to the container /service with seccomp / capadd. Ran into that recently myself and have not properly solved that yet, but currently looking for a good solution for our stack deployment...

Interesting. When I tested it locally with vscode / docker dev setup, all went fine! Used mundialis/actinia:latest and started actinia as root user there.

anikaweinmann commented 11 months ago

I guess it would be easy to add it here. But then it would be a breaking change for actinia, no? If the importer behaves differently. Either way we decide I think it would be good to have the same behaviour for imports via wget and for those with vsicurl.

If we add a parameter which can be set but have not be added then it would not be a breaking change, or?

ninsbl commented 11 months ago

Interesting. When I tested it locally with vscode / docker dev setup, all went fine! Used mundialis/actinia:latest and started actinia as root user there.

The issue I encountered seems to be format dependent. I tested your Geotiff and that works fine for me too. NetCDF however, does not seem to work...

So, in a docker container, this will fail: gdalinfo /vsicurl/https://nbstds.met.no/thredds/fileServer/NBS/S2A/2023/09/26/S2A_MSIL1C_20230926T134931_N0509_R110_T31XEG_20230926T154938.nc

neteler commented 11 months ago

Issue confirmed also with podman:

podman run -it --rm osgeo/grass-gis:releasebranch_8_3-debian bash
root@94aa9494148b:/grassdb# gdalinfo /vsicurl/https://nbstds.met.no/thredds/fileServer/NBS/S2A/2023/09/26/S2A_MSIL1C_20230926T134931_N0509_R110_T31XEG_20230926T154938.nc
ERROR 1: CPLCreateUserFaultMapping(): syscall(__NR_userfaultfd) failed: insufficient permission. add CAP_SYS_PTRACE capability, or set /proc/sys/vm/unprivileged_userfaultfd to 1
gdalinfo failed - unable to open '/vsicurl/https://nbstds.met.no/thredds/fileServer/NBS/S2A/2023/09/26/S2A_MSIL1C_20230926T134931_N0509_R110_T31XEG_20230926T154938.nc'.
ninsbl commented 11 months ago

OK, how I plan to tackle the issue in our swarm deployment (actinia-swarm.yml) is to add permissions / capabilities like this:

cap_add:
  - SYS_PTRACE

And here with context...

---
version: "3.1"
services:
  actinia:
    cap_add:
      - SYS_PTRACE
    environment:

With that addition /vsicurl/ works in docker also with netcdf...

My knowledge of Unix system admin stuff does not go deep enough to know if there is an option to allow only userfaultfd. A web search produced this: https://lore.kernel.org/lkml/CAJHvVcgbCL7+4bBZ_5biLKfjmz_DKNBV8H6NxcLcFrw9Fbu7mw@mail.gmail.com/ But I have no idea if that patch was merged.

mmacata commented 10 months ago

Map pixel limit check needed:

mmacata commented 9 months ago