PondiB / openeocubes

A lightweight R-based RESTful service to analyze Earth Observation data cubes in the cloud.
Apache License 2.0
30 stars 9 forks source link

Safety Issue with "run_udf"!!! #81

Closed MichaelBrueggemann closed 8 months ago

MichaelBrueggemann commented 9 months ago

@PondiB

URGENT ISSUE !!!

In the current implementation, the Process "run_udf" takes an arbitrary string with a user-defined-function. This String isn't checked but rather is directly parsed into an expression by base::parse() and is then directly evaluated with base::eval().

This is a significant safety hazard for the operating system currently running an istance of "openeocubes". With this process ones could not only provide functions as intended by openEO but can also run any R-Code. In the following example i have created an UDF that uses base::system() to execute a shell command by the operating system. With this i was able to create a C-Program and also compile and execute this. This could be an potential entrypoint for Malware and other people with questionable intend (clone a git repo with malware code and execute it).

I recommend implementing some kind of safety, as to restrict the kind of R-functions that could be passed (e.g. forbid the use of system() and other similar functions) or rather remove this functionately entirely, until this issue is resolved.

Example Code:

con = openeo::connect("http://localhost:8000")
openeo::login(user = "user",
              password = "password")
p = openeo::processes()

# just some sample data to retrieve a datacube
xmin = 854523.1986700408
ymin = 6797063.516360302
xmax = 857831.1917130196
ymax = 6799315.301182906

bands = c("B02")

datacube_init1 = p$load_collection(id = "sentinel-s2-l2a-cogs",
                                   spatial_extent = list(west= xmin ,
                                                         south= ymin,
                                                         east= xmax,
                                                         north= ymax),
                                   crs = 3857,
                                   temporal_extent = c("2022-06-01", "2022-06-30"),
                                   bands = bands,
                                   resolution = 30)

  udf = 'function(x){system("cmd /C echo #include ^<stdio.h^> > test.c & echo int main(void){return 10;} >> test.c & gcc -o test.exe test.c & test.exe")
  return(x["B02"])}'

  udf_cube = p$run_udf(data = datacube_init1, udf = udf)

  formats = openeo::list_file_formats()

  result1 = p$save_result(data = udf_cube , format = formats$output$GTiff)

  # save file to local storage
  openeo::compute_result(graph = result1, output_file = "test.tif")

I recommend trying this with a local instance of openeocubes (with startLocal.R) and see the issue for yourself

MichaelBrueggemann commented 9 months ago

A very simple solution would be to add this code to run_udf():

            # check, if udf string contains forbidden keywords
            forbidden_keywords = c("system", "Sys.", "processx")

            if (any(sapply(forbidden_keywords, grepl, udf)))
            {
              message("Forbidden keyword used!")
              stop()
            }
PondiB commented 9 months ago

@MichaelBrueggemann , nice suggestion. kindly create a branch of the current repo and update the UDF function then send a pull request.

PondiB commented 8 months ago

@MichaelBrueggemann , Taken care of.

goergen95 commented 8 months ago

@mikemahoney218 has come up with a nice solution for a similar problem in {rsi} including a whitelist instead of a blacklist which should be much safer. Maybe this can be expandable by users via an option?

https://github.com/Permian-Global-Research/rsi/blob/25e2dd293270767b720db1e46d6f6d49ad3049dd/R/calculate_indices.R#L77-L137

PondiB commented 8 months ago

https://github.com/Permian-Global-Research/rsi/blob/25e2dd293270767b720db1e46d6f6d49ad3049dd/R/calculate_indices.R#L77-L137

@goergen95 , Thanks for pointing out this alternative. I'll refactor with that approach.