PediatricOpenTargets / OpenPedCan-api

Update Dockerfile to start from rocker/tidyverse instead of installin… #77

Closed brianghig closed 1 year ago

brianghig commented 2 years ago

Description

Update Dockerfile to start from rocker/tidyverse:4.1.0 instead of installing tidyverse during the build process.

With the latest version of tidyverse and the previous base Docker image of rocker/r-ver:4.1.0, the Docker image build fails after ~10 minutes because tidyverse and several of its dependencies fail to install.

With this update, the Docker image builds successfully in less than 3 minutes.

Fixes # N/A

How Has This Been Tested?

Use the Docker Compose file to build the API web image via:

docker-compose up -d --build web

The image should build successfully and start, assuming that the database image db is also running. If not, start it via docker-compose up -d db, then open a web browser to http://localhost:8082/__docs__/ to view the Swagger documentation.
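
The steps above could be scripted roughly as follows. This is only a sketch: the helper names `wait_for_url` and `smoke_test`, the retry count, and the delay are illustrative assumptions and are not part of the repository. The service names and the docs URL are the ones given in this description.

```shell
#!/bin/sh
# Swagger docs URL from the PR description; override via the environment if needed.
DOCS_URL="${DOCS_URL:-http://localhost:8082/__docs__/}"

# Poll a URL until it answers successfully or the attempts run out.
# Usage: wait_for_url URL [ATTEMPTS] [DELAY_SECONDS]; returns 0 on success, 1 otherwise.
wait_for_url() {
  url=$1
  attempts=${2:-30}
  delay=${3:-2}
  i=0
  while [ "$i" -lt "$attempts" ]; do
    if curl -fs -o /dev/null "$url"; then
      return 0
    fi
    i=$((i + 1))
    sleep "$delay"
  done
  return 1
}

# Start the database, rebuild and start the web image, then wait for the docs page.
smoke_test() {
  docker-compose up -d db &&
    docker-compose up -d --build web &&
    wait_for_url "$DOCS_URL"
}
```

Calling `smoke_test` from the repository root would then exit non-zero if either container fails to start or the docs page never responds.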

logstar commented 2 years ago

@brianghig - Thank you for preparing this PR.

I was wondering if you could share the errors from building with rocker/r-ver:4.1.0.

I just ran docker-compose build --no-cache web a couple of times, and each build completed within 3 minutes.

$ docker-compose build --no-cache web
Building web
Sending build context to Docker daemon  261.6kB
Step 1/12 : FROM rocker/r-ver:4.1.0
 ---> e0f5a4f0d60f
Step 2/12 : RUN apt-get update -qq   && apt-get install -y --no-install-recommends     libssl-dev     libcurl4-gnutls-dev     curl     unixodbc     unixodbc-dev     odbc-postgresql     libx11-6     libxss1     libxt6     libxext6     libsm6     libice6     xdg-utils     libxml2   && rm -rf /var/lib/apt/lists/*   && install2.r --error     tidyverse     plumber     rprojroot     jsonlite     ggthemes     odbc     DBI     glue     pheatmap     ggpubr   && rm -rf /tmp/downloaded_packages/*
 ---> Running in 259af547ae57
Reading package lists...
Building dependency tree...
...
Successfully tagged openpedcan-api_web:latest

I ran the build command with the following docker/docker-compose versions:

docker 20.10
docker-compose 1.29.2

Changing the base image to rocker/tidyverse:4.1.0 may also cause unexpected behavior in the API R functions defined in src because of different package versions, so a complete run-through of tests/run_tests.sh, both locally and remotely, might also be necessary.
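
One way to see which package versions would actually change is to list the installed R packages in each image and compare the lists. The sketch below assumes the two web images have been built and tagged locally; the tags `web:r-ver` and `web:tidyverse` in the usage note are hypothetical, and the function names are illustrative.

```shell
#!/bin/sh
# Print "package version" lines for every R package installed in an image, sorted by name.
# --entrypoint is used so the call works whatever entrypoint the image defines.
list_r_packages() {
  docker run --rm --entrypoint Rscript "$1" -e \
    'ip <- installed.packages(); cat(sprintf("%s %s\n", ip[, "Package"], ip[, "Version"]), sep = "")' |
    sort
}

# Given two sorted listings, print the packages whose versions differ.
# Packages present in only one listing are not reported by this sketch.
version_changes() {
  join "$1" "$2" |
    awk '($2 "") != ($3 "") { print $1 ": " $2 " -> " $3 }'  # force string comparison
}
```

For example, `list_r_packages web:r-ver > a.txt; list_r_packages web:tidyverse > b.txt; version_changes a.txt b.txt` would print one line per package whose version differs between the two images. Such a comparison narrows down where to look, but it does not replace running tests/run_tests.sh.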

brianghig commented 2 years ago

Thanks @logstar! I'll check out the tests as part of this PR.

Are you running on Windows or Linux? At this point, that's the only difference that makes sense.

My experience is that the web image consistently fails to build after ~10 minutes, with the following as the last lines of output. This is consistent with the errors reported at https://github.com/rocker-org/rocker/issues/335, which recommended using Rocker's tidyverse image instead of installing fully from source.

...
#5 754.6 gcc -I"/usr/local/lib/R/include" -DNDEBUG  -I'/usr/local/lib/R/site-library/cpp11/include' -I/usr/local/include  -Ireadstat -DHAVE_ZLIB -fpic  -g -O2 -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -g  -c readstat/spss/readstat_sav.c -o readstat/spss/readstat_sav.o
#5 754.6 gcc -I"/usr/local/lib/R/include" -DNDEBUG  -I'/usr/local/lib/R/site-library/cpp11/include' -I/usr/local/include  -Ireadstat -DHAVE_ZLIB -fpic  -g -O2 -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -g  -c readstat/spss/readstat_por.c -o readstat/spss/readstat_por.o
#5 754.7 gcc -I"/usr/local/lib/R/include" -DNDEBUG  -I'/usr/local/lib/R/site-library/cpp11/include' -I/usr/local/include  -Ireadstat -DHAVE_ZLIB -fpic  -g -O2 -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -g  -c readstat/spss/readstat_sav_parse_timestamp.c -o readstat/spss/readstat_sav_parse_timestamp.o
#5 754.7 gcc -I"/usr/local/lib/R/include" -DNDEBUG  -I'/usr/local/lib/R/site-library/cpp11/include' -I/usr/local/include  -Ireadstat -DHAVE_ZLIB -fpic  -g -O2 -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -g  -c readstat/spss/readstat_zsav_read.c -o readstat/spss/readstat_zsav_read.o
#5 754.7 readstat/spss/readstat_zsav_read.c:2:10: fatal error: zlib.h: No such file or directory
#5 754.7     2 | #include <zlib.h>
#5 754.7       |          ^~~~~~~~
#5 754.7 compilation terminated.
#5 754.7 make: *** [/usr/local/lib/R/etc/Makeconf:168: readstat/spss/readstat_zsav_read.o] Error 1
#5 754.7 ERROR: compilation failed for package ‘haven’
#5 754.7 * removing ‘/usr/local/lib/R/site-library/haven’
#5 754.9 * installing *source* package ‘modelr’ ...
#5 754.9 ** package ‘modelr’ successfully unpacked and MD5 sums checked
#5 754.9 ** using staged installation
#5 754.9 ** R
#5 754.9 ** data
#5 754.9 *** moving datasets to lazyload DB
#5 754.9 ** byte-compile and prepare package for lazy loading
#5 755.4 ** help
#5 755.4 *** installing help indices
#5 755.4 *** copying figures
#5 755.4 ** building package indices
#5 755.6 ** testing if installed package can be loaded from temporary location
#5 756.2 ** testing if installed package can be loaded from final location
#5 756.8 ** testing if installed package keeps a record of temporary installation path
#5 756.8 * DONE (modelr)
#5 756.9 ERROR: dependencies ‘haven’, ‘rvest’, ‘xml2’ are not available for package ‘tidyverse’
#5 756.9 * removing ‘/usr/local/lib/R/site-library/tidyverse’
#5 756.9 
#5 756.9 The downloaded source packages are in
#5 756.9        ‘/tmp/downloaded_packages’
#5 756.9 Error: installation of package ‘tidyverse’ had non-zero exit status
#5 756.9 In addition: Warning messages:
#5 756.9 1: In install.packages(pkgs, ...) :
#5 756.9   installation of package ‘xml2’ had non-zero exit status
#5 756.9 2: In install.packages(pkgs, ...) :
#5 756.9   installation of package ‘sodium’ had non-zero exit status
#5 756.9 3: In install.packages(pkgs, ...) :
#5 756.9   installation of package ‘plumber’ had non-zero exit status
#5 756.9 4: In install.packages(pkgs, ...) :
#5 756.9   installation of package ‘rvest’ had non-zero exit status
#5 756.9 5: In install.packages(pkgs, ...) :
#5 756.9   installation of package ‘haven’ had non-zero exit status
#5 756.9 6: In install.packages(pkgs, ...) :
#5 756.9   installation of package ‘tidyverse’ had non-zero exit status
------
executor failed running [/bin/sh -c apt-get update -qq   && apt-get install -y --no-install-recommends     libssl-dev     libcurl4-gnutls-dev     curl     unixodbc     unixodbc-dev     odbc-postgresql     libx11-6     libxss1     libxt6     libxext6     libsm6     libice6     xdg-utils   && rm -rf /var/lib/apt/lists/*   && install2.r --error     tidyverse     plumber     rprojroot     jsonlite     ggthemes     odbc     DBI     glue   && rm -rf /tmp/downloaded_packages/*]: exit code: 1
ERROR: Service 'web' failed to build : Build failed

I'm running on macOS 12.5, also with:

Docker version 20.10.17, build 100c701
docker-compose version 1.29.2, build 5becea4c

logstar commented 2 years ago

@brianghig - Thank you for sharing the errors and pointers. I am not sure what caused the "#5 754.7 readstat/spss/readstat_zsav_read.c:2:10: fatal error: zlib.h: No such file or directory" error on your machine.
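
For long build logs like the one above, a small filter can pull out the headers that the compiler could not find. This is a sketch that assumes the build output was saved to a file; the file name `build.log` and the function name `missing_headers` are illustrative.

```shell
#!/bin/sh
# Print each header named in a "fatal error: <header>: No such file or directory"
# compiler message in the given log file, once per header.
missing_headers() {
  sed -n 's/.*fatal error: \([^:]*\): No such file or directory.*/\1/p' "$1" | sort -u
}
```

Run as `missing_headers build.log` on the log quoted above, this prints `zlib.h`, the only "fatal error" line shown in that excerpt.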

It is good to know that the Dockerfile cannot be built on macOS 12.5. I will add a note about it in the next README.md update and refer to this PR.

I built the image on Ubuntu 20.04. The versions of all other packages used for developing and testing this repo are listed at https://github.com/PediatricOpenTargets/OpenPedCan-api/blob/0a7046b1dedc7a7b954400edae7a45b4d60a8a98/README.md#3-test-run-openpedcan-api-server-locally.

As we will be using Ubuntu on Amazon EC2 to develop this repo, I think we will not need to make the repo compatible with macOS. The database building step of the latest branch v10-dge requires about 40GB RAM.

Although https://github.com/rocker-org/rocker/issues/335 recommended using the Rocker tidyverse image, that image has many packages and configurations that we do not need for the API HTTP server, and the unused packages and configurations may introduce security issues at a later point. For example, the tidyverse image https://github.com/rocker-org/rocker-versioned2/blob/ce8821d2090c0a88ca2875582c4946536df67614/dockerfiles/tidyverse_4.1.0.Dockerfile is built from the rstudio image https://github.com/rocker-org/rocker-versioned2/blob/master/dockerfiles/rstudio_4.1.0.Dockerfile, which is in turn built from the r-ver image.
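
To put a rough number on how much extra content the tidyverse image carries, the sizes of the two base images can be compared. This sketch assumes both images have already been pulled locally; the function names are illustrative, and image size is only a proxy for the amount of unused software, not a security assessment.

```shell
#!/bin/sh
# Convert a byte count to whole mebibytes (rounded down).
bytes_to_mib() {
  echo $(( $1 / 1048576 ))
}

# Print the size of a local image in MiB, as reported by docker image inspect.
image_size_mib() {
  bytes_to_mib "$(docker image inspect --format '{{.Size}}' "$1")"
}
```

For example, `image_size_mib rocker/r-ver:4.1.0` and `image_size_mib rocker/tidyverse:4.1.0` print the two sizes side by side for comparison.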

I was wondering if you could try building the web image on an Ubuntu virtual machine. If it works, would it be ok to temporarily develop/test on an Ubuntu virtual machine until you have access to an Ubuntu EC2 instance?

As I do not have Docker on my Mac, I was wondering if you could also try adding libz-dev to the apt-get install list to see whether the web image can then be built on macOS. Installing libz-dev may fix the "zlib.h: No such file or directory" error, according to https://stackoverflow.com/questions/36374267/how-to-fix-fatal-error-zlib-h-no-such-file-or-directory.
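
A quick way to try this before editing the Dockerfile is to check for the header inside a container and install the package only if it is missing. This is a sketch under assumptions: it presumes a Debian/Ubuntu-based image where headers live under /usr/include, it must run as root inside the container, and the function names are illustrative.

```shell
#!/bin/sh
# Return 0 if header $1 exists directly under include directory $2 (default /usr/include).
has_header() {
  [ -f "${2:-/usr/include}/$1" ]
}

# Install the zlib development package suggested above only when zlib.h is missing.
ensure_zlib_headers() {
  if has_header zlib.h; then
    echo "zlib.h already present"
  else
    apt-get update -qq &&
      apt-get install -y --no-install-recommends libz-dev &&
      rm -rf /var/lib/apt/lists/*
  fi
}
```

If `ensure_zlib_headers` makes the haven compilation succeed, the permanent change would be to add libz-dev to the existing apt-get install list in the Dockerfile.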

If we still need to make changes to the web image Dockerfile, we will also need to check deployment details with @devbyaccident, who set up the DEV/QA/PRD deployment procedures.

jharenza commented 1 year ago

@logstar it seems we can close this for now?

logstar commented 1 year ago

Hi @jharenza. Thank you for checking. I think this can be closed.