EDIorg / EMLassemblyline

R package for creating EML metadata
https://ediorg.github.io/EMLassemblyline/
MIT License

Provide Dockerfile / image? #99

Open joefutrelle opened 2 years ago

joefutrelle commented 2 years ago

I have a CI use case and was thinking of running EMLassemblyline in Docker. Since I'm new to R on Linux, it took me a while to figure out how to get all of the dependencies installed, but I have a working Dockerfile, which I've included below. My question: is anyone else in the EDI community interested enough in this that it would make sense to "officially" support a Dockerfile, or an image on Docker Hub, for this package?

FROM r-base

# system libraries needed to build the R dependencies
# (update and install in one layer so a cached package index is never reused)
RUN apt-get update && \
    apt-get install -y libxml2-dev libcurl4-openssl-dev libv8-dev libjq-dev

RUN Rscript -e 'install.packages("EML")'

# now install from GitHub
RUN Rscript -e 'install.packages("remotes")'
RUN Rscript -e 'remotes::install_github("EDIorg/EMLassemblyline")'

CMD ["R"]

clnsmth commented 2 years ago

Thanks @joefutrelle! I can see this being handy for EDI users and in my own work. Let me get back to you after discussing with the team (~ tomorrow).

clnsmth commented 2 years ago

Hi @joefutrelle, can you tell me a little more about your CI use case? Does it have a GitHub I can browse? Thanks!

joefutrelle commented 2 years ago

I'm not doing CI yet, but I am working towards additional automation for our EDI packaging workflow here at WHOI in the NES-LTER project and I'm thinking that Docker could be part of it. Basically we would commit templates and data, and then the assembly line would run and we'd get an exit code that would indicate whether or not the package built successfully. If it's overkill, I won't do it--but I'm investigating.
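The exit-code idea could look roughly like the shell sketch below. The names `run_step` and `make_eml.R` are hypothetical and not part of EMLassemblyline. The sketch only assumes that the assembly script is started with `Rscript`, which exits non-zero when the script stops on an error.

```shell
#!/bin/sh
# Run one packaging step and report whether it succeeded.
# Usage: run_step <label> <command> [args...]
run_step() {
    label=$1
    shift
    if "$@"; then
        echo "PASS: $label"
        return 0
    else
        # In the else branch, $? still holds the failed command's status.
        status=$?
        echo "FAIL: $label (exit $status)"
        return "$status"
    fi
}

# Example call (make_eml.R is a placeholder for the project's own script):
#   run_step "build EML" Rscript make_eml.R
```

A CI system would then mark the job as failed whenever `run_step` returns a non-zero status.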

clnsmth commented 2 years ago

Hmmm ... interesting @joefutrelle. Where are you planning to execute this workflow? What kind of data volumes (approximately) will be running through it?

Integrating Docker Hub builds into the EMLassemblyline version release process will simplify maintenance, but getting this up and running may take some time given the current set of priorities and demand for this feature. Any help getting your Dockerfile production-ready would expedite this.

Another CI solution to consider is GitHub Actions (beware of usage limits). See what @BrennieDev developed for this project. Rather than running on a schedule, your workflow would run on a push.

joefutrelle commented 2 years ago

Thanks so much. The Dockerfile can be simplified a little: as I understand it, EMLassemblyline depends on EML, so I don't need the separate step that installs EML.

I could also base it on a different image, such as rocker/tidyverse, to speed up the build, though I haven't tried that yet.

GitHub Actions is definitely one of the first CI options I'll look at, once we actually get that far into automating our workflows.

In my copious free time I will produce a more optimized Dockerfile and will post it in this thread to serve as a starting point for this eventual enhancement.

clnsmth commented 2 years ago

Great @joefutrelle. Looking forward to the new Dockerfile and getting it deployed.

Please continue to suggest features/enhancements that would help with data CI, in EMLassemblyline or other EDI projects. Data CI is something I'm very interested in and working on in my "free time" : )

joefutrelle commented 2 years ago

Here's a new Dockerfile based on rocker/tidyverse, which builds much faster and without apparent errors. However, I have not yet tested an assembly workflow in it, so it's not ready for prime time until I do that (in my free time).

FROM rocker/tidyverse

# install EMLassemblyline from GitHub
RUN Rscript -e 'install.packages("remotes")'
RUN Rscript -e 'remotes::install_github("EDIorg/EMLassemblyline")'

CMD ["R"]

joefutrelle commented 2 years ago

Running an assembly notebook failed; the missing system libraries that caused the runtime errors will need to be addressed next.

joefutrelle commented 2 years ago

Using this package I was able to get a list of Ubuntu 20.04 system libraries required to run EMLassemblyline and my notebook worked. It's still an open question to me as to whether -dev packages are truly required at runtime (e.g., do I need libjq-dev instead of libjq1 only in the instance when I'm building jqr from source?), and this list may not be accurate, but it's a stab at it.
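One way to probe the -dev versus runtime question is to ask `ldd` which shared libraries the compiled R packages cannot resolve. The sketch below is only a starting point. The helper name `missing_libs` is made up, and it assumes that `ldd` marks unresolved libraries with `=> not found`, as glibc's `ldd` does.

```shell
#!/bin/sh
# Print the names of shared libraries that ldd reports as "not found".
# Reads ldd output on stdin, so it can be fed from any number of .so files.
missing_libs() {
    awk '/=> not found/ { print $1 }'
}

# Example call inside the container (assumes the R library paths contain no spaces):
#   find $(Rscript -e 'cat(.libPaths())') -name '*.so' -exec ldd {} + \
#       | missing_libs | sort -u
```

If the output is empty in an image that has only the runtime packages installed, that suggests the -dev packages are needed only when building from source.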

Is there a set of unit tests that I can run to validate that an installation of EMLassemblyline has requirements met for all of its features? That would really nail this down. In the meantime, here's the new Dockerfile:

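FROM rocker/tidyverse

# system libraries required by EMLassemblyline and its dependencies
# (update and install in one layer so a cached package index is never reused)
RUN apt-get update && \
    apt-get install -y libcurl4-openssl-dev libssl-dev pandoc zlib1g-dev libjq-dev libicu-dev libxml2-dev libv8-dev

# install EMLassemblyline from GitHub
RUN Rscript -e 'install.packages("remotes")'
RUN Rscript -e 'remotes::install_github("EDIorg/EMLassemblyline")'

# default command doubles as a load check: it fails if the package cannot be attached
CMD ["Rscript", "-e", "library(EMLassemblyline)"]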
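
Because the image's default command just loads the package, building and running it gives a crude smoke test. A minimal shell sketch follows. The function name `smoke_test` and the tag `emlassemblyline:test` are placeholders, and the Dockerfile is assumed to be in the current directory.

```shell
#!/bin/sh
# Build the image from ./Dockerfile, then run its default command.
# Returns non-zero if the build fails or the container exits non-zero.
smoke_test() {
    tag=${1:-emlassemblyline:test}
    docker build -t "$tag" . || return 1
    docker run --rm "$tag"
}

# Example call:
#   smoke_test && echo "package loads" || echo "package failed to load"
```

This only shows that `library(EMLassemblyline)` succeeds; it does not exercise any templating or EML-building functions.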