StatCan / datascience-cookiecutter

A Cookiecutter template for Data Science Projects in Python
MIT License

Sharing Dockerfile to test deployment. #38

Closed asolis closed 1 year ago

asolis commented 2 years ago

@goatsweater could you please share the current Dockerfile that you are using? (i.e., the one you showed me that uses the wheel)

Maybe we'll end up keeping a version of this file inside the repository, as we discussed, but for now I just want to start testing it with other projects.

Thanks.

goatsweater commented 2 years ago

I will grab the file for you. I also happened to stumble across the templates from the cloudnative team. They're on the k8s GitLab under /cloudnative/onboarding/kubernetes-application-templates.

My Dockerfile is slightly different from their Python example, mostly in that it runs as a non-root user.

goatsweater commented 2 years ago

Here are the contents of one of my Dockerfiles:

FROM python:3.10-slim AS builder

# Create the workspace
WORKDIR /app

ENV PYTHONDONTWRITEBYTECODE=1
ENV PYTHONUNBUFFERED=1

RUN python -m venv /app/venv
ENV PATH="/app/venv/bin:$PATH"

COPY . ./

# Install build support and build a source distribution of our code
RUN pip --disable-pip-version-check install --no-cache-dir build
RUN python -m build --sdist

FROM python:3.10-slim

# Create a space to work in
WORKDIR /app

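# ENV values do not carry over from the builder stage, so set them again for the runtime image
ENV PYTHONDONTWRITEBYTECODE=1 \
    PYTHONUNBUFFERED=1 \
    VIRTUAL_ENV=/app/venv
RUN python3 -m venv $VIRTUAL_ENV
ENV PATH="$VIRTUAL_ENV/bin:$PATH"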

RUN addgroup --gid 1001 --system app && \
    adduser --shell /bin/false --disabled-password --uid 1001 --system --group app && \
    chown -R app:app /app

USER app

# Copy our Python package into the container
ENV PKG_VERSION=1.1.0
COPY --from=builder /app/dist/sdmxparquet-${PKG_VERSION}.tar.gz /app

# Install the package, which will install dependencies as well
RUN pip --disable-pip-version-check install --no-cache-dir sdmxparquet-${PKG_VERSION}.tar.gz

ENTRYPOINT [ "sdmxparquet" ]
CMD [ ]

I do recommend choosing distinct values for the UID, GID, user and group names, and app directory instead of reusing 1001 and app everywhere. It has no functional impact, but it is hard to talk about the setup with others when everything has the same name.
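As a sketch of that recommendation (not part of the original file), the user, group, IDs, and working directory can be exposed as build arguments so that each one has its own name. The defaults sdmxuser, sdmxgroup, GID 2001, and /opt/sdmxparquet are hypothetical placeholders; only the venv setup and the addgroup/adduser/chown steps mirror the Dockerfile above.

```dockerfile
FROM python:3.10-slim

# Hypothetical defaults, chosen only so every value is distinct and easy to refer to
ARG APP_USER=sdmxuser
ARG APP_GROUP=sdmxgroup
ARG APP_UID=1001
ARG APP_GID=2001
ARG APP_DIR=/opt/sdmxparquet

WORKDIR ${APP_DIR}

# Virtual environment lives inside the app directory, as in the original file
ENV VIRTUAL_ENV=${APP_DIR}/venv
RUN python3 -m venv "$VIRTUAL_ENV"
ENV PATH="$VIRTUAL_ENV/bin:$PATH"

# Create a named system group and a non-login system user in that group,
# then hand the working directory (including the venv) to that user
RUN addgroup --gid "${APP_GID}" --system "${APP_GROUP}" && \
    adduser --shell /bin/false --disabled-password --uid "${APP_UID}" \
        --system --ingroup "${APP_GROUP}" "${APP_USER}" && \
    chown -R "${APP_USER}:${APP_GROUP}" "${APP_DIR}"

USER ${APP_USER}
```

Any default can be overridden at build time, e.g. docker build --build-arg APP_UID=1500 . without editing the file.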

goatsweater commented 2 years ago

Based on recent experience in one of my projects, I've begun to deviate from this template for a few reasons. Mainly, I've found that multistage builds aren't necessary in practice, and this file carries a maintenance burden that has been a source of build failures.

If a CI stage builds the package and marks it as an artifact, the package can be copied directly into the container. This simplifies the Dockerfile by removing the entire build stage along with the need to maintain the package version information. It also makes the build artifact available to other CI stages, such as tests.
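A minimal single-stage sketch of that approach, assuming an earlier CI job ran python -m build --sdist and exported dist/ as an artifact, so that exactly one sdist sits in the Docker build context (and dist/ is not excluded by .dockerignore). The job layout is an assumption, not this project's actual pipeline; the point is that the wildcard COPY and pip install replace the hard-coded PKG_VERSION.

```dockerfile
FROM python:3.10-slim

WORKDIR /app

ENV PYTHONDONTWRITEBYTECODE=1 \
    PYTHONUNBUFFERED=1 \
    VIRTUAL_ENV=/app/venv
RUN python3 -m venv "$VIRTUAL_ENV"
ENV PATH="$VIRTUAL_ENV/bin:$PATH"

# Same non-root user setup as the multistage version
RUN addgroup --gid 1001 --system app && \
    adduser --shell /bin/false --disabled-password --uid 1001 --system --group app && \
    chown -R app:app /app

USER app

# Assumption: the CI packaging job left a single sdist in dist/ of the build context.
# The wildcard means the version number never appears in this file.
COPY dist/*.tar.gz /app/dist/
RUN pip --disable-pip-version-check install --no-cache-dir /app/dist/*.tar.gz

ENTRYPOINT [ "sdmxparquet" ]
CMD [ ]
```

Because the version is never named, bumping the package version no longer requires editing the Dockerfile. The trade-off is that the image can only be built after the CI packaging job has produced its artifact.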

I need to put a bit of effort into generalizing it so that it can be brought into this project. I'll try to get to that soon.

goatsweater commented 1 year ago

Closing this, as GitLab CI has been integrated into the templates.