elixir-cloud-aai / TESK

GA4GH Task Execution Service Root Project + Deployment scripts on Kubernetes
https://tesk.readthedocs.io
Apache License 2.0

chore: only one Dockerfile #176

Open JaeAeich opened 1 month ago

JaeAeich commented 1 month ago

Currently we have two Dockerfiles, one for the filer and one for the taskmaster, and with the implementation of the API we will add another. This seems pointless, as we are packaging all the components of tesk together, which means we will have to install tesk in all of the images and the only difference between them will be the entrypoint.

I propose that we have one image that installs tesk, and select the console script to run as the entrypoint when running the image as a container. Check out the current Poetry console scripts:

[tool.poetry.scripts]
filer = 'tesk.services.filer:main'
taskmaster = 'tesk.services.taskmaster:main'
api = 'tesk.api.app:main'  # <-- added after API impl
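For context, each console script above is just a function that Poetry wires up as a command. A minimal sketch of what such a target could look like (the body is hypothetical, not the actual filer code):

```python
import sys


# Hypothetical sketch of a console-script target: Poetry maps the `filer`
# command to a function like this, so a single installed package can
# expose one command per component (filer, taskmaster, api).
def main() -> int:
    # The real filer logic would live here; this stub only identifies itself.
    print(f"filer started with args: {sys.argv[1:]}")
    return 0


if __name__ == "__main__":
    main()
```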

The .Dockerfile in deployment/container has:

###################################################
#   Stage 1: Build wheel                           #
###################################################
FROM python:3.11-alpine AS builder

# Set work directory
WORKDIR /app

# Install poetry
RUN pip install poetry

# Copy source code
COPY . .

# Build wheel
RUN poetry build -f wheel

###################################################
#   Stage 2: Install wheel and create user        #
###################################################
FROM python:3.11-alpine AS runner

# Copy built wheel from the builder stage
COPY --from=builder /app/dist/*.whl /dist/

# Install the application with dependencies
RUN pip install /dist/*.whl

# Create a non-root user
RUN adduser -D -u 1000 filerUser

# Switch to the non-root user
USER filerUser

# Set the working directory
WORKDIR /app

# Entrypoint command
ENTRYPOINT ["filer"]  # <-- pass as env

The only difference between the two images is the ENTRYPOINT.
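With a single shared image, the component to run could then be selected per pod at deploy time. A hypothetical Kubernetes container-spec fragment (image name and structure are illustrative, not actual TESK manifests):

```yaml
# Sketch: all components share one image; the console script to run is
# selected per container by overriding the Dockerfile ENTRYPOINT.
containers:
  - name: taskmaster
    image: elixircloud/tesk:latest  # illustrative image name
    command: ["taskmaster"]         # `command` overrides ENTRYPOINT
  - name: filer
    image: elixircloud/tesk:latest
    command: ["filer"]
```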

PS: This might be blocked while foca still needs Docker to be able to run.

JaeAeich commented 1 month ago

@uniqueg opinions?

uniqueg commented 1 month ago

I agree that it doesn't make sense to keep different images if they all have the same content. And as long as that is the case, I think the solution you propose is much better.

However, for performance and also security reasons, it might be even better to make sure that every image only has what it really needs. I just don't see why a pod running the taskmaster would ever have (or even need to know about) the API part.

This approach would also have the added benefit of really forcing us to establish and maintain a low coupling design of the service, because for each piece of code we would need to think hard about which image or images it should end up in.

JaeAeich commented 1 month ago

I really don't think this would impact performance, unless of course the API code is much larger than that of the services. Since the taskmaster image will call the taskmaster entrypoint, at runtime it will only import the files it needs, which are consolidated in the services dir; the same goes for the filer and the API. But I think we can only be sure after benchmarking them. And if the api Dockerfile ends up with the same content, there wouldn't even be a point of discussion, as that would just be redundancy.
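The runtime-import point can be sketched quickly: Python only loads what a given entrypoint actually imports. Here stdlib modules stand in for the hypothetical tesk components:

```python
import sys

# Sketch: a `taskmaster` entrypoint that never imports tesk.api pays no
# runtime cost for the API code, because Python loads modules lazily on
# import. Stdlib modules stand in for the tesk components here.
import json  # stands in for tesk.services.taskmaster

assert "json" in sys.modules        # the component we ran is loaded
assert "xmlrpc" not in sys.modules  # unrelated component code never loads
print("only imported modules are loaded")
```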

PS: I think we can create dependency groups for taskmaster, filer and api in pyproject and conditionally set the entrypoint and component-specific dependencies in each image while still using a single Dockerfile. I will have to see how that works in practice, but I think it should be easy :).
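One way that idea could look, sketched with hypothetical extras names and a build arg (none of this is current TESK config; it assumes the builder stage from the Dockerfile above):

```dockerfile
# Sketch only: assumes pyproject.toml defines one extras group per
# component, e.g.
#   [tool.poetry.extras]
#   taskmaster = ["kubernetes"]
#   filer = ["boto3"]
#   api = ["foca"]
FROM python:3.11-alpine AS runner

# Component is chosen at build time; defaults to filer
ARG COMPONENT=filer

COPY --from=builder /app/dist/*.whl /dist/

# Install the package with only this component's extra dependencies
RUN pip install "$(echo /dist/*.whl)[${COMPONENT}]"

# Persist the choice so the entrypoint can dispatch to the matching
# console script at container start
ENV TESK_COMPONENT=${COMPONENT}
ENTRYPOINT ["sh", "-c", "exec \"$TESK_COMPONENT\" \"$@\"", "--"]
```

The same Dockerfile would then be built three times with different values of COMPONENT, e.g. docker build --build-arg COMPONENT=taskmaster.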

uniqueg commented 1 month ago

My main concerns are not performance, image size or even build time (those could indeed be minimized fairly easily), but rather the other two points I mentioned: security (every line of code may be an attack surface) and enforcing low coupling (encourages cleaner design, which is again an aspect of security by design).

The thing is that TESK is (and likely will be) predominantly used in domains where data security and privacy are of extreme importance (hence the projects that try to operationalize Crypt4GH and confidential computing through TESK). In a production environment that may potentially deal with patient data, people will have an extremely close look at the code of any components that end up dealing with such data directly, and so it makes perfect sense to keep concerns as separate as possible, not least because it might make audits and certifications a bit ~easier~ less painful.