Open JaeAeich opened 1 month ago
@uniqueg opinions?
I agree that it doesn't make sense to keep different images if they all have the same content. And as long as that is the case, I think the solution you propose is much better.
However, for performance and also security reasons, it might be even better to make sure that every image only has what it really needs. I just don't see why a pod running the taskmaster would ever have (or even need to know about) the API part.
This approach would also have the added benefit of really forcing us to establish and maintain a low coupling design of the service, because for each piece of code we would need to think hard about which image or images it should end up in.
I really don't think this would impact performance, unless of course the `api` code is much larger than that of `services`. Since the `taskmaster` image will call the `taskmaster` entrypoint, at runtime it will only import the files it needs, which are consolidated in the `services` dir, and likewise for the `filer` and the `api`.
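A minimal sketch of the runtime claim above, assuming three throwaway modules (`demo_taskmaster`, `demo_filer`, `demo_api`) that stand in for the real components; these names are hypothetical and not part of tesk. It only shows that importing one entrypoint module does not load the others into the process:

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Hypothetical stand-ins for the real components; not actual tesk modules.
COMPONENTS = ("demo_taskmaster", "demo_filer", "demo_api")


def make_demo_modules(root: Path) -> None:
    """Write one tiny module per component into `root`."""
    for name in COMPONENTS:
        (root / f"{name}.py").write_text(f"NAME = {name!r}\n")


def run_component(name: str) -> set:
    """Import only the requested component and report which ones got loaded."""
    importlib.import_module(name)
    return {mod for mod in COMPONENTS if mod in sys.modules}


with tempfile.TemporaryDirectory() as tmp:
    make_demo_modules(Path(tmp))
    sys.path.insert(0, tmp)
    importlib.invalidate_caches()
    try:
        loaded = run_component("demo_taskmaster")
    finally:
        sys.path.remove(tmp)

print(loaded)  # {'demo_taskmaster'}
```

This says nothing about what is present on disk: the other modules are still shipped in the image, they are just never imported by this process.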
But I think we can only be sure after we have benchmarked them. Let's see: if the `api` `Dockerfile` has the same content, there wouldn't even be a point of discussion, as that would just be redundancy.
PS: I think we can create dependency groups for `taskmaster`, `filer` and `api` in `pyproject`, and conditionally include the entrypoint and the component-specific code in each image while still using only a single `Dockerfile`. We will have to see how that would work, but I think it should be easy :).
My main concerns are not performance, image size or even build time (those could indeed be minimized fairly easily), but rather the other two points I mentioned: security (every line of code may be an attack surface) and enforcing low coupling (encourages cleaner design, which is again an aspect of security by design).
The thing is that TESK is (and likely will be) predominantly used in domains where data security and privacy are of extreme importance (hence the projects that try to operationalize Crypt4GH and confidential computing through TESK). In a production environment that may potentially deal with patient data, people will have an extremely close look at the code of any components that end up dealing with such data directly, and so it makes perfect sense to keep concerns as separate as possible, not least because it might make audits and certifications a bit ~easier~ less painful.
Currently we have two Dockerfiles, one for the filer and one for the taskmaster, and with the implementation of the API we will have another. This might be pointless, as we are packaging all the components of `tesk`, which means we will have to install `tesk` in all of the images, and the only thing that changes is the entrypoint.
I propose that we have one image that installs `tesk`, and pass the entrypoint as a console script when running the image as a container. Check out the current poetry console scripts and the Dockerfiles in `deployment/container`: the only difference between the two images is the entrypoint.
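As a rough illustration of the proposal, the sketch below builds the `docker run` argument list for each component from one shared image. The image tag `tesk:latest` and the console script names are assumptions made for this example, not values taken from the repository, and it assumes the image defines no `ENTRYPOINT`, so the command given after the image name is what gets executed:

```python
import shlex

# Assumed values for illustration only; not taken from the TESK repository.
IMAGE = "tesk:latest"
CONSOLE_SCRIPTS = {
    "filer": "filer",
    "taskmaster": "taskmaster",
    "api": "api",
}


def run_command(component: str, *args: str) -> list:
    """Build `docker run --rm IMAGE SCRIPT [ARGS...]` for one component."""
    script = CONSOLE_SCRIPTS[component]  # KeyError for unknown components
    return ["docker", "run", "--rm", IMAGE, script, *args]


cmd = run_command("taskmaster", "--help")
print(shlex.join(cmd))  # docker run --rm tesk:latest taskmaster --help
```

In a Kubernetes pod spec the same selection would be made through the container's `command` field rather than on the `docker run` command line.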
PS: This might be blocked for as long as foca still needs Docker to be able to run.