StaPH-B / docker-builds

:package: :whale: Dockerfiles and documentation on tools for public health bioinformatics
GNU General Public License v3.0

[Container Request]: dorado but without models included, maybe? #1073

Open kapsakcj opened 1 month ago

kapsakcj commented 1 month ago

What tool are you looking for and why?

We just deployed a dorado docker image in PR #1051, but this image contains all available dorado models, so it is quite large at ~8.5 GB compressed and ~15.7 GB uncompressed.

I don't have a need for this (as of now!), but perhaps someone else does? We could consider building a docker image that does not contain any model files. This would reduce the size of the docker image (though it would still be somewhat large due to the size of the nvidia base image).

The user could then download the appropriate dorado model at runtime, as dorado has the capability to do this.
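As a rough illustration of what "download at runtime" could look like, here is a minimal Python sketch that wraps dorado's `download` subcommand. The helper names and the example model name are assumptions made for illustration, and the sketch assumes `dorado` is on the `PATH` inside the container; check `dorado download --help` for the options your version supports.

```python
import shutil
import subprocess


def build_download_cmd(model: str) -> list[str]:
    """Build the argv for fetching one dorado model (hypothetical helper)."""
    return ["dorado", "download", "--model", model]


def download_model(model: str) -> None:
    """Run the download, failing early if dorado is not installed."""
    if shutil.which("dorado") is None:
        raise RuntimeError("dorado not found on PATH")
    # check=True raises CalledProcessError if dorado exits non-zero
    subprocess.run(build_download_cmd(model), check=True)


if __name__ == "__main__":
    # Example model name only; substitute whichever model matches your data.
    print(" ".join(build_download_cmd("dna_r10.4.1_e8.2_400bps_hac@v4.2.0")))
```

In a workflow, `download_model(...)` would run as a step just before basecalling, so only the one model that is actually needed gets fetched.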

Is there anyone that would use a model-free docker image for dorado?

dpark01 commented 1 month ago

So, it's worth noting that a docker pull of an 8.5 GB compressed image is much slower than a docker pull of a 4 GB image immediately followed by a download of 4 GB of data (i.e. using normal workflow-engine machinery to localize large files from cloud buckets). This is because docker pull is rate-limited by the single-threaded gzip decompression of its layers more so than by the network bandwidth of wherever you're downloading from.
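A back-of-the-envelope sketch of that argument, in Python. The two throughput figures are made-up assumptions chosen only to show the shape of the comparison, not measurements, and the model simply treats a pull as limited by the slower of network and layer decompression.

```python
# Hypothetical throughputs, in MB of compressed data per second (assumptions).
NETWORK_MBPS = 200.0      # plain download from a cloud bucket
DECOMPRESS_MBPS = 50.0    # effective rate when gzip decompression is the bottleneck


def pull_seconds(image_mb: float) -> float:
    """Estimate docker pull time: limited by the slower of the two rates."""
    return image_mb / min(NETWORK_MBPS, DECOMPRESS_MBPS)


def download_seconds(data_mb: float) -> float:
    """Estimate a plain file download: limited by network only."""
    return data_mb / NETWORK_MBPS


# One large image with models baked in (8.5 GB compressed).
all_in_one = pull_seconds(8500)
# Smaller image (4 GB) plus a separate 4 GB model download.
split = pull_seconds(4000) + download_seconds(4000)

print(f"all-in-one image: {all_in_one:.0f} s")   # 170 s under these assumptions
print(f"image + download: {split:.0f} s")        # 100 s under these assumptions
```

The exact numbers depend entirely on the assumed rates; the point is only that bytes moved out of the image and into a regular download skip the decompression bottleneck.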

kapsakcj commented 1 month ago

good to know, thanks Danny. Even more of a reason to make a model-less dorado docker image 👍