Closed ImagoXV closed 3 months ago
I don't think this is a good idea.
Git is meant to manage code/text. GitHub offers to host (small) binaries as a service for users, but it makes sense to block large binaries. You could store this huge binary on another service, like Zenodo, but given that this is something that could be generated by a script, it can be seen as a waste of resources and bandwidth.
Is there a way to reduce the size of your final binary? Better file compression, maybe? Or a smaller container?
I don't know yet. I find it pretty small, knowing that /database/, which contains SILVA.gz and the index, is already 14 GB. I still have a SILVA_taxonomy file that is just the FASTA headers; I could get rid of it and parse the fasta.gz file instead, which would free some space.
I could get rid of the .tar.gz archives of the software downloaded from source. Together they add up to 4 MB.
Maybe I could use a smaller Ubuntu release, or something else really light, to lose some weight.
Maybe I should check the install("phyloseq", dependencies = TRUE) step, which is absurdly long and probably heavy too.
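For the idea of dropping the separate SILVA_taxonomy file and reading the headers straight from the compressed FASTA, here is a minimal sketch. The function names are hypothetical, and it assumes each header has the form `>accession taxonomy-path` with a single space between the two parts; that should be checked against the actual SILVA file before relying on it.

```python
import gzip
from typing import Iterator, Tuple


def iter_fasta_headers(path: str) -> Iterator[str]:
    """Yield every FASTA header (without the leading '>') from a gzipped file."""
    # "rt" opens the gzip stream in text mode, so lines are decoded strings.
    with gzip.open(path, "rt") as handle:
        for line in handle:
            if line.startswith(">"):
                yield line[1:].rstrip("\r\n")


def split_header(header: str) -> Tuple[str, str]:
    """Split a header into (accession, description) at the first space.

    Assumption: the taxonomy path is everything after the first space.
    If there is no space, the description is an empty string.
    """
    accession, _sep, description = header.partition(" ")
    return accession, description
```

The file is streamed line by line, so the whole database never has to fit in memory; only the header lines are kept. Calling `split_header(h)` on each `h` from `iter_fasta_headers(...)` would give the same information a separate headers-only file holds, at the cost of decompressing the FASTA each time.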
I was told that Zenodo was not meant to host software binaries, but I already made a draft record to store it there.
And yeah, for sure we can build from source, but it takes too long and, I don't know why, bwa-mem2 uses more memory for SILVA indexing than bwa. This kills my 32 GB, 12-core computer during indexing. I already opened an issue on the bwa-mem2 repo, but no answer yet.
I think the bwa-mem2 index should be lighter too.
/bin/ is 84 MB
/etc/ is ~1 MB
/opt/ is 13 MB
/run/ is 1.5 KB
/usr/ is 1.7 GB, pretty large if you ask me. Probably a bunch of useless junk in there.
/var/ is 59 MB
I think I can do some cleaning.
If you have any ideas, please let me know.
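To find out what is actually taking the space under a directory such as /usr/, something like the following could help. It is only a sketch: the function names are hypothetical, and it sums apparent file sizes rather than allocated disk blocks, so the numbers may differ a little from what `du` reports.

```python
import os
from typing import List, Tuple


def dir_size_bytes(root: str) -> int:
    """Sum the sizes of all regular files under root, skipping symlinks."""
    total = 0
    # os.walk does not follow symlinked directories by default.
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.islink(path):
                continue
            try:
                total += os.lstat(path).st_size
            except OSError:
                # The file vanished or is unreadable; ignore it.
                pass
    return total


def largest_subdirs(root: str, top: int = 10) -> List[Tuple[str, int]]:
    """Return up to `top` immediate subdirectories of root, largest first."""
    sizes = []
    with os.scandir(root) as entries:
        for entry in entries:
            if entry.is_dir(follow_symlinks=False):
                sizes.append((entry.path, dir_size_bytes(entry.path)))
    sizes.sort(key=lambda item: item[1], reverse=True)
    return sizes[:top]
```

Running `largest_subdirs("/usr")` inside the container and then repeating it on the biggest entry is one way to narrow down which packages or caches are worth removing.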
I totally agree that I need a smaller container; I'll look into that right away.
I see stuff like this, so I'll give it a try:
RUN apt-get update \
&& apt-get install -y --no-install-recommends ubuntu-minimal \
&& rm -rf /var/lib/apt/lists/*
Hi @frederic-mahe, I tried to make a release to see how it works. However, my binary is too big (~5 GB). The maximum allowed is 2 GB.
Any idea how to overcome this?
Arthur