GMOD / jbrowse

JBrowse 1, a full-featured genome browser built with JavaScript and HTML5. For JBrowse 2, see https://github.com/GMOD/jbrowse-components.
http://jbrowse.org

[Question] Is there a way to reduce number of files generated during indexing? #1651

Open shreyas-a-s opened 3 months ago

shreyas-a-s commented 3 months ago

I am deploying JBrowse 1 and JBrowse 2 instances as part of a biological website, and we are using AWS S3 as the storage provider.

The issue is that generate-names.pl creates a lot of small files, close to 60,000 for some of the genome data, so the upload to S3 costs a lot. I think this is because S3 charges by the number of PUT requests.

I also see that JBrowse 2's text-index command creates far fewer files during indexing.

So my question is: is it possible to reduce the number of files generated by generate-names.pl, or is there another method I can use?

Thanks in advance.

cmdcolin commented 3 months ago

One of the motivations for JBrowse 2 was to avoid the many small files, so it is indeed better for this case.

For JBrowse 1, you can try passing e.g. --hashBits 4 to generate-names.pl to reduce the number of files generated, but the indexing strategies in JBrowse 1 are generally designed around producing a lot of files.
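
To get a rough feel for what --hashBits changes, here is a minimal Python sketch. It rests on an assumption not stated in this thread: that the name index is a hash store with at most one bucket file per hash value, so the file count is bounded by 2^hashBits. The helper names `max_bucket_files` and `count_files` are hypothetical, written only for this illustration. `count_files` simply walks a directory, so it can be pointed at the generated index directory to check the real number of files (and hence PUT requests for a plain one-object-per-file upload) before and after re-indexing.

```python
import os


def max_bucket_files(hash_bits: int) -> int:
    """Upper bound on bucket files, ASSUMING one file per possible hash value."""
    if hash_bits < 0:
        raise ValueError("hash_bits must be non-negative")
    return 2 ** hash_bits


def count_files(root: str) -> int:
    """Count regular files under root; each one is one PUT in a plain upload."""
    return sum(len(files) for _dirpath, _dirnames, files in os.walk(root))


if __name__ == "__main__":
    # Compare the assumed upper bounds for a few hashBits values.
    for bits in (4, 8, 12, 16):
        print(f"hashBits={bits}: at most {max_bucket_files(bits)} bucket files")
```

Under that assumption, 16 bits would allow up to 65,536 buckets, which is in the same range as the roughly 60,000 files reported above, while 4 bits would allow at most 16. The likely trade-off is that each bucket file becomes much larger, so a single name lookup may download more data; measuring with `count_files` and checking file sizes on a real index is the safer way to pick a value.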