Closed rpetit3 closed 6 years ago
Hi @rpetit3,
Thanks for the report! Sorry for the delay in getting it fixed.
build will rebuild the index from scratch, so
bigsi build test-bigsi test3.bloom -s s3
should be replaced with:
bigsi build test-bigsi test1.bloom test2.bloom test3.bloom -s s1 -s s2 -s s3 --force 1
OR
bigsi insert test-bigsi test3.bloom s3
The insert command wasn't working at cabebc7 but I've pushed a fix here: https://github.com/Phelimb/BIGSI/commit/d4b6f8e44693312ace3461592cd432be643776f1
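With many samples the full rebuild command gets long, so it may be easier to assemble it programmatically. The following is a minimal sketch, not part of BIGSI: the helper name `bigsi_build_args` is hypothetical, and it only uses the arguments shown in the commands above (`-s` per sample and `--force 1`).

```python
def bigsi_build_args(db, samples, force=True):
    """Return the argument list for a full `bigsi build` over all samples.

    samples: list of (bloom_path, sample_name) pairs.
    Hypothetical helper; it only mirrors the command shown in this thread.
    """
    # Bloom filter paths come first, in sample order.
    args = ["bigsi", "build", db] + [bloom for bloom, _ in samples]
    # One -s flag per sample name, in the same order as the bloom files.
    for _, name in samples:
        args += ["-s", name]
    # --force 1 is the flag used above when rebuilding an existing index.
    if force:
        args += ["--force", "1"]
    return args


samples = [(f"test{i}.bloom", f"s{i}") for i in (1, 2, 3)]
print(" ".join(bigsi_build_args("test-bigsi", samples)))
```

The list can be passed to `subprocess.run(args, check=True)` if you want to run it directly rather than print it.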
Hey @Phelimb
Thanks for the update! Tested everything out, and insert is working like a charm now! By any chance do you have any recommendations on creating a 10k+ sample database? It will be a single species database, so not much difference in kmers.
Currently I'm testing by building and then inserting samples one at a time. I haven't tested inserting multiple samples at a time. Thought I would ask before I go further.
Thanks again for the update!
So, it depends slightly on your compute resources. Provided these samples are bacterial and ~5Mbp in size, the default parameters will work. You'll need ~280GB of memory to build the index in memory (http://www.wolframalpha.com/input/?i=10,000*3.5MB*8), or ~35GB if you replace the transpose method in https://github.com/Phelimb/BIGSI/blob/master/bigsi/matrix/transpose.py with the one that doesn't use numpy (it's slower but uses less memory).
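The arithmetic behind those two figures can be reproduced for other sample counts. This is a rough sketch only: the 3.5MB per sample and the factor of 8 are taken from the linked calculation, and treating the factor of 8 as the overhead of the numpy transpose is an assumption based on the ~35GB figure quoted for the non-numpy version.

```python
def index_memory_gb(n_samples, mb_per_sample=3.5, factor=1):
    """Rough memory estimate in GB (decimal: 1 GB = 1000 MB).

    mb_per_sample and factor come from the calculation linked above;
    factor=8 is assumed to correspond to the numpy transpose.
    """
    return n_samples * mb_per_sample * factor / 1000


print(index_memory_gb(10_000, factor=8))  # numpy transpose estimate
print(index_memory_gb(10_000))            # lower-memory transpose estimate
```

For 10,000 samples this gives 280.0 and 35.0, matching the numbers above.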
If you don't have a machine with ~300GB of memory (or ~40GB with the lower-memory transpose), I would suggest building in chunks that are as large as possible and then merging the resulting indexes. The merge command is currently not working, unfortunately, but it simply iterates through all the rows in the BerkeleyDB and concatenates them.
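To illustrate the idea of merging by row concatenation, here is a toy sketch. It is not BIGSI's merge code and does not touch BerkeleyDB: each chunk index is modelled as a plain dict mapping a row key to a string of bits (one bit per sample), and the function name `merge_indexes` is hypothetical. It assumes every chunk was built with the same parameters, so all chunks share the same set of rows.

```python
def merge_indexes(indexes):
    """Merge chunk indexes by concatenating matching rows.

    indexes: list of dicts, each mapping row key -> bit string,
    where each bit is one sample (column) in that chunk.
    Toy model only; a real index would live in a key-value store.
    """
    if not indexes:
        return {}
    row_keys = set(indexes[0])
    # Rows can only be concatenated if every chunk has the same rows.
    for idx in indexes[1:]:
        if set(idx) != row_keys:
            raise ValueError("chunk indexes have different rows")
    # For each row, append the columns of each chunk in order.
    return {key: "".join(idx[key] for idx in indexes) for key in row_keys}


chunk_a = {0: "10", 1: "01"}  # two samples
chunk_b = {0: "1", 1: "0"}    # one sample
merged = merge_indexes([chunk_a, chunk_b])
print(merged[0], merged[1])
```

The sample order in the merged index is the order of the chunks, so the sample names would need to be concatenated in the same order.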
I have my PhD viva this week so I'll fix the merge command next week and write up a tutorial on how to build the index with the split/merge approach.
When updating an existing database, a search query will only return hits to the first query. The new sample is recognized as existing in the database.
Below is an example using the test data. I'm running BIGSI at the latest commit (cabebc7857e99d34549fc7b78eceae62c349884e) on an Ubuntu 16.04 machine with Python 3.6.3. Please let me know if you need any more details.