Phelimb / BIGSI

BItsliced Genomic Signature Index - Efficient indexing and search in very large collections of WGS data
http://www.bigsi.io
MIT License

Max memory must be at least 8 * Bloomfilter size in bytes #68

Open rica01 opened 3 years ago

rica01 commented 3 years ago

Hello everyone. I am looking forward to trying BIGSI, but I ran into this error while trying to build the Bloom filters:

(sepsis_py_venv) (base) 
15:33:51 || ~/sepsis/BIGSI/config
[ricardo@bart]$ ls -lisah
total 508K
41563339  17K drwxrwxr-x 2 ricardo ricardo    7 Aug 13 15:11 .
41496240  17K drwxrwxr-x 9 ricardo ricardo   24 Aug 13 12:29 ..
41563342 8.5K -rw-rw-r-- 1 ricardo ricardo  179 Aug 13 12:30 bdb.yaml
41568979 8.5K -rw-rw-r-- 1 ricardo ricardo 1.4K Aug 13 15:11 test1.ctx
41568980 8.5K -rw-rw-r-- 1 ricardo ricardo 1.4K Aug 13 15:11 test2.ctx
(sepsis_py_venv) (base) 
15:33:56 || ~/sepsis/BIGSI/config
[ricardo@bart]$ bigsi build --config ./bdb.yaml bloom ./test1.ctx  ./test1.bloom
Traceback (most recent call last):
  File "//mnt/compgen/homes/ricardo/sepsis/sepsis_py_venv/bin/bigsi", line 8, in <module>
    sys.exit(main())
  File "/mnt/compgen/homes/ricardo/sepsis/sepsis_py_venv/lib/python3.6/site-packages/bigsi/__main__.py", line 324, in main
    API.cli()
  File "/mnt/compgen/homes/ricardo/sepsis/sepsis_py_venv/lib/python3.6/site-packages/hug/api.py", line 441, in __call__
    result = self.commands.get(command)()
  File "/mnt/compgen/homes/ricardo/sepsis/sepsis_py_venv/lib/python3.6/site-packages/hug/interface.py", line 650, in __call__
    raise exception
  File "/mnt/compgen/homes/ricardo/sepsis/sepsis_py_venv/lib/python3.6/site-packages/hug/interface.py", line 646, in __call__
    result = self.output(self.interface(**pass_to_function), context)
  File "/mnt/compgen/homes/ricardo/sepsis/sepsis_py_venv/lib/python3.6/site-packages/hug/interface.py", line 129, in __call__
    return __hug_internal_self._function(*args, **kwargs)
  File "/mnt/compgen/homes/ricardo/sepsis/sepsis_py_venv/lib/python3.6/site-packages/bigsi/__main__.py", line 170, in build
    max_memory=max_memory_bytes,
  File "/mnt/compgen/homes/ricardo/sepsis/sepsis_py_venv/lib/python3.6/site-packages/bigsi/cmds/build.py", line 53, in build
    raise ValueError("Max memory must be at least 8 * Bloomfilter size in bytes")
ValueError: Max memory must be at least 8 * Bloomfilter size in bytes
(sepsis_py_venv) (base) 

I am running on a machine with 256 GB of RAM, so I am guessing that memory is not the problem, but I am not sure.

Any advice you could provide will be appreciated!

-Ricardo

rica01 commented 3 years ago

Hello! I tried again on a computer with 512 GB of RAM, and the same problem still happens.

leoisl commented 3 years ago

Dear Ricardo, sorry for the delay. Could we please see the config in bdb.yaml? Cheers

rica01 commented 3 years ago

sure!


## Example config using berkeleyDB
h: 1
k: 31
m: 28000000
storage-engine: berkeleydb
storage-config:
  filename: test-berkeleydb
  flag: "c" ## Change to 'r' for read-only access

iqbal-lab commented 3 years ago

ping @leoisl

leoisl commented 3 years ago

Sorry for the delay. I could reproduce this issue on the tip of master. The problem seems to be that the CLI does not warn when the parameters are incorrect. I think you want to build Bloom filters, in which case you need to run the bigsi bloom command instead. I am sorry that this repo is a bit outdated; the official one is now https://github.com/iqbal-lab-org/BIGSI . There you can find some guides, for example https://bigsi.readme.io/docs/your-first-bigsi , which shows the whole workflow, from extracting k-mers to querying your BIGSI index.

Please tell me whether this is helpful or whether I misunderstood the issue.

cheers
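
As a sanity check on the error message itself, the arithmetic can be worked through with the values posted in this thread. This is a minimal sketch, not BIGSI's code: it assumes that m in bdb.yaml is the Bloom filter length in bits and that the check is literally the one the message states. The helper name min_memory_bytes is hypothetical.

```python
# Values taken from the posted bdb.yaml and the two machines mentioned in the thread.
M_BITS = 28_000_000  # `m` from the config; ASSUMED to be the Bloom filter length in bits


def min_memory_bytes(m_bits: int) -> int:
    """Literal reading of the error message: 8 * Bloom filter size in bytes.

    Hypothetical helper; BIGSI's real check in build.py may be different.
    """
    bloom_bytes = -(-m_bits // 8)  # ceiling division: bits -> bytes
    return 8 * bloom_bytes


required = min_memory_bytes(M_BITS)
for ram_gb in (256, 512):
    available = ram_gb * 10**9
    print(f"{ram_gb} GB RAM: need {required / 1e6:.0f} MB, enough = {available >= required}")
```

Under those assumptions the requirement is about 28 MB, far below either machine's RAM, which is consistent with the error coming from the command-line arguments rather than from the hardware.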

rica01 commented 3 years ago

What do you mean by the parameters not being correct?

What would be the correct ones?

I will use the code from the new repo you pointed to, but if you can reproduce the error, I guess I will run into the same problem.

Just a more general question: is this software still being used? Or is this a dead end for me, and should I look for something else?

leoisl commented 3 years ago

Dear @rica01,

> What do you mean by the parameters not being correct?

If I understood correctly, your commands are meant to create a test BIGSI index and then query it. Using the example data that ships with BIGSI, you can run the following commands from the root of the source tree to build a BIGSI index and query it:

  1. Build a Bloom filter from the first cortex file: bigsi bloom --config example-data/configs/berkeleydb.yaml example-data/test1.ctx example-data/test1.bloom
  2. Build a Bloom filter from the second cortex file: bigsi bloom --config example-data/configs/berkeleydb.yaml example-data/test2.ctx example-data/test2.bloom
  3. Insert the Bloom filters into the BIGSI index: bigsi build --config example-data/configs/berkeleydb.yaml example-data/test1.bloom example-data/test2.bloom -s s1 -s s2
  4. Query the index, e.g.: bigsi search --config example-data/configs/berkeleydb.yaml CGGCGAGGAAGCGTTAAATCTCTTTCTGACG . You should get the following answer:
    {'query': 'CGGCGAGGAAGCGTTAAATCTCTTTCTGACG', 'threshold': 1.0, 'results': [{'percent_kmers_found': 100.0, 'num_kmers': 1, 'num_kmers_found': 1, 'sample_name': 's1'}], 'citation': 'http://dx.doi.org/10.1038/s41587-018-0010-1'}
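
The four steps above can also be scripted. The sketch below only assembles the same commands (bigsi bloom, bigsi build, bigsi search, with the flags shown in the steps) and prints them by default. The function names bigsi_workflow and run are hypothetical, and deriving each .bloom path from the .ctx path is an assumption taken from the example file names.

```python
import os
import shlex
import subprocess
from typing import List


def bigsi_workflow(config: str, ctx_files: List[str], samples: List[str], query: str) -> List[List[str]]:
    """Assemble the bloom -> build -> search commands as argument lists."""
    if len(ctx_files) != len(samples):
        raise ValueError("need exactly one sample name per cortex file")
    commands = []
    blooms = []
    for ctx in ctx_files:
        # Assumption: the Bloom filter is written next to the .ctx file.
        bloom = os.path.splitext(ctx)[0] + ".bloom"
        blooms.append(bloom)
        commands.append(["bigsi", "bloom", "--config", config, ctx, bloom])
    build = ["bigsi", "build", "--config", config] + blooms
    for sample in samples:
        build += ["-s", sample]
    commands.append(build)
    commands.append(["bigsi", "search", "--config", config, query])
    return commands


def run(commands: List[List[str]], dry_run: bool = True) -> None:
    """Print each command; execute it only when dry_run is False."""
    for cmd in commands:
        print(" ".join(shlex.quote(part) for part in cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)  # stop at the first failing step


commands = bigsi_workflow(
    "example-data/configs/berkeleydb.yaml",
    ["example-data/test1.ctx", "example-data/test2.ctx"],
    ["s1", "s2"],
    "CGGCGAGGAAGCGTTAAATCTCTTTCTGACG",
)
run(commands)  # dry run: prints the four commands without executing them
```

Calling run(commands, dry_run=False) would execute the steps in order and requires bigsi to be installed and on the PATH.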

The parameters I don't understand in your original command (bigsi build --config ./bdb.yaml bloom ./test1.ctx ./test1.bloom) are bloom and ./test1.ctx. According to its help text, bigsi build accepts only a config file, sample names, and Bloom filter files:

 bigsi build -h
usage: bigsi-v0.3.1 build [-h] [-s SAMPLES] [-c CONFIG] [bloomfilters [bloomfilters ...]]

positional arguments:
  bloomfilters          Multiple Values

optional arguments:
  -h, --help            show this help message and exit
  -s SAMPLES, --samples SAMPLES
                        Multiple Values
  -c CONFIG, --config CONFIG
                        Basic text / string value
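
To see how the original command line maps onto this usage, here is a stand-in parser built with Python's argparse that mirrors the help text above. It is an approximation for illustration, not BIGSI's actual CLI code, and the check_build_args helper (including its file-extension heuristic) is hypothetical.

```python
import argparse
from typing import List

# Stand-in for the `bigsi build` interface, reconstructed from the help text only.
parser = argparse.ArgumentParser(prog="bigsi build")
parser.add_argument("-s", "--samples", action="append", default=[])
parser.add_argument("-c", "--config")
parser.add_argument("bloomfilters", nargs="*")


def check_build_args(bloomfilters: List[str], samples: List[str]) -> List[str]:
    """Hypothetical pre-flight check; the .bloom suffix test is only a heuristic."""
    problems = []
    for path in bloomfilters:
        if not path.endswith(".bloom"):
            problems.append(f"{path!r} does not look like a Bloom filter file")
    if len(samples) != len(bloomfilters):
        problems.append(
            f"{len(samples)} sample name(s) given for {len(bloomfilters)} Bloom filter(s)"
        )
    return problems


# The arguments from the original command in this issue.
args = parser.parse_args(["--config", "./bdb.yaml", "bloom", "./test1.ctx", "./test1.bloom"])
print(args.bloomfilters)
problems = check_build_args(args.bloomfilters, args.samples)
for problem in problems:
    print("warning:", problem)
```

With this parser, the word bloom and the .ctx path are both read as Bloom filter files, and no sample names are supplied, which matches the diagnosis that the arguments do not fit what bigsi build expects.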

> I will use the code from the new repo you pointed to, but if you can reproduce the error, I guess I will run into the same problem.

When I try your command line on the new repo, I get the following error:

FileNotFoundError: [Errno 2] No such file or directory: '.../git/BIGSI/bloom/bloom'

I would recommend following the commands listed above, or the guide mentioned earlier, and using the official repo: https://github.com/iqbal-lab-org/BIGSI . If the issue persists, please don't hesitate to contact us.

> Just a more general question: is this software still being used? Or is this a dead end for me, and should I look for something else?

It is still used in some projects in our group. I can't say much about use outside our group, as I don't know. You can also try COBS, which has many similarities with BIGSI but aims to be more efficient. Besides these two, there are several other DNA/protein indexing and search tools.

iqbal-lab commented 3 years ago

We're transitioning to COBS (https://github.com/iqbal-lab-org/cobs), which is faster and uses less disk space; see https://arxiv.org/abs/1905.09624