dib-lab / khmer

In-memory nucleotide sequence k-mer counting, filtering, graph traversal and more
http://khmer.readthedocs.io/

Nodegraph starting size issue -- #1776

Closed ctb closed 7 years ago

ctb commented 7 years ago

I'm trying to create a Nodegraph with starting size 8000000000, and getting the following error:

Traceback (most recent call last):
  File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/runpy.py", line 170, in _run_module_as_main
    "__main__", mod_spec)
  File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/Users/t/dev/spacegraphcats/spacegraphcats/build_contracted_dbg.py", line 25, in <module>
    run(args)
  File "/Users/t/dev/spacegraphcats/spacegraphcats/walk_dbg.py", line 141, in run   
    graph = khmer.Nodegraph(args.ksize, graph_tablesize, 2)
  File "khmer/_oxli/graphs.pyx", line 859, in khmer._oxli.graphs.Nodegraph.__cinit__ (khmer/_oxli/graphs.cpp:15489)
OverflowError: value too large to convert to int

which seems strange on multiple levels, but there we are.

Same error occurs with:

load-graph.py -M 8e9 xxx.ng akker-reads.abundtrim.gz

Isn't int 64-bit??

Here are the lines at issue:

https://github.com/dib-lab/khmer/blob/ceaebd37c22d6528dd6ae0f13142633dfb5582c3/khmer/_oxli/graphs.pyx#L857-L860

ctb commented 7 years ago

(Interesting to note that we apparently don't test large Nodegraph size creation anywhere :)

luizirber commented 7 years ago

We do test large Nodegraph sizes, but only when running with the huge mark (see https://github.com/dib-lab/khmer/blob/ceaebd37c22d6528dd6ae0f13142633dfb5582c3/tests/test_nodegraph.py#L57 and https://github.com/dib-lab/khmer/blob/ceaebd37c22d6528dd6ae0f13142633dfb5582c3/Makefile#L82 ). Should we add another test with something more reasonable than 1e13?

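Also, the size of int depends on the architecture and the compiler's data model, but on x86_64 it is 32 bits, so its maximum is 2147483647 and 8000000000 does not fit. We should use a uint64_t just to be safe.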
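For anyone who wants to check the int width on their own machine, here is a minimal sketch using only the standard library's ctypes module. The helper names (`fits_in_c_int`, `fits_in_uint64`) are made up for illustration and are not part of khmer; the sketch only compares the requested size against the platform's C int range, it does not reproduce the Cython conversion itself.

```python
import ctypes

# Width of a C int on this platform, in bits (commonly 32 on x86_64).
INT_BITS = ctypes.sizeof(ctypes.c_int) * 8
INT_MAX = 2 ** (INT_BITS - 1) - 1
INT_MIN = -INT_MAX - 1
UINT64_MAX = 2 ** 64 - 1


def fits_in_c_int(value):
    """Hypothetical helper: True if value is representable as a signed C int."""
    return INT_MIN <= value <= INT_MAX


def fits_in_uint64(value):
    """Hypothetical helper: True if value is representable as a uint64_t."""
    return 0 <= value <= UINT64_MAX


if __name__ == "__main__":
    requested = 8000000000  # the table size from the report above
    print("C int is", INT_BITS, "bits, max", INT_MAX)
    print("fits in C int:   ", fits_in_c_int(requested))
    print("fits in uint64_t:", fits_in_uint64(requested))
```

If the first check reports that the size does not fit while the second reports that it does, that is consistent with the OverflowError above and with switching the parameter type to uint64_t.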