gmarcais / Jellyfish

A fast multi-threaded k-mer counter

Jellyfish python bindings don't support fasta or specify what format they do support #60

Open MatthewRalston opened 8 years ago

MatthewRalston commented 8 years ago

The RuntimeError doesn't make sense.

>cat test.txt
>hello
ACTGACTGACT
>python scripts/jellyfish_kmers.py --infile test.fa
Traceback (most recent call last):
  File "scripts/jellyfish_kmers.py", line 40, in <module>
    main()
  File "scripts/jellyfish_kmers.py", line 26, in main
    mf = jellyfish.ReadMerFile(args.infile)
  File "swig/python/jellyfish.py", line 231, in __init__
RuntimeError: Unsupported format ''

gmarcais commented 8 years ago

The ReadMerFile class is meant to read the binary output of the Jellyfish program (the output of 'jellyfish count'). It does not parse fasta files.
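As a rough sketch of that two-step workflow (count with the command-line program first, then read the binary result from Python): the helper names `build_count_command` and `count_then_read`, the output file name, and the `100M` hash size below are illustrative assumptions, not part of Jellyfish. The module name `jellyfish` is taken from the traceback above and may differ between releases, and iterating a `ReadMerFile` as `(mer, count)` pairs is assumed here rather than confirmed in this thread.

```python
import subprocess


def build_count_command(fasta_path, k, out_path, hash_size="100M", threads=1):
    """Build the argv list for `jellyfish count` (hypothetical helper).

    -m is the k-mer length, -s the initial hash size, -t the thread count,
    and -o the binary output file that ReadMerFile is meant to read.
    """
    return [
        "jellyfish", "count",
        "-m", str(k),
        "-s", hash_size,
        "-t", str(threads),
        "-o", out_path,
        fasta_path,
    ]


def count_then_read(fasta_path, k, out_path="mer_counts.jf"):
    """Run `jellyfish count`, then read its binary output (hypothetical helper)."""
    # Step 1: let the command-line program parse the FASTA file.
    subprocess.run(build_count_command(fasta_path, k, out_path), check=True)

    # Step 2: read the binary output through the Python bindings.
    # Assumption: the bindings are importable as `jellyfish`, as in the
    # traceback above, and a ReadMerFile iterates as (mer, count) pairs.
    import jellyfish

    return [(str(mer), count) for mer, count in jellyfish.ReadMerFile(out_path)]
```

Calling `count_then_read("test.fa", 4)` would need both the `jellyfish` executable on the PATH and the Python bindings installed; `build_count_command` on its own only assembles the command line.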

MatthewRalston commented 8 years ago

Okay. I will close this issue. I hadn't realized that was the requirement. Is there any way to run the count function with the Python bindings?

gmarcais commented 8 years ago

There is no direct equivalent of the count function at this point, that is, one that would take some input files and parse them in a multi-threaded fashion. It could be a valuable addition, although the count command has become quite complicated, with many options, support for Bloom filters, etc.

There is a HashCounter class that allows using a Jellyfish hash internally. But because it uses Python to submit the k-mers to the hash, it is single-threaded and slower.
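To illustrate why submitting k-mers from Python is single-threaded, here is a minimal pure-Python stand-in. It does not use the Jellyfish API at all: a `collections.Counter` plays the role of the hash, and `count_kmers` is a hypothetical helper. The point is only that the interpreter walks the sequence and submits one k-mer per loop iteration, which is the pattern that keeps a Python-driven counter on a single thread.

```python
from collections import Counter


def count_kmers(sequence, k):
    """Count every k-mer in `sequence`, one at a time, in a single thread.

    Stand-in only: a Counter replaces the Jellyfish hash here, so this
    shows the submission loop, not the library's own interface.
    """
    counts = Counter()
    seq = sequence.upper()
    # Slide a window of width k across the sequence; each iteration
    # submits exactly one k-mer to the counter.
    for i in range(len(seq) - k + 1):
        counts[seq[i:i + k]] += 1
    return counts


if __name__ == "__main__":
    # The sequence from test.txt in the original report.
    for mer, count in sorted(count_kmers("ACTGACTGACT", 4).items()):
        print(mer, count)
```

This sketch counts each k-mer as written and does not merge a k-mer with its reverse complement.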