gmarcais / Jellyfish

A fast multi-threaded k-mer counter

Piping sequence into jellyfish #160

Open gmarcais opened 4 years ago

gmarcais commented 4 years ago

Hi @gmarcais, does Jellyfish support k-mer counts per read, or k-mer counts for piped input (let's say echo ATCGACGTA | jellyfish [...])?

I saw your comment here stating that Jellyfish won't provide much speed for small files/sequences?

23 (comment)

Originally from issue #13

gmarcais commented 4 years ago

The input must be either in fasta or fastq format. You could define a shell function like the following (put it in your .bashrc for example):

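function jfc() {
    # Count canonical (-C) k-mers of length $1 from stdin; text output goes to stdout.
    # The head/tail pair strips the file header. Note the ';' before '}', which bash requires.
    jellyfish count -C -o /dev/stdout -s 1k -m "$1" --text /dev/stdin | { tail -c +$(head -c 9); }
}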

This function runs jellyfish on the standard input and writes the result to the standard output. The funny head and tail thing at the end simply removes the header of the output file.

Then

echo -e ">\nACGTGTACATACGT" | jfc 4
ACAC 1
ACAT 1
ACGT 2
ATAC 1
CACG 1
CATA 1
CGTA 1
GTAC 1
TACA 2
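
Since the input has to be fasta or fastq, a bare sequence needs a header line before it reaches jfc. A minimal sketch of a wrapper that does this, assuming one sequence per input line; `seq2fa` is a hypothetical helper name, not something Jellyfish provides:

```shell
# Hypothetical helper (not part of Jellyfish): wrap each non-empty
# input line in a minimal FASTA record with a numeric header.
seq2fa() {
    awk 'NF { printf ">%d\n%s\n", ++n, $0 }'
}
```

With that in place, `echo ACGTGTACATACGT | seq2fa | jfc 4` should behave like the example above. The counts are still totals over everything piped in; to get counts per read, one option is to call jfc once per read.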

Hope this helps.

Regarding the comment on speed: Jellyfish will work just fine. But it is true that on a very small input, as shown above, Jellyfish is a big hammer for a small task. That is all, though; nothing is wrong otherwise.
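
To give a sense of how small the task is, here is a rough sketch of canonical k-mer counting in plain awk, meant only for toy inputs. It assumes each sequence sits on a single line and contains only A, C, G and T; `kmers` is a hypothetical name and this is not how Jellyfish works internally:

```shell
# Hypothetical stand-in for tiny inputs; not part of Jellyfish.
# Assumes single-line sequences made up of A, C, G and T only.
kmers() {
    awk -v k="$1" '
    BEGIN { c["A"]="T"; c["C"]="G"; c["G"]="C"; c["T"]="A" }
    /^>/ { next }                       # skip FASTA header lines
    {
        s = toupper($0)
        for (i = 1; i + k - 1 <= length(s); i++) {
            m = substr(s, i, k); rc = ""
            # build the reverse complement of the current k-mer
            for (j = k; j >= 1; j--) rc = rc c[substr(m, j, 1)]
            if (rc < m) m = rc          # keep the canonical (smaller) form
            n[m]++
        }
    }
    END { for (m in n) print m, n[m] }' | sort
}
```

Running `echo -e ">\nACGTGTACATACGT" | kmers 4` gives the same nine lines as the jfc example above. For anything beyond a handful of short sequences, Jellyfish is the right tool.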