gmarcais / Jellyfish

A fast multi-threaded k-mer counter

query command functionality #13

Closed: kes1smmn closed this issue 10 years ago

kes1smmn commented 10 years ago

Dear Guillaume,

Would you consider adding the following functionality to the query command? I would like to get the k-mers present in each sequence of a multi-sequence FASTA or FASTQ file.

Would it be possible to modify the current Jellyfish output so that the results for each sequence are separated by its read_id, or to add a first column with the read id?

OR

or, even more simply, display: read_id count_of_kmers_present_in_jellyfish_db

Thanks for humoring me. I have a Python wrapper that does this by querying each sequence individually, but it is not as fast as I would like. Keith
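The simpler per-read summary Keith describes (read_id followed by the number of that read's k-mers found in the database) can be sketched in Python, the language of his wrapper. This is a minimal sketch under stated assumptions, not the query_per_sequence program and not the Jellyfish API: a plain dict of k-mer counts stands in for the Jellyfish database, only FASTA input is parsed (no FASTQ), and the helper names read_fasta, count_present and per_read_counts are hypothetical.

```python
# Minimal sketch (hypothetical helper names): for each read, count how many of its
# k-mers are present in a k-mer database. A plain dict {kmer: count} stands in for
# the Jellyfish database; this is NOT the Jellyfish API. FASTA input only.
from typing import Dict, Iterable, Iterator, List, Tuple

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (read_id, sequence), joining sequences wrapped over several lines."""
    read_id, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if read_id is not None:
                yield read_id, "".join(chunks)
            # The read id is the first whitespace-separated word of the header.
            read_id, chunks = (line[1:].split() or [""])[0], []
        else:
            chunks.append(line)
    if read_id is not None:
        yield read_id, "".join(chunks)


def canonical(kmer: str) -> str:
    """Lexicographically smaller of the k-mer and its reverse complement."""
    return min(kmer, kmer.translate(_COMPLEMENT)[::-1])


def count_present(seq: str, db: Dict[str, int], k: int, canon: bool = True) -> int:
    """Number of k-mer positions in seq whose k-mer has a nonzero count in db."""
    seq = seq.upper()
    present = 0
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if set(kmer) - set("ACGT"):
            continue  # skip k-mers containing N or other non-ACGT bases
        key = canonical(kmer) if canon else kmer
        if db.get(key, 0) > 0:
            present += 1
    return present


def per_read_counts(lines: Iterable[str], db: Dict[str, int], k: int) -> List[Tuple[str, int]]:
    """One (read_id, count_of_kmers_present) pair per FASTA record."""
    return [(rid, count_present(seq, db, k)) for rid, seq in read_fasta(lines)]


if __name__ == "__main__":
    fasta = [">r1 sample", "ACGT", "AC", ">r2", "TTTT"]
    db = {"ACG": 5}  # stand-in for a database of canonical 3-mers
    for rid, n in per_read_counts(fasta, db, k=3):
        print(rid, n)  # prints "r1 2" then "r2 0"
```

The canonical option assumes the database stores canonical k-mers (as when counting with 'jellyfish count -C'); set canon=False otherwise. Scanning all reads inside one process, rather than issuing one query per sequence, is likely where such a tool gains most of its speed over a per-sequence wrapper.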

gmarcais commented 10 years ago

Hi Keith,

I am not sure I want to add more options to the query command.

On the other hand, the functionality you are asking for might be of interest to others as well. So I created a new examples directory in the tree to hold extra programs like this one that manipulate the Jellyfish (JF) output.

Please pull the latest develop branch (it is not in master yet). There is a new directory called examples/query_per_sequence. Provided that JF is properly installed, running 'make' in that directory will create a program that does what you are asking for. If the output is not what you wish, you can modify this program to your liking (it should be relatively straightforward).

Guillaume.

kes1smmn commented 10 years ago

Guillaume,

This is awesome and will do nicely. I do get an error when calling 'make', though; would you mind helping me troubleshoot? First, I believe I have Jellyfish built and running correctly:
pkg-config --print-provides jellyfish-2.0 [returns jellyfish-2.0 = 2.1.3]

[Screenshot, 2014-04-09 12:12:58 PM: 'make' error output for query_per_sequence]

I also tried to build the other example application (count_in_file) and got an error there too.

[Screenshot, 2014-04-09 12:17:07 PM: 'make' error output for count_in_file]

Thanks again, Keith

Pasting the text of the errors below in case the screenshots are too small:


[query_per_sequence] $ make
g++ -I/home/ksimmons/bin/Jellyfish-develop/include/jellyfish-2.1.3 -std=c++0x -Wall -Werror -O3 -L/home/ksimmons/bin/Jellyfish-develop/lib -ljellyfish-2.0 -lpthread query_per_sequence.cc sequence_mers.hpp -o query_per_sequence
cc1plus: warnings being treated as errors
query_per_sequence.cc: In function ‘void query_from_sequence(PathIterator, PathIterator, const Database&, bool) [with PathIterator = char, Database = jellyfish::mer_dna_bloom_counter]’:
query_per_sequence.cc:73: instantiated from here
query_per_sequence.cc:43: error: comparison between signed and unsigned integer expressions
query_per_sequence.cc: In function ‘void query_from_sequence(PathIterator, PathIterator, const Database&, bool) [with PathIterator = char, Database = binary_query]’:
query_per_sequence.cc:78: instantiated from here
query_per_sequence.cc:43: error: comparison between signed and unsigned integer expressions
make: *** [query_per_sequence] Error 1

[count_in_file]$ make
g++ -I/home/ksimmons/bin/Jellyfish-develop/include/jellyfish-2.1.3 -Wall -Werror -O3 -std=c++0x -L/home/ksimmons/bin/Jellyfish-develop/lib -ljellyfish-2.0 -lpthread count_in_file.cc -o count_in_file
/usr/lib/gcc/x86_64-redhat-linux/4.4.7/../../../../include/c++/4.4.7/bits/unique_ptr.h: In copy constructor ‘common_info::common_info(const common_info&)’:
/usr/lib/gcc/x86_64-redhat-linux/4.4.7/../../../../include/c++/4.4.7/bits/unique_ptr.h:214: error: deleted function ‘std::unique_ptr<_Tp, _Tp_Deleter>::unique_ptr(const std::unique_ptr<_Tp, _Tp_Deleter>&) [with _Tp = jellyfish::RectangularBinaryMatrix, _Tp_Deleter = std::default_delete<jellyfish::RectangularBinaryMatrix>]’
count_in_file.cc:53: error: used here
count_in_file.cc: In function ‘common_info readheaders(int, char*, jellyfish::cpp_array&)’:
count_in_file.cc:97: note: synthesized method ‘common_info::common_info(const common_info&)’ first required here
make: *** [count_in_file] Error 1

wwood commented 10 years ago

I'm interested too, and I get the same error for query_per_sequence (for count_in_file, I get a linking error instead).

gmarcais commented 10 years ago

I can't wait for CentOS/RedHat to move off of gcc version 4.4! It gives me so much trouble all the time.

Anyway, it should now be fixed. Please pull the latest and try to compile again.

kes1smmn commented 10 years ago

Thanks for fixing it. It works wonderfully. I tested both programs and they worked on my system. Both will be useful to me. I am grateful.

I did get an error on my initial run: "error while loading shared libraries: libjellyfish-2.0.so.2: cannot open shared object file: No such file or directory"

To solve this, I set LD_LIBRARY_PATH="~/bin/Jellyfish-develop/lib".

Before hitting the library error, I had already made Jellyfish visible to pkg-config by setting PKG_CONFIG_PATH="~/bin/Jellyfish-develop/".

It might be useful to add the above to the README in the examples folder. I was not sure whether the library path is supposed to be handled automatically once Jellyfish is found through pkg-config, so I wanted to mention it.

Thanks again. Keith

gmarcais commented 10 years ago

Thanks for all your bug reports and improvement suggestions. I modified the Makefile so that the linker stores the location of the installed library in the executable (an rpath). This way you do not need to set LD_LIBRARY_PATH, and the executables "just work".

Guillaume.

On Thu, Apr 10, 2014 at 11:37 PM, kes1smmn <notifications@github.com> wrote:

Closed #13 https://github.com/gmarcais/Jellyfish/issues/13.


sylvain-ri commented 4 years ago

Hi @gmarcais, does Jellyfish support k-mer counts per read, or k-mer counts for a given input (say, echo ATCGACGTA | jellyfish [...])?

I saw your comment stating that Jellyfish won't provide much of a speed-up for small files/sequences; did I understand that correctly? https://github.com/gmarcais/Jellyfish/issues/23#issuecomment-59966372
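For a single short input like the one in this question, the counting itself can be illustrated in a few lines of Python without Jellyfish. This is a sketch under assumptions, not a Jellyfish feature or command: the function name kmer_counts and the script name kmer_counts.py are hypothetical, the sequence is taken from the command line rather than piped in, and the optional canonical flag only mirrors the idea behind 'jellyfish count -C'.

```python
# Minimal sketch (hypothetical names): count the k-mers of one short sequence in
# memory with collections.Counter. An illustration only, not a Jellyfish command.
import sys
from collections import Counter

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def kmer_counts(seq: str, k: int, canonical: bool = False) -> Counter:
    """Count every k-mer of seq, optionally folding each onto its canonical form."""
    seq = seq.strip().upper()
    counts: Counter = Counter()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if set(kmer) - set("ACGT"):
            continue  # skip k-mers containing N or other non-ACGT bases
        if canonical:
            # Canonical form: smaller of the k-mer and its reverse complement.
            kmer = min(kmer, kmer.translate(_COMPLEMENT)[::-1])
        counts[kmer] += 1
    return counts


if __name__ == "__main__":
    # Usage: python kmer_counts.py ATCGACGTA
    seq = sys.argv[1] if len(sys.argv) > 1 else "ATCGACGTA"
    for kmer, n in sorted(kmer_counts(seq, k=3).items()):
        print(kmer, n)
```

For inputs this small, an in-memory count like this finishes almost instantly, which is consistent with the point that a multi-threaded counter has little to offer on tiny sequences.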

gmarcais commented 4 years ago

Please, create a new issue for a new unrelated question. See issue #160.