GATB / dsk

k-mer counting software
https://gatb.inria.fr/software/dsk/
GNU Affero General Public License v3.0

solidity-custom #7

Open ctseto opened 5 years ago

ctseto commented 5 years ago

From a previous github issue in re Simka:

"If you still need to recover kmer sequences, for instance recovering subsets of kmers according to their presence/absence in some datasets, you may have a look at the software DSK : http://github.com/GATB/dsk and its option -solidity-custom."

I presume the invocation is dsk -file contigs1.fa,contigs2.fa -solidity-kind custom -solidity-custom , though I'm not quite sure what the -solidity-custom option specifies. Is it a file with a list of kmers ("specifies list of files where kmer must be present")? I generally get "Kmer solidity custom has different number of values (11) than banks (2)", where banks appears to correspond to the number of files passed to the -file flag.

rchikhi commented 5 years ago

Hi Charles,

The way I understand the -solidity-custom option from the source code is:

1) it's not really well documented :)
2) the argument should be a string of 0s and 1s, one per input file, where a 0 indicates that the kmer must be below the -abundance-min threshold for that dataset, and a 1 indicates that, on the contrary, it must be at or above it.
3) the string should be in the same order as the files in the -file option.

I'm basing my observations on : https://github.com/GATB/gatb-core/blob/862ffc949bd3ff556442dc83b5af3666f58195d4/gatb-core/src/gatb/kmer/impl/CountProcessorSolidity.hpp#L295

E.g. if you have three input files and run with -file file1.fa,file2.fa,file3.fa -abundance-min 3,5,3 -solidity-kind custom -solidity-custom 101, then a k-mer is output if it is seen 3 times or more in each of file1 and file3, and fewer than 5 times in file2.

That's a bit odd, but that's how I understand the source code.
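To make the rule concrete, here is a rough Python sketch of the check as described in this comment. It is only an illustration of my reading, not the actual GATB-Core code: the function name is_solid_custom and its arguments are made up for the example, and any -abundance-max handling is ignored. The length check plays the same role as the "different number of values than banks" error quoted above.

```python
def is_solid_custom(counts, abundance_min, mask):
    """Hypothetical helper: apply the custom solidity rule as described above.

    counts        -- per-file abundance of one k-mer, in -file order
    abundance_min -- per-file thresholds, as given to -abundance-min
    mask          -- the -solidity-custom string, e.g. "101"
    """
    # One mask digit and one threshold per input file ("bank").
    if not (len(counts) == len(abundance_min) == len(mask)):
        raise ValueError(
            f"mask has {len(mask)} values but there are {len(counts)} files"
        )
    if any(flag not in "01" for flag in mask):
        raise ValueError("mask must contain only 0 and 1")

    for count, threshold, flag in zip(counts, abundance_min, mask):
        reaches_threshold = count >= threshold
        # '1' requires the k-mer to reach the threshold in that file,
        # '0' requires it to stay below the threshold.
        if (flag == "1") != reaches_threshold:
            return False
    return True


# The three-file example from above: thresholds 3,5,3 and mask 101.
print(is_solid_custom([4, 2, 3], [3, 5, 3], "101"))  # kept
print(is_solid_custom([4, 6, 3], [3, 5, 3], "101"))  # rejected: too abundant in file2
```

With counts 4, 2, 3 the k-mer is kept; raising the file2 count to 6 rejects it, because file2 is marked 0 and the k-mer now reaches its threshold of 5.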

Probably @rizkg could confirm.

Rayan

rizkg commented 5 years ago

Hi,

Yes Rayan, you are correct about the usage. Sorry that it is not better documented. The idea is that it lets you output kmers that are specific to a chosen subset of the input datasets.
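As an illustration of the "specific to" use case: under the rule described above, k-mers specific to a single input file would presumably be selected with a mask that has a 1 for that file and a 0 everywhere else. The small Python helper below builds such a string; specific_mask is a hypothetical name for this example and is not part of DSK.

```python
def specific_mask(n_files, target_index):
    """Hypothetical helper: build a -solidity-custom string selecting k-mers
    that are solid in exactly one file (0-based index, in -file order)."""
    if not 0 <= target_index < n_files:
        raise ValueError("target_index must refer to one of the input files")
    # '1' for the target file, '0' for every other file.
    return "".join("1" if i == target_index else "0" for i in range(n_files))


# Mask for k-mers specific to the second of three files.
print(specific_mask(3, 1))  # 010
```

The resulting string would then be passed as the value of -solidity-custom together with -solidity-kind custom.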