TGAC / kontaminant

Tool to look for contaminants, with a kmer database.

which output files to take? #2

Open biocyberman opened 7 years ago

biocyberman commented 7 years ago

I ran the command below and got many .FASTQ output files with the suffixes KMER{0..294}.FASTQ. I am not sure which output to take for downstream analysis. Can you clarify?

kmer_filter_160 --reference hg38_k21.kmers --read_1 IonTorrentSample.fq --output_prefix IonXpress_005 --mem_height 24

biocyberman commented 7 years ago

After reading the README more carefully, here is what I understand now: the file IonTorrentSample_R1_KMERS0.FASTQ contains reads that do not contain any REFERENCE k-mers, IonTorrentSample_R1_KMERS1.FASTQ contains reads that each contain exactly 1 REFERENCE k-mer, and so on.

So it is up to me to choose a threshold for the level of contamination I am willing to accept?

richardmleggett commented 7 years ago

Hi, sorry, this old version of kontaminant is not really supported any more. There's a new version, which I maintain, here: https://github.com/richardmleggett/kontaminant

biocyberman commented 7 years ago

Oops! Could you or someone else put a big notice on this repo saying that support has been discontinued? A user guide on the new repo would also be nice.

homonecloco commented 7 years ago

Hi, as you figured out, it is up to you to decide the threshold. When we originally wrote this, we found that you get a distribution of the number of k-mers per read from the contamination. The code works as it is, but Richard has been refactoring it. Best, Ricardo.
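
For anyone landing here with the same question, a minimal Python sketch of how a threshold could be chosen and applied. It is not part of kontaminant. It assumes the output files follow the naming quoted above (`..._KMERS<n>.FASTQ`, with `<n>` the number of reference k-mers per read), that the data is single-end, and that the files are plain four-line FASTQ. The function names and the `max_kmers` parameter are made up for illustration.

```python
import re
import shutil
from pathlib import Path

# Assumed naming, based on the file names quoted in this thread:
# <anything>_KMER<n>.FASTQ or <anything>_KMERS<n>.FASTQ
BIN_PATTERN = re.compile(r"_KMERS?(\d+)\.FASTQ$", re.IGNORECASE)


def find_bins(directory):
    """Map k-mer count -> path for every binned FASTQ file in `directory`."""
    bins = {}
    for path in Path(directory).iterdir():
        match = BIN_PATTERN.search(path.name)
        if match:
            bins[int(match.group(1))] = path
    return bins


def count_reads(path):
    """Count reads, assuming plain four-line FASTQ records."""
    with open(path) as handle:
        return sum(1 for _ in handle) // 4


def read_distribution(bins):
    """Number of reads in each bin, ordered by k-mer count."""
    return {n: count_reads(path) for n, path in sorted(bins.items())}


def merge_bins(bins, max_kmers, out_path):
    """Concatenate every bin with at most `max_kmers` reference k-mers."""
    kept = [n for n in sorted(bins) if n <= max_kmers]
    with open(out_path, "wb") as out:
        for n in kept:
            with open(bins[n], "rb") as src:
                shutil.copyfileobj(src, out)
    return kept
```

Printing `read_distribution(find_bins("."))` shows how many reads fall into each bin, which is the distribution mentioned above. `merge_bins` keeps the reads at or below the chosen threshold, which suits the case where the reference is the contaminant. If the reads matching the reference are the ones you want, the comparison would need to be reversed.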