CMB-BNU / PloidyFrost

reference-free estimation of ploidy level from whole genome sequencing data based on de Bruijn graph

Using PloidyFrost with Paired end reads #7

Open grantn5 opened 1 month ago

grantn5 commented 1 month ago

Hi, thanks for developing a great tool!

I would just like some clarification on the input to the tool, specifically the sample.fq in

kmc -ci1 -cs10000 -k25 -t${thread} sample.fq kmc_db kmc_tmp

Should this be a raw fastq or an aligned fastq? Additionally, if it is raw, how do you pass paired-end reads to the tool?

e.g. sample1.fq and sample2.fq

Thanks in advance for your help.

Yunxiao-web commented 1 month ago

I think this should be a raw fastq, and it may be correct to perform the following steps for each sample. For instance:

kmc -ci1 -cs10000 -k25 -t${thread} sample1.fq sample1.kmc_db kmc_tmp
kmc -ci1 -cs10000 -k25 -t${thread} sample2.fq sample2.kmc_db kmc_tmp

kmc_tools -t${thread} filter -hm sample1.kmc_db sample1.fq -ci${lower_threshold} sample1_filtered.fq
kmc_tools -t${thread} filter -hm sample2.kmc_db sample2.fq -ci${lower_threshold} sample2_filtered.fq

Then construct the compacted de Bruijn graph (cDBG) for multiple samples:

Bifrost build -c -i -d -k 25 -v -r sample1_filtered.fq -r sample2_filtered.fq -r ... -o cdbg -t ${thread}
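
The per-sample steps above could be wrapped in a loop. The following is only a dry-run sketch: it prints the commands instead of running them, and the values of `thread` and `lower_threshold`, the `samples` list, and the helper name `print_pipeline` are placeholders assumed for illustration. Filtered reads are named `<sample>_filtered.fq` so that they match the inputs given to Bifrost.

```shell
#!/usr/bin/env bash
# Dry-run sketch: prints the per-sample KMC commands and the final Bifrost command.
# thread, lower_threshold and the sample names are placeholder values (assumptions).
set -euo pipefail

thread=8
lower_threshold=2
samples=(sample1 sample2)

# print_pipeline is a hypothetical helper, not part of KMC, Bifrost or PloidyFrost.
print_pipeline() {
    local bifrost_inputs=""
    local s
    for s in "${samples[@]}"; do
        # Count k-mers for this sample.
        echo "kmc -ci1 -cs10000 -k25 -t${thread} ${s}.fq ${s}.kmc_db kmc_tmp"
        # Filter the reads of this sample against its own k-mer database.
        echo "kmc_tools -t${thread} filter -hm ${s}.kmc_db ${s}.fq -ci${lower_threshold} ${s}_filtered.fq"
        # Collect one -r argument per filtered file for the graph construction step.
        bifrost_inputs+=" -r ${s}_filtered.fq"
    done
    echo "Bifrost build -c -i -d -k 25 -v${bifrost_inputs} -o cdbg -t ${thread}"
}

print_pipeline
```

Once the printed commands look right, they can be run directly, for example by piping the output to `bash`.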

grantn5 commented 1 month ago

Hi, thanks for the clarification on raw vs. aligned.

However, paired-end sequencing produces two files from the same sample, one for the forward strand and the other for the reverse strand. How should these be processed with kmc and subsequently PloidyFrost?

Sorry for the confusion in my earlier example. Files from paired-end reads are named as follows: sample1_1.fq and sample1_2.fq, where both files are from the same sample. One file is the forward strand, and the other is the reverse strand.