dnbaker / dashing

Fast and accurate genomic distances using HyperLogLog
GNU General Public License v3.0

Usage for paired end reads? #91

Open Valentin-Bio-zz opened 2 years ago

Valentin-Bio-zz commented 2 years ago

Hello, I have paired-end reads from metagenomic libraries. To estimate distances between these read sets, should I concatenate the FASTQ files for each sample and then run the dist command?

This is what I have been thinking:

cat fwd.fastq rev.fastq > sample1.fastq
dashing dist -k31 -O distance_matrix.txt -T -o size_estimates.txt sample{1..30}.fastq 

Thanks for your time :)

dnbaker commented 2 years ago

Hi Valentin,

You have two options. You can concatenate the files and sketch the result, or you can use the -F flag, which takes a file listing the input paths and generates one sketch per line of that file. Simply put multiple files on the same line to combine them into the same sketch.

e.g.,

rs1-1.fq rs1-2.fq
rs2-1.fq rs2-2.fq

which would make one sketch for rs1 and one for rs2.
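
For 30 samples, a path list like this can be generated rather than typed by hand. The following is only a minimal sketch: the file names (sampleN_fwd.fastq, sampleN_rev.fastq) and the list name paths.txt are hypothetical and should be adapted to the actual naming scheme.

```shell
# Hypothetical naming scheme: sample1_fwd.fastq / sample1_rev.fastq ... sample30_*.fastq
# Each output line holds both mates of one sample, so they share one sketch.
: > paths.txt   # start from an empty list file
for i in $(seq 1 30); do
    printf '%s %s\n' "sample${i}_fwd.fastq" "sample${i}_rev.fastq" >> paths.txt
done
```

The list would then be passed with -F in place of the positional FASTQ arguments, for example `dashing dist -k31 -O distance_matrix.txt -T -o size_estimates.txt -F paths.txt`, assuming the remaining flags behave as in the command from the question.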

It looks to me like your distance command should work. Let me know if you have any more questions!

Thanks,

Daniel