cgroza / GraffiTE

GraffiTE is a pipeline that finds polymorphic transposable elements in genome assemblies and/or long reads, and genotypes the discovered polymorphisms in read sets using genome-graphs.

short reads question #14

Closed Spieler2999 closed 11 months ago

Spieler2999 commented 1 year ago

Hi,

I started a new issue as I am not sure on how the short reads should be handled and others may also be interested in the answer.

1) The documentation says "Paired-end reads must be interleaved in the same file (Pangenie)". Can you advise on how to interleave the files? Is a simple zcat sufficient, e.g. zcat sampleR1.fq.gz sampleR2.fq.gz?

2) How much coverage do the short reads need to be suitable? I have 20X; is this too much?

3) I am comparing assemblyA to assemblyB. Your manual says to add short reads to aid with genotyping. Should the short reads come from the same individual that assemblyB was based on, in order to avoid any bias?

Many thanks!

clemgoub commented 1 year ago

Hello!

  1. Indeed, zcat will work perfectly. I don't know why I wrote "interleaved", because the reads don't actually have to be, as long as they are all in the same file. Can you confirm, @cgroza?

  2. Our benchmark shows that the more coverage the better (we tested 5X, 10X, 20X and 30X). The preprint should be out very soon!

  3. It can be either way, depending on your experiment. You can use the short reads from the same individual as, say, assembly B to confirm the variants found and get a bi-allelic genotype, and/or you can use a read set from another sample to genotype it against the collection of variants found between the assemblies.
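
To make point 1 concrete, here is a minimal Python sketch of the two ways of putting both mates in one file. It is not part of GraffiTE; the function names are hypothetical, and it assumes gzipped FASTQ files with plain 4-line records. `concatenate_fastq` does what `zcat R1 R2` does (all R1 reads, then all R2 reads), while `interleave_fastq` alternates mates. According to the answer above, the concatenated form should be enough, pending confirmation.

```python
import gzip
import shutil
from itertools import islice, zip_longest


def concatenate_fastq(r1_path, r2_path, out_path):
    """Write all R1 reads followed by all R2 reads (like `zcat R1 R2 | gzip`)."""
    with gzip.open(out_path, "wb") as out:
        for path in (r1_path, r2_path):
            with gzip.open(path, "rb") as src:
                shutil.copyfileobj(src, out)


def _records(handle):
    """Yield FASTQ records as lists of 4 lines (assumes unwrapped records)."""
    while True:
        record = list(islice(handle, 4))
        if not record:
            return
        if len(record) != 4:
            raise ValueError("truncated FASTQ record")
        # Make sure every line ends with a newline, including the last one.
        yield [line.rstrip("\n") + "\n" for line in record]


def interleave_fastq(r1_path, r2_path, out_path):
    """Alternate mates: R1 read 1, R2 read 1, R1 read 2, R2 read 2, ..."""
    with gzip.open(r1_path, "rt") as r1, \
            gzip.open(r2_path, "rt") as r2, \
            gzip.open(out_path, "wt") as out:
        for rec1, rec2 in zip_longest(_records(r1), _records(r2)):
            if rec1 is None or rec2 is None:
                raise ValueError("R1 and R2 contain different numbers of reads")
            out.writelines(rec1)
            out.writelines(rec2)
```

With hypothetical file names, usage would look like `concatenate_fastq("sampleR1.fq.gz", "sampleR2.fq.gz", "sample_all.fq.gz")`; swap in `interleave_fastq` if strict mate-by-mate ordering turns out to be required.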

I hope this helps, let us know if you need more details!

Cheers,

Clément