bergmanlab / TELR

TELR is a fast non-reference transposable element detector from long read sequencing data.
https://github.com/bergmanlab/TELR
BSD 2-Clause "Simplified" License

failed to get the expected result using SRR11000304.fastq #28

Closed MrBleem closed 1 year ago

MrBleem commented 1 year ago

I tried to run TELR on the data described in 'Local assembly of long reads enables phylogenomics of transposable elements in a polyploid cell line', but did not get the expected result. Here are the data and code I used:

I also tried SRR11000308, SRR11000316, SRR11000320, and SRR11000326, and got the same result as with SRR11000304. I'm not sure what's wrong. Can you give some advice? Thanks a lot in advance for helping me!

shunhuahan commented 1 year ago

Hi @MrBleem,

Thanks for the question.

There are 32 SRR runs in our project, of which 31 are for the same library prepared from S2R+ DNA. Each run corresponds to one SMRT cell on a PacBio RS II instrument (see the MATERIALS AND METHODS section in the paper for details). To replicate our results, you need to download the data from all 31 SRR runs for S2R+ and merge them into a single fasta/fastq file before running TELR.

For the paper, we used the following workflow. Feel free to use other workflows but be sure to use all PacBio data for S2R+ for the TELR run.

  1. Get bax h5 url for each SRR run using srapath and download all bax h5 files.
  2. Convert from bax to bam using bax2bam.
  3. Convert from bam to fasta using bam2fasta.
  4. Merge all fasta files into a single fasta file.
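Step 4 can be sketched in Python as below. This is only a minimal illustration, not the script used for the paper: the function name `merge_fasta` and the file paths in the usage note are hypothetical, and it assumes the per-run fasta files from step 3 are uncompressed plain text.

```python
from pathlib import Path
from typing import Iterable


def merge_fasta(inputs: Iterable[Path], output: Path) -> int:
    """Concatenate FASTA files into one file and return the number of records.

    Hypothetical helper for illustration; assumes uncompressed FASTA input.
    """
    n_records = 0
    with open(output, "w") as out:
        for path in inputs:
            last_line = "\n"
            with open(path) as handle:
                for line in handle:
                    # Every FASTA record starts with a '>' header line.
                    if line.startswith(">"):
                        n_records += 1
                    out.write(line)
                    last_line = line
            # Guard against a missing trailing newline, which would otherwise
            # glue the next file's first header onto this file's last sequence.
            if not last_line.endswith("\n"):
                out.write("\n")
    return n_records
```

For example, `merge_fasta(sorted(Path("fasta_per_run").glob("*.fasta")), Path("S2R_plus_merged.fasta"))` would merge every per-run file in a (hypothetical) `fasta_per_run` directory. Writing the merged file outside the input directory keeps it from being picked up as one of its own inputs. The returned record count is a quick check that no run was skipped.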

For the paper, we used NGMLR as the aligner, wtdbg2 as the local contig assembler, and flye as the polisher. Using minimap2 as the aligner (as in your example) might give a faster run time but lower accuracy. That is totally fine, but please keep it in mind if you want to replicate our results. Thanks!

Let me know if you have more questions.

Best, Shunhua

MrBleem commented 1 year ago

OK, I get the expected result now. Thank you very much!