failed to get the expected result using SRR11000304.fastq

MrBleem commented 1 year ago

I tried to run TELR on the data described in 'Local assembly of long reads enables phylogenomics of transposable elements in a polyploid cell line', but did not get the expected result. Here are the data and code I used:

ref: dm6.fa, downloaded from UCSC https://hgdownload.soe.ucsc.edu/goldenPath/dm6/bigZips/
TE consensus sequence: D_mel_transposon_sequence_set_v10.2.fa, downloaded from GitHub https://github.com/bergmanlab/drosophila-transposons/
data: SRR11000304, downloaded from SRA https://www.ncbi.nlm.nih.gov/Traces/study/?acc=SAMN13970288&o=acc_s%3Aa
code:
    git clone git@github.com:bergmanlab/TELR.git
    cd TELR
    mamba env create -f envs/telr.yml
    conda activate TELR
    pip install .
    telr -i ./SRR11000304.fastq -r ./dm6.fa -l ./D_mel_transposon_sequence_set_v10.2.fa --aligner minimap2
question: Only 1 insertion is reported for this sample:
    $ wc -l *bed
    1 SRR11000304.telr.bed

I also tried SRR11000308, SRR11000316, SRR11000320, and SRR11000326, and got the same result as with SRR11000304. I'm not sure what's wrong. Could you give some advice? Thanks a lot in advance for your help!

shunhuahan commented 1 year ago

Hi @MrBleem,

Thanks for the question.

There are 32 SRR runs in our project, of which 31 are from the same library prepared with S2R+ DNA. Each run corresponds to one SMRT cell on a PacBio RS II instrument (see the MATERIALS AND METHODS section of the paper for details). To replicate our results, you need to download the data from all 31 S2R+ SRR runs and merge them into a single fasta/fastq file before running TELR.

For the paper, we used the following workflow. Feel free to use a different workflow, but be sure to use all of the S2R+ PacBio data for the TELR run.

1. Get the bax.h5 URLs for each SRR run using srapath and download all bax.h5 files.
2. Convert from bax.h5 to bam using bax2bam.
3. Convert from bam to fasta using bam2fasta.
4. Merge all fasta files into a single fasta file.
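
For the final merge step, here is a minimal sketch of one way to do it, not the exact commands used for the paper. It assumes the per-run fasta files are already uncompressed on disk; the function name `merge_fasta` and the file names in the usage comment are hypothetical. It concatenates the inputs and checks that the number of reads in the merged file equals the sum over the inputs, which also catches an input that lacks a trailing newline.

```shell
#!/usr/bin/env bash
set -euo pipefail

# merge_fasta OUT IN1 [IN2 ...]
# Concatenate per-run fasta files into OUT and verify that no reads were lost.
# (Hypothetical helper for illustration; not part of TELR.)
merge_fasta() {
    local out="$1"
    shift
    if [ "$#" -eq 0 ]; then
        echo "merge_fasta: no input files given" >&2
        return 1
    fi

    local expected=0 n f merged
    : > "$out"                              # start from an empty output file
    for f in "$@"; do
        if [ ! -s "$f" ]; then
            echo "merge_fasta: missing or empty input: $f" >&2
            return 1
        fi
        n=$(grep -c '^>' "$f" || true)      # one header line per read
        expected=$((expected + n))
        cat "$f" >> "$out"
    done

    # A header glued onto the previous sequence line (input without a final
    # newline) would lower this count, so the comparison catches that too.
    merged=$(grep -c '^>' "$out" || true)
    if [ "$merged" -ne "$expected" ]; then
        echo "merge_fasta: expected $expected reads, found $merged" >&2
        return 1
    fi
    echo "merged $# files, $merged reads -> $out"
}

# Example (hypothetical file names):
#   merge_fasta S2Rplus.merged.fasta S2Rplus_run*.fasta
```

The merged file can then be passed to `telr -i` in place of the single-run file.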

For the paper, we used NGMLR as the aligner, wtdbg2 as the local contig assembler, and Flye as the polisher. Using minimap2 as the aligner (as in your example) may run faster but with lower accuracy. That is totally fine, but please keep it in mind if you want to replicate our results. Thanks!

Let me know if you have more questions.

Best, Shunhua

MrBleem commented 1 year ago

OK, I get the expected result now. Thank you very much!

bergmanlab / TELR

failed to get the expected result using SRR11000304.fastq #28