Open prakashnarayanan98 opened 7 months ago
Hi @prakashnarayanan98,
Thanks for reporting the error. A few things:
telr -i read.fastq -l library.fasta -r reference.fasta --assembler flye --polisher flye
and see if switching to flye for assembly and polishing helps. Thanks, Shunhua
FASTQ Data:
Library: chakraborty_simulans_TE
>TcMar-Mariner:MARINA
gataagtccccggtctgacacatagatggcgtcgctagtatta
Reference Genome:
>Scf_2L type=golden_path_region; loc=Scf_2L:1..23539531; ID=Scf_2L; dbxref=GB:CM002910; MD5=4db334c02c86dfa856dc1a48c595acf1; length=23539531; release=r2.02; species=Dsim;
TTTGTGCAGTTAGAGTGGGCGTGGCAACATGTGTCAATAAACCTACGCTGCGTCTATGTCTCAAAATCTGTACGCTGAAT
Thanks @prakashnarayanan98 for providing this info.
We haven't yet tested TELR extensively on simulans data, so there is no guarantee that the entire workflow will be issue-free for this species. Did you get any successful assemblies for most insertion candidates? If the assembly failure is only for a small subset of all non-reference TE insertions, you can potentially look into rescuing those assemblies ad-hoc. Below are files you can use for this purpose.
If you provide --keep_files when running TELR, all intermediate files will be kept under <output_dir>/intermediate_files.
/intermediate_files/reads.vcf
/intermediate_files/sv_reads
Assembly results for all candidate loci are available at /intermediate_files/contig_assembly. They can be used to diagnose assembly errors and test your own assembly strategies.
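As a starting point for that kind of diagnosis, here is a minimal Python sketch that lists zero-byte files under a directory such as the contig assembly folder. It assumes that a failed assembly leaves an empty file behind, which is a guess and not something confirmed for TELR; the function name `find_empty_files` is a hypothetical helper, not part of TELR.

```python
from pathlib import Path


def find_empty_files(directory):
    """Return the sorted paths of zero-byte regular files under `directory`.

    Assumption (not confirmed for TELR): a failed assembly shows up as an
    empty file somewhere below the given directory.
    """
    root = Path(directory)
    return sorted(
        p for p in root.rglob("*")
        if p.is_file() and p.stat().st_size == 0
    )
```

You could call it as `find_empty_files("<output_dir>/intermediate_files/contig_assembly")` with your own output directory substituted in, then inspect the reads for those loci by hand.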
Description:
Sample of processing error:
``` Successfully created the directory /TELR/intermediate_files/vcf_ins_repeatmask RepeatMasker version open-4.0.7 Search Engine: NCBI/RMBLAST [ 2.6.0+ ] Rebuilding RepeatMaskerLib.embl library - Read in 216 sequences from /miniconda3/envs/TELR/share/RepeatMasker/Libraries/DfamConsensus.embl RepeatMaskerLib.embl: 216 total sequences. Master RepeatMasker Database: /miniconda3/envs/TELR/share/RepeatMasker/Libraries/RepeatMaskerLib.embl ( Complete Database: dc20170127 ) Custom Repeat Library: /TELR/intermediate_files/LIBRARY.fasta Warning...unknown stuff < > Building general libraries in: /miniconda3/envs/TELR/share/RepeatMasker/Libraries/dc20170127/general analyzing file /TELR/intermediate_files/Read.vcf_ins.fasta identifying matches to LIBRARY.fasta sequences in batch 1 of 11 identifying matches to LIBRARY.fasta sequences in batch 2 of 11 identifying matches to LIBRARY.fasta sequences in batch 3 of 11 identifying matches to LIBRARY.fasta sequences in batch 4 of 11 identifying matches to LIBRARY.fasta sequences in batch 5 of 11 identifying matches to LIBRARY.fasta sequences in batch 6 of 11 identifying matches to LIBRARY.fasta sequences in batch 7 of 11 identifying matches to LIBRARY.fasta sequences in batch 8 of 11 identifying matches to LIBRARY.fasta sequences in batch 9 of 11 identifying matches to LIBRARY.fasta sequences in batch 10 of 11 identifying matches to LIBRARY.fasta sequences in batch 11 of 11 processing output: cycle 1 . cycle 2 . Generating output... . 
masking done Successfully created the directory /TELR/intermediate_files/sv_reads Successfully created the directory /TELR/intermediate_files/contig_assembly assembly failed assembly failed assembly failed assembly failed assembly failed assembly failed assembly failed assembly failed assembly failed assembly failed assembly failed assembly failed assembly failed assembly failed assembly failed assembly failed assembly failed assembly failed assembly failed assembly failed assembly failed assembly failed assembly failed assembly failed assembly failed Successfully created the directory /TELR/intermediate_files/vcf_seq2contig Use repeatmasker to annotate contig TE families instead of minimap2 Successfully created the directory /TELR/intermediate_files/contig_te_repeatmask RepeatMasker version open-4.0.7 Search Engine: NCBI/RMBLAST [ 2.6.0+ ] Master RepeatMasker Database: /miniconda3/envs/TELR/share/RepeatMasker/Libraries/RepeatMaskerLib.embl ( Complete Database: dc20170127 ) Custom Repeat Library: /TELR/intermediate_files/LIBRARY.fasta Warning...unknown stuff < > analyzing file /TELR/intermediate_files/Read.fa identifying matches to LIBRARY.fasta sequences in batch 1 of 10 identifying matches to LIBRARY.fasta sequences in batch 2 of 10 identifying matches to LIBRARY.fasta sequences in batch 3 of 10 identifying matches to LIBRARY.fasta sequences in batch 4 of 10 identifying matches to LIBRARY.fasta sequences in batch 5 of 10 identifying matches to LIBRARY.fasta sequences in batch 6 of 10 identifying matches to LIBRARY.fasta sequences in batch 7 of 10 identifying matches to LIBRARY.fasta sequences in batch 8 of 10 identifying matches to LIBRARY.fasta sequences in batch 9 of 10 identifying matches to LIBRARY.fasta sequences in batch 10 of 10 processing output: cycle 1 . cycle 2 . Generating output... . 
masking done Done Successfully created the directory /TELR/intermediate_files/telr_reads Scf_2L_22107544_22107544 no assembly Scf_2L_22734202_22734202 no assembly Scf_2R_380878_380881 no assembly Scf_2R_2670123_2670123 no assembly Scf_3L_23424019_23424021 no assembly Scf_NODE_103476_626_627 no assembly Scf_NODE_105063_6969_6970 no assembly Scf_NODE_11571_24517_24519 no assembly Scf_NODE_12809_1023_1023 no assembly Scf_NODE_18214_489_489 no assembly Scf_NODE_24465_1162_1163 no assembly Scf_NODE_26715_949_952 no assembly Scf_NODE_3168_468_468 no assembly Scf_NODE_36936_1052_1057 no assembly Scf_NODE_37551_815_815 no assembly Scf_NODE_39506_601_603 no assembly Scf_NODE_46678_5042_5042 no assembly Scf_NODE_5267_896_897 no assembly Scf_NODE_59901_2861_2861 no assembly Scf_NODE_60709_627_628 no assembly Scf_NODE_68951_87_88 no assembly Scf_NODE_69473_1091_1091 no assembly Scf_NODE_72290_1975_1976 no assembly Scf_NODE_76112_1320_1320 no assembly Scf_NODE_98642_1306_1307 no assembly Successfully created the directory /TELR/intermediate_files/ref_repeatmask RepeatMasker version open-4.0.7 Search Engine: NCBI/RMBLAST [ 2.6.0+ ] Master RepeatMasker Database: /miniconda3/envs/TELR/share/RepeatMasker/Libraries/RepeatMaskerLib.embl ( Complete Database: dc20170127 ) Custom Repeat Library: /TELR/intermediate_files/LIBRARY.fasta Warning...unknown stuff < > ```Environment:
telr -i read.fastq -l library.fasta -r reference.fasta
Issue: The contig assembly process in TELR is encountering multiple failures, leading to the generation of empty assemblies for several sequences.
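To put a number on how many candidates are affected, a small Python sketch like the one below could tally the failure messages in a saved copy of the TELR console output. It only relies on the two message patterns visible in the log in this thread ("assembly failed" and "<locus> no assembly"); the function name `summarize_log` and the idea of saving the output to a text file are my own assumptions, not TELR features.

```python
import re

# Patterns taken from the console output quoted in this thread.
FAILED = re.compile(r"assembly failed")
NO_ASSEMBLY = re.compile(r"(\S+) no assembly")


def summarize_log(text):
    """Return (number of 'assembly failed' messages, list of loci with 'no assembly')."""
    n_failed = len(FAILED.findall(text))
    loci = NO_ASSEMBLY.findall(text)
    return n_failed, loci
```

For example, `summarize_log(open("telr.log").read())` (with `telr.log` standing in for wherever you saved the output) returns the failure count and the affected locus names, which can then be compared against the total number of insertion candidates.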
Observed Behavior:
Logs:
Additional Information:
Notes:
This issue is hindering the progress of the project. Any assistance or guidance in resolving this matter would be greatly appreciated.