bergmanlab / TELR

TELR is a fast non-reference transposable element detector from long read sequencing data.
https://github.com/bergmanlab/TELR
BSD 2-Clause "Simplified" License

Improve liftover #9

Closed by shunhuahan 3 years ago

shunhuahan commented 3 years ago
  1. The TELR_liftover.py module has been upgraded with the following changes:
    1. The script now serves two roles. As a module in the main TELR program, it lifts over TE annotations from contigs to the reference genome to identify non-reference TE predictions. As a standalone script in the evaluation framework, it lifts over TE annotations from one genome to another.
    2. The old module only generated a BED file with the coordinates of all non-reference liftover results. The new script generates a JSON file that reports detailed information for all annotations and categorizes them into three types: non-reference, reference, and unlifted. For each annotation, the new script reports: coordinates before and after the liftover workflow, family info, TSD info, and QC metrics for the flanking sequence alignments (gap, alignment coordinates, mapping quality, residue matches, alignment block length, sequence identity).
    3. Users can now provide the --different_contig_name option. When this option is provided, TELR does not require the contig name to match before and after annotation liftover (default: the contig name must be the same before and after liftover).
  2. TELR now reports a VCF file, a BED file, a JSON file, an expanded JSON file with additional QC metrics for advanced users, a TE FASTA file, and a contig FASTA file. The README has been updated to reflect the latest changes to the TELR output format.
  3. Users can optionally use minimap2 to annotate contig TE families by providing the --minimap2_family option. By default, RepeatMasker is used both to identify TE boundaries and to annotate TE families in the contigs.
  4. Updated the utility program that evaluates TELR prediction coordinates and sequences.
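
For readers unfamiliar with the QC metrics listed in item 1.2, here is a minimal sketch of how they could be derived from one flanking sequence alignment, assuming the alignment is available as a PAF record (minimap2's default output format). The function names `paf_qc` and `contig_ok`, the dictionary keys, and the example record are hypothetical and do not reflect TELR's actual code or JSON schema. The `different_contig_name` parameter only mimics the behavior described for the --different_contig_name option.

```python
import json


def paf_qc(paf_line):
    """Extract QC metrics from the 12 mandatory columns of one PAF record."""
    f = paf_line.rstrip("\n").split("\t")
    matches = int(f[9])     # PAF column 10: number of residue matches
    block_len = int(f[10])  # PAF column 11: alignment block length
    return {
        "query": f[0],
        "target": f[5],
        "target_start": int(f[7]),
        "target_end": int(f[8]),
        "mapq": int(f[11]),  # PAF column 12: mapping quality
        "residue_matches": matches,
        "block_length": block_len,
        # Sequence identity taken here as matches / block length (an assumption).
        "identity": matches / block_len if block_len else 0.0,
    }


def contig_ok(name_before, name_after, different_contig_name=False):
    """Require matching contig names unless the check is explicitly relaxed."""
    return different_contig_name or name_before == name_after


# Made-up PAF record for a 500 bp flank aligned to a contig named chr2L.
example = "\t".join(
    ["flank_5p", "500", "0", "500", "+", "chr2L",
     "23513712", "10000", "10500", "490", "500", "60"]
)
record = paf_qc(example)
print(json.dumps(record, indent=2))
```

With this sketch, `contig_ok("chr2L", "chr2L")` passes under the default, while `contig_ok("contig_1", "chr2L")` passes only when `different_contig_name=True`.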