BrooksLabUCSC / flair

Full-Length Alternative Isoform analysis of RNA

Set `--annotation_reliant` but still got novel single-exon genes #346

Open dudududu12138 opened 1 month ago


Hi, I recently ran flair collapse with your recommended parameters on human ONT data. Below is my command:

flair collapse -t 10 \
        -q $input/${sample}_all_corrected.bed \
        -r $sample.fastq \
        -f $anno -g $ref \
        --stringent --check_splice --generate_map --annotation_reliant generate \
        -o $output/$sample/$sample

According to the FAQ (https://flair.readthedocs.io/en/latest/faqs.html), setting --annotation_reliant restricts FLAIR to genes present in the input GTF. However, I still found novel single-exon genes in my results. Some transcripts on opposite strands even share the same gene ID (e.g., gene_id "chr1:629000"). Below are some examples from my results:

chr1    FLAIR   transcript      629332  629433  .       -       .       gene_id "chr1:629000"; transcript_id "DRR481115.3087112-0";
chr1    FLAIR   exon    629332  629433  .       -       .       gene_id "chr1:629000"; transcript_id "DRR481115.3087112-0"; exon_number "0";
chr1    FLAIR   transcript      629640  630151  .       +       .       gene_id "chr1:629000"; transcript_id "DRR481115.970061-0";
chr1    FLAIR   exon    629640  630151  .       +       .       gene_id "chr1:629000"; transcript_id "DRR481115.970061-0"; exon_number "0";
chr1    FLAIR   transcript      630336  630514  .       +       .       gene_id "chr1:630000"; transcript_id "DRR481115.2224333-0";
chr1    FLAIR   exon    630336  630514  .       +       .       gene_id "chr1:630000"; transcript_id "DRR481115.2224333-0"; exon_number "0";
chr1    FLAIR   transcript      631740  632260  .       +       .       gene_id "chr1:631000"; transcript_id "DRR481115.807993-0";
chr1    FLAIR   exon    631740  632260  .       +       .       gene_id "chr1:631000"; transcript_id "DRR481115.807993-0"; exon_number "0";
chr1    FLAIR   transcript      665210  667532  .       +       .       gene_id "chr1:665000"; transcript_id "DRR481115.3569437-0";
chr1    FLAIR   exon    665210  667532  .       +       .       gene_id "chr1:665000"; transcript_id "DRR481115.3569437-0"; exon_number "0";
chr1    FLAIR   transcript      707220  710405  .       +       .       gene_id "chr1:707000"; transcript_id "DRR481115.4446146-0";
chr1    FLAIR   exon    707220  710405  .       +       .       gene_id "chr1:707000"; transcript_id "DRR481115.4446146-0"; exon_number "0";
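
To make the pattern easier to quantify, here is a minimal Python sketch that scans a GTF for single-exon transcripts with a position-style gene ID and reports gene IDs that occur on both strands. It is not part of FLAIR. It assumes that novel loci are named `<chrom>:<position>`, as in the records above, and that annotated genes keep their IDs from the input GTF. The function name `summarize_novel_single_exon` and the `SAMPLE` string are hypothetical and used only for illustration.

```python
import re
from collections import defaultdict

# Assumption: novel loci use gene IDs of the form "<chrom>:<position>",
# as seen in the records above (e.g. "chr1:629000").
NOVEL_ID = re.compile(r'^[^:\s]+:\d+$')
# Matches GTF attributes written as: key "value";
ATTR = re.compile(r'(\w+) "([^"]*)"')


def summarize_novel_single_exon(gtf_lines):
    """Return (novel single-exon transcript IDs, gene IDs seen on both strands)."""
    exon_counts = defaultdict(int)   # transcript_id -> number of exon records
    tx_info = {}                     # transcript_id -> (gene_id, strand)
    for line in gtf_lines:
        if not line.strip() or line.startswith("#"):
            continue
        # Split on any whitespace so that tab- and space-separated text both work;
        # the ninth field holds the attribute string.
        fields = line.rstrip("\n").split(None, 8)
        if len(fields) < 9 or fields[2] != "exon":
            continue
        attrs = dict(ATTR.findall(fields[8]))
        tx, gene = attrs.get("transcript_id"), attrs.get("gene_id")
        if tx is None or gene is None:
            continue
        exon_counts[tx] += 1
        tx_info[tx] = (gene, fields[6])

    novel = sorted(
        tx for tx, n in exon_counts.items()
        if n == 1 and NOVEL_ID.match(tx_info[tx][0])
    )
    strands = defaultdict(set)
    for tx in novel:
        gene, strand = tx_info[tx]
        strands[gene].add(strand)
    both_strands = sorted(g for g, s in strands.items() if len(s) > 1)
    return novel, both_strands


# Three exon records copied from the output above.
SAMPLE = '''\
chr1 FLAIR exon 629332 629433 . - . gene_id "chr1:629000"; transcript_id "DRR481115.3087112-0"; exon_number "0";
chr1 FLAIR exon 629640 630151 . + . gene_id "chr1:629000"; transcript_id "DRR481115.970061-0"; exon_number "0";
chr1 FLAIR exon 630336 630514 . + . gene_id "chr1:630000"; transcript_id "DRR481115.2224333-0"; exon_number "0";
'''

if __name__ == "__main__":
    novel, both_strands = summarize_novel_single_exon(SAMPLE.splitlines())
    print(len(novel), "novel single-exon transcripts")
    print("gene IDs on both strands:", both_strands)
```

Running the same function over the full collapse output (for example `summarize_novel_single_exon(open(path))`, with `path` pointing at the isoform GTF) would show how many transcripts are affected.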

Do you know why this happens? For reference, I am using the latest version of FLAIR.