ConesaLab / SQANTI3

Tool for the Quality Control of Long-Read Defined Transcriptomes
GNU General Public License v3.0

[BUG] After running filter, the second column of the filtered GTF file is always `PacBio`, even though I'm using ONT data. #314

Closed: dudududu12138 closed this issue 2 months ago

dudududu12138 commented 2 months ago

Is there an existing issue for this?

Have you loaded the SQANTI3.env conda environment?

Problem description

After running sqanti3_qc.py and sqanti3_filter.py, I got a filtered.gtf file, but its second column (the GTF source field) is always PacBio, even though my input is ONT data. Below is an excerpt of the output:

GL000194.1      PacBio  transcript      53589   115018  .       -       .       transcript_id "TCONS_00000129"; gene_id "XLOC_000075";
GL000194.1      PacBio  exon    53589   55676   .       -       .       transcript_id "TCONS_00000129"; gene_id "XLOC_000075";
GL000194.1      PacBio  exon    112792  112850  .       -       .       transcript_id "TCONS_00000129"; gene_id "XLOC_000075";
GL000194.1      PacBio  exon    114986  115018  .       -       .       transcript_id "TCONS_00000129"; gene_id "XLOC_000075";
GL000219.1      PacBio  transcript      77559   99699   .       -       .       transcript_id "TCONS_00000829"; gene_id "XLOC_000183";
GL000219.1      PacBio  exon    77559   78952   .       -       .       transcript_id "TCONS_00000829"; gene_id "XLOC_000183";
GL000219.1      PacBio  exon    79937   80028   .       -       .       transcript_id "TCONS_00000829"; gene_id "XLOC_000183";
GL000219.1      PacBio  exon    83213   83317   .       -       .       transcript_id "TCONS_00000829"; gene_id "XLOC_000183";
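If the source column should read ONT anyway (for example, for labeling tracks in a genome browser), one option is to post-process the filtered GTF. The sketch below is a hypothetical shell helper, not a SQANTI3 option: the function name `relabel_gtf_source` is made up for illustration, and it assumes a tab-separated GTF. It rewrites only column 2 and leaves comment lines, blank lines, and the attribute column untouched.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of SQANTI3): replace the GTF source field
# (column 2) with a new label. Assumes tab-separated GTF input on stdin.
relabel_gtf_source() {
  local new_source="$1"
  # FS/OFS = tab so the space-containing attribute column (9) stays one field;
  # comment lines and empty lines are printed unchanged.
  awk -v src="$new_source" '
    BEGIN { FS = OFS = "\t" }
    /^#/ || NF == 0 { print; next }
    { $2 = src; print }
  '
}

# Usage (file names are illustrative):
# relabel_gtf_source ONT < sample.filtered.gtf > sample.filtered.ont.gtf
```

Because awk only reassigns `$2`, coordinates, strand, and the `transcript_id`/`gene_id` attributes pass through exactly as SQANTI3 wrote them.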

Code sample

module load miniconda3
source activate SQANTI3.env

sample=$1
output=./    # output directory for sqanti3_filter.py
echo "$sample"

ref_anno=~/reference/merge/merge.combined.gtf
ref=~/reference/GRCh38.p14.genome.fa
polyA=~/software/SQANTI3-5.2.1/data/polyA_motifs/mouse_and_human.polyA_motif.txt
cage=~/software/SQANTI3-5.2.1/data/ref_TSS_annotation/human.refTSS_v3.1.hg38.bed

input=${sample}.gtf

# QC: classify the ONT isoforms against the reference annotation
sqanti3_qc.py "$input" "$ref_anno" "$ref" \
        -t 2 -d ./ -o "$sample" \
        --report skip --force_id_ignore \
        --CAGE_peak "$cage" --polyA_motif_list "$polyA"

# Filter: apply the rules in filtering.json to the QC classification
sqanti3_filter.py rules "${sample}_classification.txt" \
 -j filtering.json \
 --isoforms "${sample}_corrected.fasta" \
 --gtf "${sample}_corrected.gtf" \
 --faa "${sample}_corrected.faa" \
 --skip_report \
 -d "$output" -o "$sample"

aarzalluz commented 2 months ago

Hi @dudududu12138, the PacBio source label is set internally by default, but it has no impact on downstream analysis.

Ángeles

dudududu12138 commented 2 months ago


Thanks, I checked your scripts and found the reason. It is indeed as you say.