Open IceFreez3r opened 2 months ago
Copy and paste the exact command you tried to run
flair collapse --query results/flair/correct/lung_all_corrected.bed --genome resources/reference.fa --reads /project/hfa_work/ENCODE/data/reads/ENCFF552NVU.fastq.gz /project/hfa_work/ENCODE/data/reads/ENCFF934MBW.fastq.gz /project/hfa_work/ENCODE/data/reads/ENCFF341BSQ.fastq.gz /project/hfa_work/ENCODE/data/reads/ENCFF250IWT.fastq.gz --gtf resources/annotation.gtf --threads {threads} --output results/flair/collapse/lung
where the reference.fa and annotation.gtf are both from GENCODE release v46. Data files are publicly available on ENCODE (ENCODE cart).
How did you install Flair? bioconda with Snakemake 8.16, environment has just FLAIR:
name: flair
channels:
  - bioconda
  - conda-forge
dependencies:
  - flair
What happened? Indexing the output GTF file with tabix failed after I had sorted and compressed it. It turns out the GTF contains exons with length 0 and -1.
chr17 FLAIR transcript 82442644 82449291 . - . gene_id "ENSG00000178927.19"; transcript_id "m54284U_200415_060704/157550641/ccs";
chr17 FLAIR exon 82442644 82444124 . - . gene_id "ENSG00000178927.19"; transcript_id "m54284U_200415_060704/157550641/ccs"; exon_number "0";
chr17 FLAIR exon 82444447 82444591 . - . gene_id "ENSG00000178927.19"; transcript_id "m54284U_200415_060704/157550641/ccs"; exon_number "1";
chr17 FLAIR exon 82445864 82445863 . - . gene_id "ENSG00000178927.19"; transcript_id "m54284U_200415_060704/157550641/ccs"; exon_number "2";
chr17 FLAIR exon 82449293 82449291 . - . gene_id "ENSG00000178927.19"; transcript_id "m54284U_200415_060704/157550641/ccs"; exon_number "3";
The start coordinates of the last two exons are larger than their end coordinates.
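For anyone who wants to check their own output for the same problem, here is a minimal sketch of a scan for such records. It assumes a tab-separated GTF with 1-based, inclusive coordinates; the function name `find_invalid_features` is made up for illustration and is not part of FLAIR. The sample lines reuse the exon coordinates from above, with shortened attributes.

```python
def find_invalid_features(lines, feature_type="exon"):
    """Return (line_number, start, end, length) for features with length < 1.

    Hypothetical helper, not part of FLAIR. Assumes tab-separated GTF lines.
    """
    invalid = []
    for line_number, line in enumerate(lines, start=1):
        # Skip header/comment lines and blank lines.
        if line.startswith("#") or not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        # A GTF record has 9 columns; column 3 is the feature type.
        if len(fields) < 9 or fields[2] != feature_type:
            continue
        start, end = int(fields[3]), int(fields[4])
        # GTF coordinates are 1-based and inclusive, so length = end - start + 1.
        length = end - start + 1
        if length < 1:
            invalid.append((line_number, start, end, length))
    return invalid


# Three exon records with the coordinates from the report (attributes shortened).
sample = [
    'chr17\tFLAIR\texon\t82444447\t82444591\t.\t-\t.\tgene_id "ENSG00000178927.19";',
    'chr17\tFLAIR\texon\t82445864\t82445863\t.\t-\t.\tgene_id "ENSG00000178927.19";',
    'chr17\tFLAIR\texon\t82449293\t82449291\t.\t-\t.\tgene_id "ENSG00000178927.19";',
]

invalid = find_invalid_features(sample)
for line_number, start, end, length in invalid:
    print(f"line {line_number}: start={start} end={end} length={length}")
```

To scan a real file, pass an open file handle instead of the list, e.g. `find_invalid_features(open("lung.isoforms.gtf"))`; that file name is only a placeholder.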
The follow-up commands that revealed the error (Snakemake syntax, but it should be intuitive to understand):
Sorting and compression
(grep -v "^#" {input} | sort -k1,1 -k4,4n | bgzip -c > {output}) > {log} 2>&1
Tabix
tabix -p gff {input} > {log} 2>&1
What else do we need to know? I ran the same analysis on reads from five other tissues and had no issues there.
After fixing the exons to be at least length 1, I found 0-length exons in 4 of the other tools.