Open sanyalab opened 1 month ago
Dear @sanyalab
IsoQuant does support both GTF and GFF, but not BED. Could you send me the entire isoquant.log file?
Also, you can try running IsoQuant with --no_gtf_check.
Best Andrey
Hi Andrey,
I actually went ahead and converted the GFF3 to a geneDB format using gffutils as a preprocessing step, and it seems to be running fine now. The isoquant.log file is 152 MB, so I cannot upload it. But here are the first 10 lines and the last 10.
FIRST:
Command line: isoquant.py --reference genome.fa --genedb Annotation.gff3 --fastq Sample1.flnc.fastq Sample2.flnc.fastq Sample3.flnc.fastq Sample4.flnc.fastq --output FL_ALL --prefix OUT --data_type pacbio_ccs --fl_data --threads 24 --check_canonical --sqanti_output --matching_strategy precise --splice_correction_strategy default_pacbio --model_construction_strategy fl_pacbio
2024-09-19 11:34:28,180 - INFO - Running IsoQuant version 3.5.0
2024-09-19 11:34:28,222 - INFO - === IsoQuant pipeline started ===
2024-09-19 11:34:28,222 - INFO - gffutils version: 0.13
2024-09-19 11:34:28,223 - INFO - pysam version: 0.22.1
2024-09-19 11:34:28,223 - INFO - pyfaidx version: 0.8.1.1
2024-09-19 11:34:28,228 - INFO - Checking input gene annotation
2024-09-19 11:34:29,316 - WARNING - Malformed GTF line 2 (gene_id attribute value cannot be found)
2024-09-19 11:34:29,316 - WARNING - Chr00 GSAP gene 151 2235 . + . ID=dummy1;Name=dummy1
2024-09-19 11:34:29,316 - WARNING - Malformed GTF line 3 (gene_id attribute value cannot be found)
2024-09-19 11:34:29,317 - WARNING - Chr00 GSAP mRNA 151 2235 . + ID=dummy1.1;Parent=dummy1;Name=dummy1.1
2024-09-19 11:34:29,317 - WARNING - Malformed GTF line 4 (gene_id attribute value cannot be found)
2024-09-19 11:34:29,317 - WARNING - Chr00 GSAP exon 151 2235 . + . ID=dummy1.1.exon1;Parent=dummy1.1
2024-09-19 11:34:29,317 - WARNING - Malformed GTF line 5 (gene_id attribute value cannot be found)
2024-09-19 11:34:29,317 - WARNING - Chr00 GSAP CDS 151 2235 . + 0 ID=dummy1.1.cds1;Parent=dummy1.1
2024-09-19 11:34:29,317 - WARNING - Malformed GTF line 6 (gene_id attribute value cannot be found)
2024-09-19 11:34:29,317 - WARNING - Chr00 GSAP gene 2412 4316 . + . ID=dummy2;Name=dummy2
2024-09-19 11:34:29,317 - WARNING - Malformed GTF line 7 (gene_id attribute value cannot be found)
2024-09-19 11:34:29,317 - WARNING - Chr00 GSAP mRNA 2412 4316 . + . ID=dummy2.1;Parent=dummy2;Name=dummy2.1
LAST:
2024-09-19 11:35:13,258 - WARNING - Malformed GTF line 638230 (gene_id attribute value cannot be found)
2024-09-19 11:35:13,258 - WARNING - Chr26 GSAP exon 1450283 1450513 . + . ID=dummy6432.1.exon1;Parent=dummy6432.1
2024-09-19 11:35:13,258 - WARNING - Malformed GTF line 638231 (gene_id attribute value cannot be found)
2024-09-19 11:35:13,258 - WARNING - Chr26 GSAP CDS 1450283 1450513 . + 0 ID=dummy6432.1.cds1;Parent=dummy6432.1
2024-09-19 11:35:13,258 - WARNING - Malformed GTF line 638232 (gene_id attribute value cannot be found)
2024-09-19 11:35:13,258 - WARNING - Chr26 GSAP gene 1465536 1465607 . - . ID=dummy6433;Name=dummy6433
2024-09-19 11:35:13,258 - WARNING - Malformed GTF line 638233 (gene_id attribute value cannot be found)
2024-09-19 11:35:13,258 - WARNING - Chr26 GSAP mRNA 1465536 1465607 . - . ID=dummy6433.1;Parent=dummy6433;Name=dummy6433.1
2024-09-19 11:35:13,258 - WARNING - Malformed GTF line 638234 (gene_id attribute value cannot be found)
2024-09-19 11:35:13,258 - WARNING - Chr26 GSAP exon 1465536 1465607 . - . ID=dummy6433.1.exon1;Parent=dummy6433.1
2024-09-19 11:35:13,297 - ERROR - Input GTF seems to be corrupted (see warnings above).
2024-09-19 11:35:13,297 - ERROR - An attempt to correct this GTF was made, the result is written to /Path/FL_ALL/Annotation.corrected.gff3
2024-09-19 11:35:13,297 - ERROR - NB! some transcript / gene ids in the corrected annotation are modified.
2024-09-19 11:35:13,297 - ERROR - Provide a correct GTF by fixing the original input GTF or checking the corrected one.
It's not recognizing the GFF3 file.
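For anyone reading the warnings above: the checker reports a missing gene_id because GFF3 stores attributes as key=value pairs (ID=..., Parent=...), whereas GTF uses space-separated pairs such as gene_id "...". The snippet below is only a minimal sketch of that difference. The two parser functions are hypothetical helpers written for illustration and are not IsoQuant's actual checker code.

```python
def parse_gff3_attributes(column9: str) -> dict:
    """Parse a GFF3 attribute column such as 'ID=dummy1;Name=dummy1'."""
    attrs = {}
    for field in column9.strip().split(";"):
        if not field:
            continue
        key, _, value = field.partition("=")
        attrs[key.strip()] = value.strip()
    return attrs


def parse_gtf_attributes(column9: str) -> dict:
    """Parse a GTF attribute column such as 'gene_id "g1"; transcript_id "t1";'."""
    attrs = {}
    for field in column9.strip().split(";"):
        field = field.strip()
        if not field:
            continue
        # GTF separates key and value with a space and quotes the value.
        key, _, value = field.partition(" ")
        attrs[key] = value.strip().strip('"')
    return attrs


# Attribute column copied from one of the warnings in the log.
gff3_column = "ID=dummy1.1;Parent=dummy1;Name=dummy1.1"

# Read as GFF3, the identifiers are all there.
print(parse_gff3_attributes(gff3_column))

# Read with GTF rules, no gene_id key is found, which matches the warning text.
print("gene_id" in parse_gtf_attributes(gff3_column))
```

A GTF-only check would therefore flag every line of a valid GFF3 file, which is consistent with the log showing a warning for each annotation line.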
@sanyalab
Thanks a lot! I will add GFF3 support to the internal checker.
So if gffutils converted it, you can run IsoQuant with --no_gtf_check as well.
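A rough sketch of the workaround discussed in this thread, for versions where the internal checker rejects GFF3: build a gffutils database from the GFF3 file first, then pass it to --genedb and skip the check. It assumes gffutils is installed; the create_db options shown (force, keep_order, merge_strategy) are one plausible choice and not necessarily what was used here. The file names and the function names convert_gff3_to_db and isoquant_command are placeholders.

```python
import shlex


def convert_gff3_to_db(gff3_path: str, db_path: str):
    """Build a gffutils database from a GFF3 file (requires gffutils)."""
    import gffutils  # third-party; imported lazily so the rest runs without it

    return gffutils.create_db(
        gff3_path,
        dbfn=db_path,
        force=True,              # overwrite an existing database file
        keep_order=True,         # keep attribute order as in the input
        merge_strategy="merge",  # assumption: merge features with duplicate IDs
    )


def isoquant_command(reference: str, genedb: str, fastqs: list, output: str) -> list:
    """Assemble an IsoQuant command line using flags quoted in this thread."""
    return [
        "isoquant.py",
        "--reference", reference,
        "--genedb", genedb,
        "--fastq", *fastqs,
        "--data_type", "pacbio_ccs",
        "--output", output,
        "--no_gtf_check",  # skip the internal annotation check
    ]


# Placeholder file names; convert_gff3_to_db("Annotation.gff3", "Annotation.db")
# would be run once beforehand to create the database.
cmd = isoquant_command("genome.fa", "Annotation.db", ["Sample1.flnc.fastq"], "FL_ALL")
print(shlex.join(cmd))
```

Printing the command instead of running it keeps the sketch safe to try. The remaining options from the original command line can be appended to the list as needed.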
GFF3 should work in IsoQuant 3.6.1 without warnings.
Hi,
The tool says that it can work with GFF3. But it only works with GTF. Can we get GFF3 support?
Error I get when I provide a GFF3-formatted file with the --genedb option:
Do you consume the gene annotations in GTF format or BED12 format? Is it OK to provide a BED12 file directly?
Thanks Abhijit