Closed francicco closed 6 days ago
@francicco
Seems like an issue with pyfaidx, could you update it to 0.7 or any higher version?
Best Andrey
Hi Andrey,
It looks like I was able to fix the bug. Now it looks like my gft/gff are not correctly identified do you have a gtf file that I can use as template to see how mine should be formatted?
I tried different ways to convert my gff3 file but none worked.
Cheers F
@francicco
The GTF processing has not changed since the previous version. I have only added some preliminary checks that can be turned off with --no_gtf_check
. The GTF you linked is acceptable, yes.
Do you get any error messages or warnings? Do you have a log for the failed run?
Best Andrey
Hi @andrewprzh,
I converted the gff3 with a script of mine and it works. The actual job in the queue. it will take some time I guess. Just to double-check, what I'm doing is giving ONT reads mapped with minimap2 as BAM file to isoquant together with a reference annotation that I'd like to fix and update.
Is that fine? Or do I need to collapse my reads somehow?
Thanks a lot F
@francicco
Everything seems right, no pre-processing/collapsing is needed, IsoQuant does everything on its own. Moreover, you can give IsoQuant FASTQ files and it will run minimap2 with optimal parameters automatically.
Let me know if you encounter any issues during the run.
Best Andrey
Ok, one job started. I'm getting this info:
2024-05-15 02:59:58,224 - INFO - primary: 14240592
2024-05-15 02:59:58,224 - INFO - secondary: 875089
2024-05-15 02:59:58,225 - INFO - supplementary: 663575
2024-05-15 02:59:58,225 - INFO - unaligned: 5245182
2024-05-15 02:59:58,225 - INFO - Resolving multimappers
2024-05-15 03:00:09,269 - INFO - Finishing read assignment, total assignments 14077454, polyA percentage 61.0
2024-05-15 03:00:17,692 - INFO - Read assignments files saved to IsoQuant2.4.Hmel.PCGs/HmelIsoSeq/aux/HmelIsoSeq.save*.
2024-05-15 03:00:17,692 - INFO - To keep these intermediate files for debug purposes use --keep_tmp flag
2024-05-15 03:00:17,710 - INFO - Total alignments used for analysis: 14077454, polyA tail detected in 8593063 (61.0%)
2024-05-15 03:00:17,711 - INFO - Processing assigned reads HmelIsoSeq
2024-05-15 03:00:17,711 - INFO - Transcript models construction is turned on
2024-05-15 03:00:17,715 - INFO - Transcript construction options:
2024-05-15 03:00:17,716 - INFO - Novel monoexonic transcripts will be reported: no
2024-05-15 03:00:17,716 - INFO - PolyA tails are required for multi-exon transcripts to be reported: no
2024-05-15 03:00:17,716 - INFO - PolyA tails are required for 2-exon transcripts to be reported: no
2024-05-15 03:00:17,716 - INFO - PolyA tails are required for known monoexon transcripts to be reported: no
2024-05-15 03:00:17,716 - INFO - PolyA tails are required for novel monoexon transcripts to be reported: yes
2024-05-15 03:00:17,716 - INFO - Splice site reporting level: only_stranded
Are these good stats? lots of NOs, should I turn some flags on? Cheers F
Hi @andrewprzh,
One of the two alignments just finished. I wanted to run some stats on HmelIsoSeq.extended_annotation.gtf
, which I think is the final annotation, with new, old, and fixed genes and transcripts. Unfortunately, I'm getting a lot of errors giving it to multiple tools for conversion/parsing, including SQANTI which returns errors. I think it has some malformed problems.
I'm going to write a script trying to fix it. I'll let you know. Cheers F
Dear @francicco
Are these good stats? lots of NOs, should I turn some flags on?
IsoQuant decided to set these flags due to polyA tail percentage. If you'd like your IsoQuant to report novel transcript models only with confirmed polyA sites you can set --polya_requirement always
.
Could you send me the errors you get from these tools and the GTF file itself, I'm eager to fix the problem if there is one. I regularly check IsoQuant output with gffcompare and have not detected any problems.
Best Andrey
Hi @andrewprzh,
So, if I try with gffcompare for example I'm getting this:
gffcompare -r $MyRefgff3 extended_annotation.gtf
29275 reference transcripts loaded.
2 duplicate reference transcripts discarded.
Error: no valid ID found for GFF record
I noticed that one of the problem is a double "
nex to transcript_id
, such as in a line like this one:
Hmel200004o annotation transcript 1 10141 . + . gene_id "transcript:Hmel.Hmel200004oG1"; transcript_id ""transcript:Hmel.Hmel200004oG1.3"; Canonical "True"; exons "10"; score "5.64";
Hmel200004o annotation start_codon 1 3 . + . gene_id "transcript:Hmel.Hmel200004oG1"; transcript_id ""transcript:Hmel.Hmel200004oG1.3"; exon "1"; exon_id "Hmel200004o.110";
Hmel200004o annotation CDS 1 58 . + . gene_id "transcript:Hmel.Hmel200004oG1"; transcript_id ""transcript:Hmel.Hmel200004oG1.3"; exon "2"; exon_id "Hmel200004o.89";
Hmel200004o annotation exon 1 58 . + . gene_id "transcript:Hmel.Hmel200004oG1"; transcript_id ""transcript:Hmel.Hmel200004oG1.3"; exon "3"; exon_id "Hmel200004o.89";
Hmel200004o annotation CDS 1729 1818 . + . gene_id "transcript:Hmel.Hmel200004oG1"; transcript_id ""transcript:Hmel.Hmel200004oG1.3"; exon "4"; exon_id "Hmel200004o.123";
Hmel200004o annotation exon 1729 1818 . + . gene_id "transcript:Hmel.Hmel200004oG1"; transcript_id ""transcript:Hmel.Hmel200004oG1.3"; exon "5"; exon_id "Hmel200004o.123";
Hmel200004o annotation CDS 2919 3063 . + . gene_id "transcript:Hmel.Hmel200004oG1"; transcript_id ""transcript:Hmel.Hmel200004oG1.3"; exon "6"; exon_id "Hmel200004o.103";
Hmel200004o annotation exon 2919 3063 . + . gene_id "transcript:Hmel.Hmel200004oG1"; transcript_id ""transcript:Hmel.Hmel200004oG1.3"; exon "7"; exon_id "Hmel200004o.103";
Hmel200004o annotation CDS 3349 3440 . + . gene_id "transcript:Hmel.Hmel200004oG1"; transcript_id ""transcript:Hmel.Hmel200004oG1.3"; exon "8"; exon_id "Hmel200004o.93";
Hmel200004o annotation exon 3349 3440 . + . gene_id "transcript:Hmel.Hmel200004oG1"; transcript_id ""transcript:Hmel.Hmel200004oG1.3"; exon "9"; exon_id "Hmel200004o.93";
Hmel200004o annotation CDS 3811 3964 . + . gene_id "transcript:Hmel.Hmel200004oG1"; transcript_id ""transcript:Hmel.Hmel200004oG1.3"; exon "10"; exon_id "Hmel200004o.94";
Hmel200004o annotation exon 3811 3964 . + . gene_id "transcript:Hmel.Hmel200004oG1"; transcript_id ""transcript:Hmel.Hmel200004oG1.3"; exon "11"; exon_id "Hmel200004o.94";
Hmel200004o annotation CDS 4330 4525 . + . gene_id "transcript:Hmel.Hmel200004oG1"; transcript_id ""transcript:Hmel.Hmel200004oG1.3"; exon "12"; exon_id "Hmel200004o.95";
Hmel200004o annotation exon 4330 4525 . + . gene_id "transcript:Hmel.Hmel200004oG1"; transcript_id ""transcript:Hmel.Hmel200004oG1.3"; exon "13"; exon_id "Hmel200004o.95";
Hmel200004o annotation CDS 4803 4895 . + . gene_id "transcript:Hmel.Hmel200004oG1"; transcript_id ""transcript:Hmel.Hmel200004oG1.3"; exon "14"; exon_id "Hmel200004o.96";
Hmel200004o annotation exon 4803 4895 . + . gene_id "transcript:Hmel.Hmel200004oG1"; transcript_id ""transcript:Hmel.Hmel200004oG1.3"; exon "15"; exon_id "Hmel200004o.96";
Hmel200004o annotation CDS 5422 5550 . + . gene_id "transcript:Hmel.Hmel200004oG1"; transcript_id ""transcript:Hmel.Hmel200004oG1.3"; exon "16"; exon_id "Hmel200004o.97";
Hmel200004o annotation exon 5422 5550 . + . gene_id "transcript:Hmel.Hmel200004oG1"; transcript_id ""transcript:Hmel.Hmel200004oG1.3"; exon "17"; exon_id "Hmel200004o.97";
Hmel200004o annotation CDS 6560 6749 . + . gene_id "transcript:Hmel.Hmel200004oG1"; transcript_id ""transcript:Hmel.Hmel200004oG1.3"; exon "18"; exon_id "Hmel200004o.98";
Hmel200004o annotation exon 6560 6749 . + . gene_id "transcript:Hmel.Hmel200004oG1"; transcript_id ""transcript:Hmel.Hmel200004oG1.3"; exon "19"; exon_id "Hmel200004o.98";
Hmel200004o annotation stop_codon 6747 6749 . + . gene_id "transcript:Hmel.Hmel200004oG1"; transcript_id ""transcript:Hmel.Hmel200004oG1.3"; exon "20"; exon_id "Hmel200004o.111";
Hmel200004o annotation UTR 8334 10141 . + . gene_id "transcript:Hmel.Hmel200004oG1"; transcript_id ""transcript:Hmel.Hmel200004oG1.3"; exon "21"; exon_id "Hmel200004o.112";
Once I fix it with a simple sed
command now gffcompare
runs. But I suspect there might be more!
I'll have to investigate further.
Cheers F
Hi,
I think I found what bothers Mikado on top of the double "
. It seems like it is the exons "X"
tag.
Cheers
F
Thanks a lot for letting me know and sorry for the late reply.
Will close this issue for now.
Hi,
I was testing the new release of IsoQuant and I found this error:
Cheers F