bbista closed this issue 5 years ago
Dear @bbista, thank you for reporting this bug. I just inspected the GFF you mentioned (which I presume is from this folder: ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/000/241/765/GCF_000241765.3_Chrysemys_picta_bellii-3.0.3/ ) and the problem stems from the fact that gene-LOC112059410 is a pseudogene without any transcript feature associated with it. That is arguably invalid under the GFF ontology, and Mikado was explicitly written not to accommodate such a case.
Looking at the GFF file in more detail, there honestly seem to be a lot of similar problems, such as coding genes without mRNAs, or tRNAs without a gene parent. All of these break the gene ontology and Mikado's model of what a GFF should look like.
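To get a rough idea of how widespread these problems are in a given file, something like the following Python sketch could help. It is not part of Mikado: the function names, the list of transcript-level feature types, and the sample IDs are all assumptions made for illustration. It simply lists gene-level features that have no transcript child, and transcript-level features whose `Parent` does not point to any gene.

```python
# Assumed, non-exhaustive sets of feature types; adjust them to the file at hand.
GENE_TYPES = {"gene", "pseudogene"}
TRANSCRIPT_TYPES = {"mRNA", "transcript", "tRNA", "rRNA", "ncRNA", "lnc_RNA"}


def parse_attributes(column):
    """Turn the ninth GFF3 column ("key=value;key=value") into a dict."""
    attrs = {}
    for item in column.strip().split(";"):
        if "=" in item:
            key, value = item.split("=", 1)
            attrs[key.strip()] = value.strip()
    return attrs


def check_hierarchy(lines):
    """Return (genes without transcripts, transcripts without a gene parent)."""
    genes = set()
    genes_with_transcripts = set()
    transcripts = {}  # transcript ID -> list of parent IDs
    for number, line in enumerate(lines, start=1):
        if line.startswith("#") or not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) != 9:
            continue  # not a well-formed feature line
        feature_type = fields[2]
        attrs = parse_attributes(fields[8])
        feature_id = attrs.get("ID") or "<no ID, line {}>".format(number)
        parents = attrs["Parent"].split(",") if "Parent" in attrs else []
        if feature_type in GENE_TYPES:
            genes.add(feature_id)
        elif feature_type in TRANSCRIPT_TYPES:
            transcripts[feature_id] = parents
            genes_with_transcripts.update(parents)
    childless = sorted(genes - genes_with_transcripts)
    orphans = sorted(
        tid for tid, parents in transcripts.items()
        if not any(parent in genes for parent in parents)
    )
    return childless, orphans


# Hypothetical sample lines, made up for this example.
SAMPLE = [
    "##gff-version 3",
    "chr1\tGnomon\tpseudogene\t100\t500\t.\t+\t.\tID=gene-LOC1",
    "chr1\tGnomon\texon\t100\t500\t.\t+\t.\tID=id-LOC1;Parent=gene-LOC1",
    "chr1\tGnomon\tgene\t1000\t2000\t.\t+\t.\tID=gene-LOC2",
    "chr1\tGnomon\tmRNA\t1000\t2000\t.\t+\t.\tID=rna-X1;Parent=gene-LOC2",
    "chr1\ttRNAscan-SE\ttRNA\t3000\t3072\t.\t-\t.\tID=rna-T1",
]

childless, orphans = check_hierarchy(SAMPLE)
print("Genes without transcripts:", childless)
print("Transcripts without a gene parent:", orphans)
```

On a real file, the same function can be called on an open file handle, for example `check_hierarchy(open("GCF_000241765.genomic.gff"))`.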
I also tried using the GTF, cleaning it up first with GffRead, but to no avail. The only solutions are
Dear @bbista, I have started fixing the problems you found.
With the latest commit, `mikado util stats` is now able to parse the file appropriately. I will now work on making `mikado compare` compatible as well.
Changes will be reflected in Mikado2 (and live in Mikado 2.0rc6).
Kind regards
Current status: `mikado` now supports this problematic GFF in `util stats`, `util convert`, `compare`, and `prepare`.
The only utility left to fix is `mikado util grep`. Once that is done, this issue can be closed.
I was trying to look at the stats for a gff3 file I downloaded off NCBI. I get this error message.

    mikado util stats GCF_000241765.genomic.gff genomic.stats
    /home/bbista/.local/lib/python3.6/site-packages/Mikado/configuration/configurator.py:529: YAMLLoadWarning: calling yaml.load() without Loader=... is deprecated, as the default Loader is unsafe. Please read https://msg.pyyaml.org/load for full details.
      scoring = yaml.load(scoring_file)
    2019-10-02 19:04:43,336 - main - __init__.py:124 - ERROR - main - MainProcess - Mikado crashed, cause:
    2019-10-02 19:04:43,336 - main - __init__.py:125 - ERROR - main - MainProcess - gene-LOC112059410
    {}
    Traceback (most recent call last):
      File "/home/bbista/.local/lib/python3.6/site-packages/Mikado/__init__.py", line 110, in main
        args.func(args)
      File "/home/bbista/.local/lib/python3.6/site-packages/Mikado/subprograms/util/stats.py", line 711, in launch
        calculator()
      File "/home/bbista/.local/lib/python3.6/site-packages/Mikado/subprograms/util/stats.py", line 335, in __call__
        self.parse_input()
      File "/home/bbista/.local/lib/python3.6/site-packages/Mikado/subprograms/util/stats.py", line 324, in parse_input
        current_gene.add_exon(record)
      File "/home/bbista/.local/lib/python3.6/site-packages/Mikado/loci/reference_gene.py", line 165, in add_exon
        raise AssertionError("{}\n{}".format(parent, self.transcripts, row))
    AssertionError: gene-LOC112059410
    {}

Do you have any idea what is going wrong?
Best, Basanta
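A note on reading the assertion message: the `{}` after the gene ID is most likely the printed form of an empty transcripts dictionary, meaning the exon's parent gene had no transcript registered. The format string in the traceback also has two placeholders for three arguments, so the offending row is never printed. The short Python sketch below reproduces that behaviour; the variable values are placeholders assumed for illustration, not data taken from Mikado.

```python
parent = "gene-LOC112059410"
transcripts = {}                      # assumed: no transcript stored for this gene
row = "<the offending GFF line>"      # placeholder for the record being added

# Same format call as in the traceback: two placeholders, three arguments.
# str.format silently ignores the extra positional argument.
message = "{}\n{}".format(parent, transcripts, row)
print(message)

# With a third placeholder the row would be reported as well.
fuller_message = "{}\n{}\n{}".format(parent, transcripts, row)
print(fuller_message)
```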