Open RenanFerreira0412 opened 2 months ago
Could you try with increased logging: arrow --verbose -l debug annotations load_gff3? That'll give us more information as to why it's failing.
Now it's processing all the sequences from my GFF file, but when I refresh the Apollo page, the GFF file is not loaded into the annotation track.
The GFF file and the FASTA file with the sequence that I'm using can be found here: https://tritrypdb.org/tritrypdb/app/downloads/Current_Release/LdonovaniBPK282A1/
Note: My GFF file has 36 sequences.
The output was too big, so this is just the ending part of it.
. . .
DEBUG:root:unknown type protein_coding_gene
INFO:root:Processing Ld36_v01s1 with features: [SeqFeature(SimpleLocation(ExactPosition(1019), ExactPosition(1163), strand=-1), type='protein_coding_gene', id='LdBPK_360010.1', qualifiers=...), SeqFeature(SimpleLocation(ExactPosition(3957), ExactPosition(4260), strand=-1), type='protein_coding_gene', id='LdBPK_360020.1', qualifiers=...), SeqFeature(SimpleLocation(ExactPosition(6202), ExactPosition(6661), strand=-1), type='protein_coding_gene', id='LdBPK_360030.1', qualifiers=...), ...
. . .
DEBUG:root:unknown type protein_coding_gene
[the line above repeats many more times]
DEBUG:root:writing out: []
DEBUG:root:empty list, no more features to write
DEBUG:root:writing out: []
DEBUG:root:empty list, no more features to write
INFO:root:Finished loading {}
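Since the full debug output was too long to post, a short tally of the "unknown type" messages can summarize it. The following is a minimal sketch, assuming the log lines have exactly the DEBUG:root:unknown type ... form shown above; the function name count_unknown_types is made up for illustration and is not part of arrow or python-apollo.

```python
import re
from collections import Counter

# Matches log lines of the form shown in this thread:
#   DEBUG:root:unknown type protein_coding_gene
UNKNOWN_TYPE = re.compile(r"DEBUG:root:unknown type (\S+)")


def count_unknown_types(log_text):
    """Return a Counter mapping each rejected feature type to its number of occurrences."""
    return Counter(UNKNOWN_TYPE.findall(log_text))


# Small excerpt in the same format as the output posted above.
log_excerpt = (
    "DEBUG:root:unknown type protein_coding_gene\n"
    "DEBUG:root:unknown type protein_coding_gene\n"
    "DEBUG:root:unknown type protein_coding_gene\n"
    "DEBUG:root:writing out: []\n"
    "DEBUG:root:empty list, no more features to write\n"
)

unknown_counts = count_unknown_types(log_excerpt)
for feature_type, n in unknown_counts.most_common():
    print(f"{feature_type}\t{n}")
```

If the debug output has been saved to a file, passing the file's contents to count_unknown_types should list every feature type the loader skipped.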
What does your GFF look like? I'm guessing it doesn't match our expected structure, hence this result.
Edit: ah, you linked to it. OK, I'll take a look when I can (apologies, not much spare time currently).
Looking at the GFF, it does follow roughly the expected model, except that it uses protein_coding_gene rather than just gene.
Ld01_v01s1 VEuPathDB protein_coding_gene 3662 4663 . - . ID=LdBPK_010010.1;description=Protein of unknown function (DUF2946)%2C putative;ebi_biotype=protein_coding
Ld01_v01s1 VEuPathDB mRNA 3662 4663 . - . ID=LdBPK_010010.1.1;Parent=LdBPK_010010.1;description=Protein of unknown function (DUF2946)%2C putative;gene_ebi_biotype=protein_coding
Ld01_v01s1 VEuPathDB exon 3662 4663 . - . ID=exon_LdBPK_010010.1.1-E1;Parent=LdBPK_010010.1.1;gene_id=LdBPK_010010.1
Ld01_v01s1 VEuPathDB CDS 3662 4663 . - 0 ID=LdBPK_010010.1.1-p1-CDS1;Parent=LdBPK_010010.1.1;gene_id=LdBPK_010010.1;protein_source_id=LdBPK_010010.1.1-p1
It could be fixed either by changing protein_coding_gene to gene in your GFF file, or by updates to python-apollo.
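For the first option, the type column (column 3) of the GFF can be rewritten with a few lines of Python. This is only a sketch: the function name rename_feature_types and the mapping are illustrative, and it assumes a tab-separated GFF3 with nine columns per feature line, as in the excerpt above. Comment lines and any line without nine columns are copied through unchanged.

```python
import io


def rename_feature_types(src, dst, mapping):
    """Copy GFF3 lines from src to dst, renaming column 3 according to mapping.

    Returns the number of feature lines that were changed.
    """
    changed = 0
    for line in src:
        cols = line.split("\t")
        # Only touch real feature lines: not comments/directives, exactly 9 columns.
        if not line.startswith("#") and len(cols) == 9 and cols[2] in mapping:
            cols[2] = mapping[cols[2]]
            line = "\t".join(cols)
            changed += 1
        dst.write(line)
    return changed


# In-memory demo using two lines in the style of the GFF discussed here.
sample = (
    "##gff-version 3\n"
    "Ld01_v01s1\tVEuPathDB\tprotein_coding_gene\t3662\t4663\t.\t-\t.\tID=LdBPK_010010.1\n"
    "Ld01_v01s1\tVEuPathDB\tmRNA\t3662\t4663\t.\t-\t.\tID=LdBPK_010010.1.1;Parent=LdBPK_010010.1\n"
)
out = io.StringIO()
n_changed = rename_feature_types(io.StringIO(sample), out, {"protein_coding_gene": "gene"})
print(n_changed)
print(out.getvalue(), end="")
```

For a real file, src and dst would be file objects opened for reading and writing. Writing to a new file rather than overwriting the download keeps the original available for comparison.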
https://github.com/GMOD/Apollo/blob/develop/client/apollo/js/SequenceOntologyUtils.js#L55 suggests that it's a valid feature as far as Apollo is concerned, so we should probably expand to include some of these other terms (@abretaud what do you think?). Until now we've been cautious and only supported structures we've seen before, lest this library cause any issues. It looks like ncRNA_gene is also used, so clearly there are multiple top-level feature types we've never seen before.
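To see which top-level feature types a GFF actually contains before loading it, one can count the types of all features that have no Parent attribute. This is a rough sketch, not part of python-apollo: the function name top_level_types is invented for illustration, and the ncRNA_gene line in the demo is a made-up example rather than a line copied from the real file.

```python
from collections import Counter


def top_level_types(lines):
    """Count feature types (column 3) of GFF3 features that have no Parent attribute."""
    counts = Counter()
    for line in lines:
        if line.startswith("#"):
            continue  # skip comments and directives
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9:
            continue  # skip blank lines and anything that is not a feature line
        attributes = cols[8].split(";")
        if not any(attr.strip().startswith("Parent=") for attr in attributes):
            counts[cols[2]] += 1
    return counts


# Demo input: the ncRNA_gene line is hypothetical, for illustration only.
demo_lines = [
    "##gff-version 3\n",
    "Ld01_v01s1\tVEuPathDB\tprotein_coding_gene\t3662\t4663\t.\t-\t.\tID=LdBPK_010010.1\n",
    "Ld01_v01s1\tVEuPathDB\tmRNA\t3662\t4663\t.\t-\t.\tID=LdBPK_010010.1.1;Parent=LdBPK_010010.1\n",
    "Ld01_v01s1\tVEuPathDB\tncRNA_gene\t9000\t9100\t.\t+\t.\tID=example_ncRNA_gene\n",
]

type_counts = top_level_types(demo_lines)
for feature_type, n in sorted(type_counts.items()):
    print(f"{feature_type}\t{n}")
```

Any type in the result other than gene is a candidate for either renaming in the GFF or adding to the list of accepted types.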
You can patch this yourself quickly by editing apollo/util.py to add your types to the gene_types list, which may be faster than waiting on a new release of this library.
Yeah, we could support other top-level feature types. No time to change the code for now, but feel free to propose a PR (or just modify the input GFF to use the expected gene type).
Oh, I see. I tried adding the types in the apollo/util.py file as you suggested, and it worked.
It loaded all the features into the annotation track, but some of them were loaded with an exclamation mark.
I'm not sure why this happened.
These are the modifications I made in the apollo/util.py file.
Thanks for the help.
The exclamation marks only indicate non-canonical splice sites: they're just a visual warning for curators, in case they want to check the splice site position carefully.
Hi everyone!
I’m using the arrow command arrow annotations load_gff3 to load a full GFF3 into an annotation track, but nothing's happening.
The version of my plugin: apollo 4.2.13
Command: arrow annotations load_gff3 [OPTIONS] ORGANISM GFF3
My command: arrow annotations load_gff3 Leishmania /home/renanigor/Downloads/TriTrypDB-67_LdonovaniBPK282A1.gff
Note: I'm running Apollo in Docker.
My organism:
arrow organisms show_organism Leishmania
{
    "commonName": "Leishmania",
    "blatdb": "/data/temporary/apollo_data/34-Leishmania/seq/Leishmania.fa.2bit",
    "metadata": "{\"creator\":\"32\"}",
    "annotationCount": 2,
    "currentOrganism": true,
    "obsolete": false,
    "sequences": 36,
    "directory": "/data/temporary/apollo_data/34-Leishmania",
    "publicMode": false,
    "valid": true,
    "genomeFastaIndex": "seq/Leishmania.fa.fai",
    "genus": null,
    "species": "donovani",
    "id": 34,
    "nonDefaultTranslationTable": null,
    "genomeFasta": "seq/Leishmania.fa"
}
I really don’t know what the actual problem is because there are no error log messages.
When I run the command, the output is just empty braces.
(apollo_env) renanigor@pop-os:~/VirtualEnvs$ arrow annotations load_gff3 Leishmania /home/renanigor/Downloads/TriTrypDB-67_LdonovaniBPK282A1.gff
{}
Does anyone know how I can fix this?