Open joeflack4 opened 2 years ago
Thanks @joeflack4! It looks like the problem here has nothing to do with get_labels, the problem is further upstream when connecting to a sqlite database.
@cmungall For 2 of these sub-issues, I agree with you that it is a DB connection issue.
I just added this to the bottom of my issue. This is how I'm initializing my usage of OAK:
from oaklib.resource import OntologyResource
from oaklib.implementations.sqldb.sql_implementation import SqlImplementation
resource = OntologyResource(slug='myOwlOntology.owl', local=True)
si = SqlImplementation(resource)
I just noticed that the front page of the OAK GitHub seems to indicate that I should be passing a sqlite .db file into OAK. I hope this isn't the case. Do I need to pre-create a .db file from OWL first? I hope not. I was under the impression that OAK would do this for me.
If a .db file is required, can we have SqlImplementation raise an error if something other than a .db is passed into the local param? I can make a quick PR for that if you like.
Thanks, this helps.
We will work on improving the documentation for initialization of SqlImplementation. Yes, the canonical way to do this is to pass in a path to a ready-made sqlite database. See SqlImplementation.
However, you are correct in that if you pass in an OWL file, it will build the sqlite database for you. There are a few caveats here.
I'm sorry, I missed the comment about iterators/generators before. This is indeed independent of the sqlite issues you are facing. The use of iterators is an intentional choice. We have a very preliminary FAQ entry about this, we will expand it.
The reason to use iterators is because OAK is designed for ontologies large and small, for endpoints local and remote, for big queries and small queries. Using an iterator allows the client to start doing useful work on initial results before all results have been computed.
There is indeed a tradeoff here, in that it requires slightly more defensive programming, and it's easy to make mistakes like len(oi.some_method(...)). If you really need to operate on the whole list, just wrap in list(...). But for most operations you can just operate on the iterator as if it were a list, e.g. in for loops.
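To illustrate the tradeoff, here is a minimal sketch. Note that fake_labels is a hypothetical stand-in for an OAK method that yields results lazily; it is not part of OAK, and the (CURIE, label) pair shape is an assumption made for the example.

```python
from typing import Iterable, Iterator, Tuple


def fake_labels(curies: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Hypothetical stand-in for a lazy OAK method; yields (curie, label) pairs."""
    for curie in curies:
        yield curie, f"label for {curie}"


curies = ["EX:1", "EX:2", "EX:3"]

# Mistake: a generator has no length, so len() raises TypeError.
try:
    len(fake_labels(curies))
    len_failed = False
except TypeError:
    len_failed = True

# When the whole result set is really needed, wrap the call in list(...).
all_pairs = list(fake_labels(curies))

# Otherwise iterate directly; work can start (and stop) before
# the remaining results have been computed.
first = None
for curie, label in fake_labels(curies):
    first = (curie, label)
    break
```

The early break in the last loop is the case iterators are meant to help with: the remaining results are never computed.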
I made a separate issue for this #221
Ok, sorry to take up your time w/ the iterators concern. I struck-through that part of the issue.
Hmmm... no I haven't created a SQL DB in my Mondo work / from RDF or OWL yet, and I mostly have access to files in OWL, not RDF/XML. This will require a bit more work on my end. I'll converse with Nico and see how we want to use OAK in light of this.
Just adding again that I think SqlImplementation should raise an error then if it is initialized from a file with extension other than .db or .rdf (or some other means of determining this without looking at the file extension).
Thanks for your very detailed explanations.
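One possible way to detect a sqlite file without looking at the extension is to check the file header: every SQLite 3 database file begins with the 16-byte string "SQLite format 3" followed by a NUL byte. The sketch below is only an illustration; looks_like_sqlite and check_sqlite_input are hypothetical helper names, not part of OAK.

```python
from pathlib import Path

# First 16 bytes of every SQLite 3 database file.
SQLITE_MAGIC = b"SQLite format 3\x00"


def looks_like_sqlite(path) -> bool:
    """Return True if the file starts with the SQLite 3 header string."""
    with open(path, "rb") as f:
        return f.read(len(SQLITE_MAGIC)) == SQLITE_MAGIC


def check_sqlite_input(path) -> Path:
    """Hypothetical pre-flight check that fails early with a readable message."""
    p = Path(path)
    if not p.is_file():
        raise FileNotFoundError(f"No such file: {p}")
    if not looks_like_sqlite(p):
        raise ValueError(f"{p} does not look like a SQLite database")
    return p
```

One caveat: a brand-new, empty database file can be zero bytes long, so a check like this would reject it.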
Just adding again that I think SqlImplementation should raise an error then if it is initialized from a file with extension other than .db or .rdf (or some other means of determining this without looking at the file extension).
I partially agree - I think the principle here is that it should raise an error if the input is something that it is unable to handle, and error reporting should be more intuitive. I made a separate issue for this: #223
However, I don't think it's good to unwaveringly assume a relationship between suffix and format/model:
- .rdf could mean turtle, it could mean rdf/xml
- .owl is also a reasonable suffix for any RDF syntax file that is intended to be interpreted as OWL
So while it's good to guess-with-consent based on suffix, it must be forgiving of going against convention.
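A rough sketch of what "guess-with-consent" could look like: an explicitly declared format always wins, and the suffix is only used as a fallback guess. The function name, the format names, and the suffix table are all hypothetical and do not reflect OAK's actual behaviour.

```python
from pathlib import Path
from typing import Optional

# Hypothetical suffix -> format guesses, used only when nothing is declared.
SUFFIX_GUESSES = {
    ".db": "sqlite",
    ".ttl": "turtle",
    ".rdf": "rdfxml",
    ".owl": "rdfxml",
}


def resolve_format(path: str, declared: Optional[str] = None) -> str:
    """Prefer the caller's declared format; otherwise guess from the suffix."""
    if declared is not None:
        # The caller may go against convention, e.g. turtle in a .rdf file.
        return declared
    suffix = Path(path).suffix.lower()
    if suffix not in SUFFIX_GUESSES:
        raise ValueError(f"Cannot guess the format of {path!r}; pass it explicitly")
    return SUFFIX_GUESSES[suffix]
```

With this shape, resolve_format("onto.rdf", declared="turtle") is accepted, while an unknown suffix with no declared format raises a clear error instead of failing later.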
@cmungall I have updated this issue with another problem case. To make best use of your time, please look at new suggestions at the bottom of the original post.
Summary
I have a list of terms and I'm trying to get their labels, but I got errors in some cases, and no label in others.
1-3: Using SqlImplementation
1. No get_label_by_uri() or get_labels_by_uris()
I appreciate the CURIE methods, and will likely use those more often; I'd imagine these URI ones are somewhere on the to-do list.
2. get_label_by_curie() errors out
Python code:
Short err:
sqlalchemy.exc.DatabaseError: (sqlite3.DatabaseError) file is not a database
Long err:
3. get_labels_for_curies() errors out & prints out a lot to stderr
Code:
labels = [x for x in si.get_labels_for_curies(list(df_i['term_id']))]
Error:
It showed a lot more than this; more than my IDE's terminal could show. I couldn't see the actual stacktrace. Also looked like it printed the full list of terms in the ontology.
~### 4. UX: get_labels_for_curies() returns a generator~
~I think the documentation shows specifically how to use it, but I wonder if this is the best design. I may just not be sure about the various benefits, but I'd definitely prefer if this defaulted to (a) a flat list, or (b) a dict of CURIE -> label.~
Edit: This is a design choice.
Additional information
How I'm initializing:
Relevant dependency versions:
All dependency versions: mondo-ingest-pip-freeze.txt
4. Using ProntoImplementation got no labels() when passing CURIE
Example class
Code snippet
Suggestions
- Allow a prefix_map when initializing ontology. This may solve cases where it can't find on CURIE, because it can't expand CURIE properly.
- Change ProntoImplementation def type definitions, e.g. label(self, curie: CURIE) -> str: and def _entity(self, curie: CURIE, strict=False): to Union[CURIE, URI]. In my test case, URI worked successfully.
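To make the two suggestions concrete, here is a small sketch of CURIE expansion with a prefix map, plus a helper that accepts either a CURIE or a URI. CURIE and URI are defined here as plain str aliases, and expand_curie and to_uri are hypothetical names for illustration; none of this is OAK's actual code.

```python
from typing import Dict, Union

# Hypothetical aliases for illustration only.
CURIE = str
URI = str


def expand_curie(curie: CURIE, prefix_map: Dict[str, str]) -> URI:
    """Expand e.g. 'MONDO:0000001' using a prefix -> namespace mapping."""
    prefix, sep, local_id = curie.partition(":")
    if not sep or prefix not in prefix_map:
        raise ValueError(f"Cannot expand {curie!r}: unknown or missing prefix")
    return prefix_map[prefix] + local_id


def to_uri(identifier: Union[CURIE, URI], prefix_map: Dict[str, str]) -> URI:
    """Accept either form: pass URIs through unchanged, expand CURIEs."""
    if identifier.startswith(("http://", "https://")):
        return identifier
    return expand_curie(identifier, prefix_map)


prefix_map = {"MONDO": "http://purl.obolibrary.org/obo/MONDO_"}
uri = to_uri("MONDO:0000001", prefix_map)
```

Raising on an unknown prefix, rather than silently returning no label, would also make the "can't expand CURIE properly" case easier to diagnose.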