INCATools / semantic-sql

SQL and SQLite builds of OWL ontologies
https://incatools.github.io/semantic-sql/
BSD 3-Clause "New" or "Revised" License

Improve speed of converting SQLite to FHIR #64

Open joeflack4 opened 1 year ago

joeflack4 commented 1 year ago

Overview

I tried to convert HPO to FHIR using semsql as an intermediary. However, after about 40 minutes, I gave up and switched to Obographs for speed. I think building the .db took about 10 minutes, and the rest of the time was OAK trying to load it. Normally semsql is much faster to load than rdflib, but not in this case. My hpo.db was about 1 GB, roughly 10x larger than my hpo.owl. Looking at some of my other conversions, a 5-10x increase in file size appears to be normal.

If I'm correct that the issue is not so much OAK performance, but just the file size in general, is there anything we can do to reduce these file sizes? Or maybe it's not so much the size, but the structure that is taking OAK a long time to parse downstream? If this is more of an OAK issue (or both an OAK issue and a semsql issue), I can open up a ticket over there.

Potential causes

One or more of the following may be taking a lot of time:
a. Semsql: File size
b. Semsql: Non-optimal structures for downstream parsing
c. OAK: Not parsing optimally
d. OAK: Spending time doing things that are maybe not needed for my use case
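One way to narrow down which of these dominates is to time each stage separately. The following is only a minimal sketch: `build_db` and `load_and_convert` are hypothetical placeholders (here they just sleep) standing in for the real semsql build and the OAK load/convert steps, which are not shown.

```python
import time
from contextlib import contextmanager

timings = {}  # stage name -> accumulated seconds


@contextmanager
def timed(stage):
    """Accumulate wall-clock time spent inside the `with` block under `stage`."""
    start = time.perf_counter()
    try:
        yield
    finally:
        timings[stage] = timings.get(stage, 0.0) + time.perf_counter() - start


def build_db():
    # Hypothetical placeholder for the semsql step that builds the .db file.
    time.sleep(0.01)


def load_and_convert():
    # Hypothetical placeholder for the OAK step that loads the .db and writes FHIR.
    time.sleep(0.02)


with timed("build .db"):
    build_db()
with timed("load + convert"):
    load_and_convert()

# Print stages from slowest to fastest.
for stage, seconds in sorted(timings.items(), key=lambda kv: -kv[1]):
    print(f"{stage}: {seconds:.3f}s")
```

If most of the time lands in the load/convert stage, running that stage under `cProfile` would show whether it is dominated by SQL calls or by Python-side processing.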

cmungall commented 1 year ago

I don't think it's anything to do with file sizes. It's likely that it is iterating and performing multiple SQL queries. This should be easy to optimize.
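To illustrate the pattern described above, here is a minimal sketch contrasting one query per term with a single bulk query. It is not taken from the OAK code: the `statements` table here is a simplified, made-up stand-in built in memory, and both function names are hypothetical.

```python
import sqlite3

# Toy in-memory table; a simplified stand-in, not the real semsql schema.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE statements (subject TEXT, predicate TEXT, value TEXT)")
conn.executemany(
    "INSERT INTO statements VALUES (?, ?, ?)",
    [(f"HP:{i:07d}", "rdfs:label", f"term {i}") for i in range(1000)],
)


def labels_one_by_one(conn, ids):
    """Slow pattern: issue one SQL query per term."""
    out = {}
    for term_id in ids:
        row = conn.execute(
            "SELECT value FROM statements "
            "WHERE subject = ? AND predicate = 'rdfs:label'",
            (term_id,),
        ).fetchone()
        if row:
            out[term_id] = row[0]
    return out


def labels_in_bulk(conn, ids):
    """Faster pattern: fetch all labels in one query, then filter in Python."""
    wanted = set(ids)
    rows = conn.execute(
        "SELECT subject, value FROM statements WHERE predicate = 'rdfs:label'"
    )
    return {subject: value for subject, value in rows if subject in wanted}


ids = [f"HP:{i:07d}" for i in range(1000)]
# Both approaches return the same mapping; only the number of queries differs.
assert labels_one_by_one(conn, ids) == labels_in_bulk(conn, ids)
```

With one query per term, the per-query overhead is paid once for every term (and once more for every additional property looked up), so it grows with ontology size regardless of how large the file is on disk.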

joeflack4 commented 1 year ago

That sounds hopeful! Thanks Chris.