cancervariants / metakb

Central repository for the VICC metakb web application
MIT License

feat!: refactor graph DB code, streamline unneeded query transaction usage #377

Open jsstevenson opened 3 days ago


The main performance change here is to stop wrapping queries in explicit transactions where they aren't needed. For reads, suppose query 1 (e.g. checking that a study ID is valid) succeeds, but then something goes wrong in query 2 (e.g. fetching the nested data for that study). There's nothing to undo for query 1: it was a read, and it already happened. Everything in the query module is a read, so instead of opening a session context manager every time, we can just use the now-standard driver.execute_query method. The overhead saved is probably quite minimal, but I think the change also simplifies things.
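For comparison, here is a minimal sketch of the two patterns against the Neo4j Python driver (5.x). The `Study` label, `id` property, and both function names are hypothetical and not taken from metakb's actual schema or code. Note that `execute_query` still manages a session and retries internally, so the saving is mostly boilerplate rather than round trips.

```python
# Hypothetical node label and property; not the real metakb schema.
STUDY_QUERY = "MATCH (s:Study {id: $study_id}) RETURN s"


def get_study_with_session(driver, study_id: str):
    """Older pattern: open a session and run a managed read transaction."""

    def _read(tx, study_id):
        result = tx.run(STUDY_QUERY, study_id=study_id)
        # Consume the result inside the transaction function.
        return [record["s"] for record in result]

    with driver.session() as session:
        return session.execute_read(_read, study_id)


def get_study(driver, study_id: str):
    """Newer pattern: a single driver-level call routed to a reader."""
    records, _summary, _keys = driver.execute_query(
        STUDY_QUERY,
        study_id=study_id,  # extra keyword arguments become query parameters
        routing_="r",  # equivalent to neo4j.RoutingControl.READ
    )
    return [record["s"] for record in records]
```

Both functions return the same list of `s` nodes; the second one just drops the session and transaction-function plumbing.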

Similarly, when preparing a read-only connection, we probably don't need to check for the existence of constraints, so an option is added to suppress that check.
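A rough sketch of what such an opt-out could look like, assuming a Neo4j 5.x driver. The constraint names, labels, `ensure_constraints`, `prepare_connection`, and the `check_constraints` flag are all hypothetical illustrations, not the option name or code added in this PR.

```python
# Hypothetical constraint names mapped to the label each one covers.
EXPECTED_CONSTRAINTS = {
    "study_id_constraint": "Study",
    "gene_id_constraint": "Gene",
}


def ensure_constraints(driver) -> list[str]:
    """Create any expected uniqueness constraints that don't exist yet."""
    records, _, _ = driver.execute_query("SHOW CONSTRAINTS YIELD name")
    existing = {record["name"] for record in records}
    created = []
    for name, label in EXPECTED_CONSTRAINTS.items():
        if name not in existing:
            # Names and labels can't be query parameters, hence the f-string;
            # both come from the constant above, not from user input.
            driver.execute_query(
                f"CREATE CONSTRAINT {name} IF NOT EXISTS "
                f"FOR (n:{label}) REQUIRE n.id IS UNIQUE"
            )
            created.append(name)
    return created


def prepare_connection(driver, *, check_constraints: bool = True):
    """Return the driver, optionally verifying schema constraints first.

    Read-only consumers can pass check_constraints=False to skip the extra
    round trips entirely.
    """
    if check_constraints:
        ensure_constraints(driver)
    return driver
```

Loaders keep the default, while something like a read-only query handler would call `prepare_connection(driver, check_constraints=False)`.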

The Graph class is also refactored here to better separate code that acquires and manages DB connections (which stays in the database module) from code that transforms knowledgebase data into write queries for the DB (moved to the load_data (???) module).
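To illustrate the split, a minimal sketch assuming a Neo4j 5.x driver: a pure transformation function that turns a knowledgebase record into a Cypher write plus parameters, and a thin DB-facing loop that only executes them. `study_to_write_query`, `load_studies`, and the `Study` node shape are hypothetical, not metakb's real loaders or data model.

```python
def study_to_write_query(study: dict) -> tuple[str, dict]:
    """Pure transformation: knowledgebase record -> (Cypher, parameters).

    No driver involved, so this can be unit-tested without a database.
    """
    query = (
        "MERGE (s:Study {id: $id}) "
        "SET s.description = $description"
    )
    params = {"id": study["id"], "description": study.get("description")}
    return query, params


def load_studies(driver, studies) -> int:
    """DB-facing step: execute each prepared write and count them.

    execute_query routes to a writer by default, which is what writes need.
    """
    count = 0
    for study in studies:
        query, params = study_to_write_query(study)
        driver.execute_query(query, params)
        count += 1
    return count
```

Keeping the transformation free of connection handling means the data-model changes still incoming only touch the pure half.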

Finally, we weren't really making meaningful use of any state in the Graph code, so it's refactored down to plain functions rather than a class. The QueryHandler wasn't even using the graph, only its driver. So rather than a class that creates and holds onto a driver, a get_driver() function now handles credential processing and provides a Neo4j Driver instance to downstream users (the CLI and the QueryHandler).
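A minimal sketch of a function-based `get_driver()`, assuming credentials come from explicit arguments first, then environment variables, then local defaults. The environment variable names, the defaults, and `resolve_credentials` are hypothetical, not necessarily what metakb uses. `GraphDatabase.driver(uri, auth=(user, password))` is the standard Neo4j Python driver constructor.

```python
import os

# Hypothetical default; a typical local Neo4j bolt address.
DEFAULT_URL = "bolt://localhost:7687"


def resolve_credentials(url=None, username=None, password=None):
    """Fill in missing settings from (hypothetical) env vars, then defaults."""
    url = url or os.environ.get("METAKB_DB_URL", DEFAULT_URL)
    username = username or os.environ.get("METAKB_DB_USERNAME", "neo4j")
    password = password or os.environ.get("METAKB_DB_PASSWORD", "password")
    return url, (username, password)


def get_driver(url=None, username=None, password=None):
    """Return a Neo4j Driver; the caller owns it and should close it."""
    # Imported lazily so credential handling can be exercised without neo4j.
    from neo4j import GraphDatabase

    resolved_url, auth = resolve_credentials(url, username, password)
    return GraphDatabase.driver(resolved_url, auth=auth)
```

A caller such as the CLI could then write `with get_driver() as driver: ...` (the 5.x Driver is a context manager) and hand `driver` to whatever needs it, with no wrapper object holding onto it.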

close #373 (there's more work to do there, but it's fine to cross it off for now, especially with possible data model changes still incoming)