devraj opened 5 months ago
From my initial reading, the way we should approach this is to write a custom adapter for shillelagh, which also allows for a custom SQLAlchemy dialect.
They reference the following as their philosophy for "Why SQL?":

> Sharks have been around for a long time. They’re older than trees and the rings of Saturn, actually! The reason they haven’t changed that much in hundreds of millions of years is because they’re really good at what they do.
>
> SQL has been around for some 50 years for the same reason: it’s really good at what it does.
This is awesome! Thanks for the heads up, and let me know if you have any questions, I'm happy to help.
Quick note for anyone else working with shillelagh and attempting to debug their adapter; my general strategy for development is:

- Create a `virtualenv`
- Install `shillelagh` in it, to use the interactive console
- `pip install` the adapter package in editable mode from the local folder, e.g. `pip install -e ~/Work/Lib/gallagher`
I figured it would be easier to do this before I get into building a SQLAlchemy dialect #9.
At first attempt it all looked well, and when I ran:

```
🍀> SELECT * FROM "https://commandcentre-api-au.security.gallagher.cloud/api/cardholders";
Unsupported table: https://commandcentre-api-au.security.gallagher.cloud/api/cardholders
```
I got the `Unsupported table` error and could not figure out what was wrong. Since the console suppresses log messages (for obvious reasons), I decided to enable logging and set it to `DEBUG` by adding the following to the top of the file containing my adapter:
```python
import logging

logging.basicConfig(level=logging.DEBUG, format='%(asctime)s %(levelname)s %(message)s')
```
Upon restarting the console, I saw the following, which showed that my adapter wasn't registering properly:
```
🍀>
GoodBye!
(shillelagh) ➜  ~ shillelagh
2024-05-15 07:51:36,627 ERROR test
2024-05-15 07:51:36,628 DEBUG No module named 'gallagher.ext.shillelagh.GallagherCommandCentreAPI'
2024-05-15 07:51:36,641 DEBUG Using selector: KqueueSelector
```
Upon inspecting my `pyproject.toml` I found a syntax error: I was referring to the module as `gallagher.ext.shillelagh.GallagherCommandCentreAPI` instead of `gallagher.ext.shillelagh:GallagherCommandCentreAPI`. The fixed version is:
```toml
[tool.poetry.plugins."shillelagh.adapter"]
gacc = "gallagher.ext.shillelagh:GallagherCommandCentreAPI"
```
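This kind of mistake can be caught early with the standard library's `importlib.metadata.EntryPoint`, which parses the `module:attr` form of an entry-point value. A small sketch (the values mirror the ones above):

```python
from importlib.metadata import EntryPoint

# The colon separates the importable module from the attribute inside it.
good = EntryPoint(
    name="gacc",
    value="gallagher.ext.shillelagh:GallagherCommandCentreAPI",
    group="shillelagh.adapter",
)
print(good.module)  # gallagher.ext.shillelagh
print(good.attr)    # GallagherCommandCentreAPI

# With a dot instead of a colon, the whole string is treated as a module
# name and there is no attribute to load -- which is why registration fails.
bad = EntryPoint(
    name="gacc",
    value="gallagher.ext.shillelagh.GallagherCommandCentreAPI",
    group="shillelagh.adapter",
)
print(bad.attr)  # None
```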
And given that the adapter now returns the required rows of data, the logs should make it easier to debug; for example, they surfaced this error:

```
TypeError: missing a required argument: 'api_key'
```
```
(shillelagh) ➜  ~ shillelagh
2024-05-15 07:55:03,649 ERROR test
2024-05-15 07:55:03,666 DEBUG Using selector: KqueueSelector
🍀> SELECT * FROM "https://commandcentre-api-au.security.gallagher.cloud/api/cardholders";
2024-05-15 07:55:09,149 DEBUG Instantiating adapter with deserialized arguments: ['https://commandcentre-api-au.security.gallagher.cloud/api/cardholders', 'api_key']
authorised    first_name      id  last_name
------------  ------------  ----  -----------
True          Dev              1  Mukherjee
(1 row in 0.05s)
2024-05-15 07:55:09,150 DEBUG Using selector: KqueueSelector
```
Make sure that this isn't enabled when the adapter ships; even better, find a way to configure the logger dynamically.
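One way to configure this dynamically is to gate the log level behind an environment variable, so the adapter ships quiet by default. A minimal sketch; the `GALLAGHER_DEBUG` variable name is just an assumption for illustration:

```python
import logging
import os

# Hypothetical GALLAGHER_DEBUG switch: DEBUG only when explicitly requested,
# WARNING otherwise so shipped builds stay quiet.
level = logging.DEBUG if os.environ.get("GALLAGHER_DEBUG") else logging.WARNING
logging.basicConfig(level=level, format="%(asctime)s %(levelname)s %(message)s")

logger = logging.getLogger("gallagher.ext.shillelagh")
logger.debug("adapter module loaded")  # suppressed unless GALLAGHER_DEBUG is set
```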
`utils.py` in the `dto` package currently uses the py3.9-era `__annotations__` attribute to access the annotations:
```python
@classmethod
def _accumulated_annotations(cls) -> dict:
    """Return a dictionary of all annotations

    This method is used to return a dictionary of all the
    annotations from itself and its parent classes.
    It is intended for use by the shillelagh extension
    """
    annotations = cls.__annotations__.copy()  # copy so we don't mutate the class
    for base in cls.__bases__:
        if issubclass(base, BaseModel):
            # Copy annotations from classes that are pydantic models;
            # they are the only things that form part of the response
            annotations.update(base.__annotations__)
    return annotations
```
This recommendation has changed in py3.10; see this article.
Looking at the `inspect` package, `inspect.get_annotations` returns the annotations of the class itself, whereas `typing.get_type_hints` returns type hints from the class itself and its parent classes.
```python
>>> import typing
>>> typing.get_type_honest(CardholderSummary)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: module 'typing' has no attribute 'get_type_honest'. Did you mean: 'get_type_hints'?
>>> typing.get_type_hints(CardholderSummary)
{'href': <class 'pydantic_core._pydantic_core.Url'>, 'id': <class 'str'>, '_good_known_since': typing.Optional[datetime.datetime], 'first_name': <class 'str'>, 'last_name': <class 'str'>, 'short_name': typing.Optional[str], 'description': typing.Optional[str], 'authorised': <class 'bool'>}
```
We should refactor this method to use a py3.10+ solution.
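A possible shape for the refactor, assuming `typing.get_type_hints` behaves acceptably for our pydantic models. Note that it walks the whole MRO, so the behaviour differs slightly from the current method, which only merges annotations from bases that are pydantic models; the `BaseModel` stand-in below just keeps the sketch self-contained:

```python
from typing import get_type_hints


class BaseModel:
    """Stand-in for pydantic's BaseModel so this sketch runs on its own."""

    href: str


class CardholderSummary(BaseModel):
    id: str
    first_name: str
    last_name: str
    authorised: bool

    @classmethod
    def _accumulated_annotations(cls) -> dict:
        """Return annotations from this class and its parent classes.

        typing.get_type_hints already walks the MRO, so there is no
        need to iterate over cls.__bases__ manually.
        """
        return get_type_hints(cls)


hints = CardholderSummary._accumulated_annotations()
print(sorted(hints))  # ['authorised', 'first_name', 'href', 'id', 'last_name']
```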
As our ambition around the library grows, the sync feature #9 looks to be a lot more useful than I previously imagined. One of my thoughts was to treat this as if they were two database sources and use a layer like SQLAlchemy to keep them in sync.
Sync is a difficult problem at the best of times (see also SQL Sync for Schema with SQLAlchemy and python-sync-db). My initial research led me down the path of writing a SQLAlchemy dialect, primarily to explore whether it's possible to wrap a REST service as a SQL source.
I found a number of articles and discussions which lead me to believe this is a possible way forward:
Digging through the list of external dialects I found betodealmeida / gsheets-db-api, which shows that we can converse with a REST API via a SQLAlchemy dialect. That project links to Shillelagh, a library that is an implementation of the Python DB API 2.0 based on SQLite (using the APSW library).
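For anyone unfamiliar with what "Python DB API 2.0" implies in practice: it's the `connect()`/`cursor()`/`execute()` surface defined by PEP 249, the same one the stdlib's `sqlite3` implements, which is what allows SQLAlchemy dialects to be layered on top. A stdlib-only illustration of that calling pattern:

```python
import sqlite3

# sqlite3 implements the same PEP 249 (DB API 2.0) surface that
# shillelagh exposes, so the calling pattern is identical.
conn = sqlite3.connect(":memory:")
cursor = conn.cursor()
cursor.execute("SELECT 1 AS authorised, 'Dev' AS first_name")
print(cursor.fetchall())  # [(1, 'Dev')]
conn.close()
```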
Other resources:
The requirement is thus to research the above resources and outline the possibility of using a SQLAlchemy dialect to write the sync module, with a view to using the REST client to communicate with the Gallagher proxy.
Why SQL?
One of the major questions around this is "Why SQL?"; we could instead head down the route of using an object database. The question to consider is whether the end user / customer would actually benefit from the data being available in an object database. With a foundation of writing to a SQL backend, we also gain the advantage of being able to write to and read from corporate database backends.