arunramachandran15 opened 2 months ago
I have a doubt here.
By "n users" do you mean only the "n number of svelte application users accessing the sheets interface" ? or different users accessing from a python script as well, who can directly access the duckdb file for data analysis as well?
Can you give me access to your README and design docs?
Key points: DuckDB is a single-process database.
Possible architecture: the Svelte JavaScript application connects via websockets.
The Tornado library tunnels you to the right pod; on that pod a websocket client manager listens to the websockets and drops messages into a queue (yes, you can use ValKey, which is just an open-source implementation of Redis).
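A minimal sketch of that pod-side listener, assuming Tornado and the redis-py client (ValKey speaks the Redis protocol); the queue key, port, and message shape are made up for illustration:

```python
import json

import redis
import tornado.ioloop
import tornado.web
import tornado.websocket

SHEET_QUEUE = "sheet_ops"  # hypothetical queue key
valkey = redis.Redis(host="localhost", port=6379)


class SheetSocket(tornado.websocket.WebSocketHandler):
    def on_message(self, message):
        # Drop every client edit onto the queue; the singleton
        # DuckDB manager is the only consumer.
        op = json.loads(message)
        valkey.lpush(SHEET_QUEUE, json.dumps(op))


if __name__ == "__main__":
    app = tornado.web.Application([(r"/ws", SheetSocket)])
    app.listen(8888)
    tornado.ioloop.IOLoop.current().start()
```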
Thanks Arun
Client Svelte (websocket) ----> Server SvelteKit NodeJS (Socket.IO)
SERVER SIDE: Server SvelteKit NodeJS (Socket.IO) ----> ValKey (open-source Redis): queue and cache for UI datatables
Python DuckDB Manager (listens to queue) ----> single connection ----> DuckDB file for pod (persistent volume on K8s node)
DuckDB ----> Python DuckDB Manager (knows your changes)
Python DuckDB Manager ----> updates ----> Redis cache for tables with changes
Redis cache (via JavaScript) ----> sends changes to clients over a broadcast socket for users listening to the same DuckDB file (Socket.IO room) ----> Svelte app (websocket)
Note: many users each have their own socket connection to submit changes; all users listen to the same broadcast socket from the server, which sends changes back to everyone connected to the same DuckDB file.
I think this is your sequence design in text format.
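A rough sketch of the singleton "Python Duckdb Manager" step in that sequence, assuming redis-py and the duckdb package; the queue key, channel name, table schema, and file path are placeholders, not from the thread:

```python
import json

import duckdb
import redis

SHEET_QUEUE = "sheet_ops"          # hypothetical queue key
CHANGES_CHANNEL = "sheet_changes"  # hypothetical broadcast channel

valkey = redis.Redis(host="localhost", port=6379)
con = duckdb.connect("/data/sheets.duckdb")  # the single read-write connection

while True:
    # Blocking pop: all work is serialized into this one process.
    _, raw = valkey.brpop(SHEET_QUEUE)
    op = json.loads(raw)
    # Illustrative op shape: {"sheet": "s1", "row": 3, "col": "b", "value": "42"}
    con.execute(
        "UPDATE cells SET value = ? WHERE sheet = ? AND row = ? AND col = ?",
        [op["value"], op["sheet"], op["row"], op["col"]],
    )
    # Publish the applied change so the socket layer can broadcast it
    # to every user in the same duckdb-file "room".
    valkey.publish(CHANGES_CHANNEL, raw)
```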
Please check the rough design diagram @vcanaran. I am skipping the Redis part for simplicity and going with just the Python solution instead of using a NodeJS wrapper.
After reviewing the above design I feel the Redis queue part is mandatory to ensure reads/writes are serialized, so I am updating the architecture to the one below.
Correct, you need a pub/sub to serialize the macros/actions/work into the singleton DuckDB manager, which is the only process servicing the DuckDB database file.
For now they are serial; however, if you implement a mutex you could serialize updates but interleave reads, since reads "could" happen concurrently via the same read-write connection.
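A sketch of that mutex idea; whether reads through per-call cursors on the one read-write connection are truly safe to interleave depends on DuckDB's threading guarantees, so treat this as illustrative only:

```python
import threading

import duckdb

con = duckdb.connect("/data/sheets.duckdb")  # hypothetical path
write_lock = threading.Lock()


def apply_update(sql: str, params: list) -> None:
    # Updates take the mutex, so writes never interleave.
    with write_lock:
        con.execute(sql, params)


def read_query(sql: str, params: list):
    # Reads use a per-call cursor on the same connection and skip the
    # lock, so they "could" run concurrently with each other.
    cur = con.cursor()
    try:
        return cur.execute(sql, params).fetchall()
    finally:
        cur.close()
```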
Initial problem stated by Arun:
- Implement a Google Sheets-like frontend interface using Svelte.
- Send real-time updates in the sheets using a data loader to DuckDB.
- Receive the real-time updates using a web server and socket implementation.
- Another user using the sheets should be able to see the updates in real time, and so should anybody using DuckDB for their own processing.
- Use Docker and Kubernetes; implement DuckDB as a StatefulSet in K8s (for db file persistent storage).
UI / APIs
- List-all-sheets interface (integrate with the GET /sheets API).
- Sheet interface (integrate with the GET /sheets/sheet-id API) to fetch the initial sheet data and reconcile it with real-time updates.
- Websocket client to send and receive real-time updates from the server.
- Mount the DuckDB file as a persistent volume in K8s.
- Since DuckDB is OLAP with columnar storage, I am thinking of using MySQL for transactional data storage. I need to do a bit more research on this.
Email comment by Vish:
Your problem statement is correct. One point to add: you have "N" users connected to the same DuckDB database file that resides in a K8s pod, and any user can read and write to that DuckDB file. Note DuckDB is a single-process, memory-mapped columnar database ==> you need a singleton DuckDB manager to do all reads and writes to the DuckDB table.
Clarification on APIs: you would have to use websockets, which means you don't have the GET, PUT, POST, PATCH, DELETE REST API verbs, so all traffic goes across the websocket.
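That means the verb has to travel inside the message itself; a minimal, assumed envelope (field names are illustrative, not from the thread):

```python
import json


def make_message(action: str, sheet_id: str, payload: dict) -> str:
    # "action" plays the role the REST verb would have played.
    return json.dumps({"action": action, "sheet": sheet_id, "data": payload})


# What a GET /sheets/sheet-id would have done:
fetch = make_message("read_sheet", "s1", {})
# What a PATCH on a cell would have done:
edit = make_message("update_cell", "s1", {"row": 3, "col": "b", "value": "42"})
```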
TSQL:
If you did consider PostgreSQL, which is multi-user --> but we are only using DuckDB for this experiment. Don't think about MySQL; think PostgreSQL. DuckDB is a PostgreSQL-compliant database, and PostgreSQL has a DuckDB connector ==> https://github.com/duckdb/pg_duckdb
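For completeness, a hedged sketch of driving pg_duckdb from Python with psycopg2; `CREATE EXTENSION pg_duckdb` and the `duckdb.force_execution` setting are taken from the pg_duckdb README, so verify them against the version you install:

```python
import psycopg2

conn = psycopg2.connect("dbname=sheets user=postgres")  # hypothetical DSN
conn.autocommit = True
with conn.cursor() as cur:
    cur.execute("CREATE EXTENSION IF NOT EXISTS pg_duckdb;")
    # Route eligible queries through DuckDB's execution engine.
    cur.execute("SET duckdb.force_execution = true;")
    cur.execute("SELECT count(*) FROM cells;")  # "cells" is a placeholder table
    print(cur.fetchone())
```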