Koenkk / zigbee-herdsman

A Node.js Zigbee library
MIT License

Make it possible to save in an external database #305

Status: Open · wilmardo opened this issue 3 years ago

wilmardo commented 3 years ago

Same issue as https://github.com/Koenkk/zigbee2mqtt/issues/4947

As stated in the linked issue, this would be a welcome addition for users who write to SD-card-like storage.

My use case is a bit specific, but here is a bit of explanation. I am trying to get a stateless Zigbee2MQTT deployment so it can fail over between Kubernetes nodes (external CC2530 TCP serial) without needing a volume (since distributed storage is hard on low-power devices). I am using Kubernetes ConfigMaps for the configuration, so only database.db and coordinator_backup.json are blocking a completely stateless approach. If these could be stored in an external database, the image could run without a volume (stateless), which offloads persistence to a NAS, for example.

github-actions[bot] commented 3 years ago

This issue is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 7 days

wilmardo commented 3 years ago

Not stale.

With something like Sequelize, it would be possible to keep the current SQLite implementation as a backwards-compatible option.

github-actions[bot] commented 3 years ago

This issue is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 7 days

wilmardo commented 3 years ago

Still not stale though haha

timdonovanuk commented 3 years ago

+1 to this. But you mention a NAS - you could always mount the k8s container storage on a NFS shared drive. Or GlusterFS.

github-actions[bot] commented 3 years ago

This issue is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 7 days

Toastyyy3 commented 3 months ago

I suppose there is still no support for external databases?

judahrand commented 2 months ago

@Koenkk I think this is still not stale? I'd be interested in having a little dig into this if there might be any interest in accepting changes.

Koenkk commented 2 months ago

@judahrand changes are welcome, but note that it's not just database.db and coordinator_backup.json; all files in the data folder should be persisted (also state.json).

judahrand commented 2 months ago

Fantastic! I took an initial look at this last night and it (obviously) won't be a trivial change, but I do think there could be significant value in using a 'real' database in terms of stability, extensibility (Litestream, Postgres, etc.), and, I suspect, even performance.

I'm not normally a TypeScript developer, so I'm not familiar with the ORM landscape in TypeScript, but from my investigation last night I think that using Drizzle probably makes the most sense. It aims to be as lightweight as possible, which seems to fit zigbee-herdsman well. Initially, I suspect it would also be most straightforward to only support SQLite. Other databases could be supported later if the desire was present.

There are a few design considerations to make too, I think. One that jumps to mind is how the 'backups' would work. Coordinator backups could probably just be versioned rows in a table, where the newest row is used by default or a given row can be marked as 'active'. A limited number of these backups, each timestamped, could be kept at any given time.

The backup logic when a reset is performed becomes a little more difficult. For SQLite, the same logic can initially be used, where we just copy the entire database file to a backed-up version. However, this isn't really possible with a remote database (and in that case one could argue that this sort of disaster recovery should be the database's responsibility, not the application's). I wonder if the functionality could be dropped in that case?

I'm sure there are other decisions that would need to be made, but from what I can tell, as long as the API for Controller is maintained and a migration path for the current files exists, other breaking changes should be acceptable. Is this correct?

Koenkk commented 2 months ago

I'm not convinced that a SQL database will have any impact on performance: the whole database is loaded into memory and all devices are resolved through a very efficient lookup. A typical database is also nothing in terms of size; mine is 102KB (~80 devices).

An external database will also not solve the problem of needing persistent storage; z2m still stores configuration.yaml and state.json, which need to be writable.

I think a better approach here is to allow the Z2M data directory to be stored on a remote filesystem. If we used, for example, OpenDAL, you could store the directory on almost any remote filesystem, e.g. via SFTP or even OneDrive! 😄

judahrand commented 2 months ago

That's a fair point on performance. I hadn't initially realized that the whole database is loaded into memory. There are still other advantages to using SQLite at least, IMO.

An external database will also not solve the problem of needing persistent storage, z2m still stores configuration.yaml and state.json which needs to be writable.

Does configuration.yaml need to be writable and persistent? That seems a surprising property for a config file, which I'd expect to be entirely user-controlled. state.json could easily be migrated to live in the same SQLite database, I think.

I've got a branch which implements a SQLite database and almost passes all the tests, so I'll get that working and put it up, at least for consideration.

Koenkk commented 2 months ago

Does the configuration.yaml need to be writable and persistent?

Yes; e.g. the devices list is updated whenever you pair a new device, and settings are written back when you update them through the frontend.

Nerivec commented 2 months ago

I think an ORM is much too complex for the 100 or so lines that Z2M needs. There is practically no use for its features since, as Koenkk said, everything is cached in memory on startup and read/parsed from there at runtime, not from the database. The tiny size allows this without creating RAM issues, and makes it a hard-to-beat option as far as speed is concerned.

As for writes, a quick benchmark of the current implementation vs the Drizzle ORM branch shows at least a 2x drop in performance. No doubt the overhead alone is a killer :/

Coordinator backups in the database are definitely a no-go for me. That is not portable.

judahrand commented 2 months ago

As for writes, a quick bench on current vs drizzle orm branch shows a 2x drop in performance minimum. No doubt the overhead alone is a killer :/

What benchmark are you running? I suspect it can be improved substantially; the current implementation is the simplest lift-and-shift possible.

Coordinator backups in the database is definitely a no-go for me. This is not portable.

I'm not sure I'm understanding you here. Why is it not portable? It's easy to get back to the same JSON by pulling it out of a DB table.

Nerivec commented 2 months ago

What benchmark are you running? I suspect it can be improved substantially; the current implementation is the simplest lift-and-shift possible.

The usual: tinybench, with a save action (nothing but a device.save(); I bypassed the rest of the logic). I don't think it can be improved by much, since every runtime action is in-memory, not in-database; only saves actually touch the database at runtime: no querying, no filtering, etc. The only thing the ORM would actually end up being used for is loading everything into memory on startup and then periodically writing to the filesystem (trying to do anything else versus the current in-RAM implementation would result in a further massive drop in performance). Hence the ORM overhead being a killer against a plain filesystem read/write.

Easy to get back to the same JSON by pulling it out of a db table.

Not without extra dependencies and extra parsing. The current file is natively supported and follows the open backup format: https://github.com/zigpy/open-coordinator-backup