emanuelpalm commented 3 years ago

During a recent discussion I've had with @jerkerdelsing about databases, it seems as if the idea is that all Arrowhead systems should use embedded databases, or at least start up and manage their own database processes. Ideally, the user of an Arrowhead system should have to do no database configuration or setup at all.

While leaving it open for database clustering is standard practice in the context of the Internet, @jerkerdelsing claims that it will rarely be directly relevant in the context of Arrowhead. In situations where it would be relevant, my impression is that it should be facilitated by the way in which the Arrowhead systems themselves are connected, and not by how their databases are.

Perhaps it would be apt to switch from MySQL/MariaDB to SQLite? If you don't mind using non-relational databases, I have good experience with LMDB, which is a key/value store relying on B+-trees. If you, @ng201 and @tsvetlin, have doubts or questions about this, please have a discussion with @jerkerdelsing and everyone else with a stake in MySQL/MariaDB/PostgreSQL being supported.
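
As a rough illustration of what "no database configuration or setup at all" could look like with an embedded database, here is a minimal Python sketch using SQLite through the standard library. The file name, the `service` table, and its columns are hypothetical and are not taken from any Arrowhead system; the point is only that the database is a single file that the system creates for itself on first start.

```python
import os
import sqlite3
import tempfile


def open_registry(path):
    """Open a single-file embedded database, creating it and its schema if missing."""
    conn = sqlite3.connect(path)  # no server process, user, or port to configure
    conn.execute(
        "CREATE TABLE IF NOT EXISTS service ("
        " id INTEGER PRIMARY KEY,"
        " name TEXT NOT NULL UNIQUE,"
        " uri TEXT NOT NULL)"
    )
    conn.commit()
    return conn


# Hypothetical location; a real system would pick its own data directory.
db_path = os.path.join(tempfile.mkdtemp(), "service_registry.db")

conn = open_registry(db_path)
with conn:  # commits on success, rolls back on error
    conn.execute(
        "INSERT INTO service (name, uri) VALUES (?, ?)",
        ("temperature", "/temp"),
    )
rows = conn.execute("SELECT name, uri FROM service").fetchall()
print(rows)
conn.close()
```

Calling `open_registry` again on the same path reuses the existing file, so a restart needs no extra setup step either.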

jerkerdelsing commented 3 years ago

LMDB is licensed under the OpenLDAP Public License (https://www.openldap.org/software/release/license.html), which is not on the list of approved licenses.

I have not been able to ask whether it is OK, since the @.*** mailing list is broken.

Jerker

Professor Jerker Delsing, Lulea University of Technology, EISLAB, 97187 Lulea, Sweden. Phone: 0046706261931. http://www.ltu.se/eislab


tsvetlin commented 3 years ago

The goal is to replicate the behaviour of the Java version in C++. If you wish to use another database, feel free to create a fork and implement it.

ng201 commented 3 years ago

The funny thing is that we already use SQLite in the tests. Thus, it might be quite easy to add it to the main code, too (maybe in a new branch? I think that would be enough).

While it was quite handy in the tests that SQLite uses type affinities instead of strict types, I have not had time to check whether special considerations are needed to handle these peculiarities...
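
To make the affinity point concrete, here is a small Python sketch (standard-library `sqlite3`, in-memory database) of how SQLite treats values that do not match the declared column type. The table and column names are made up for illustration. The second table shows one possible way to get stricter behaviour, a `CHECK` constraint on `typeof()`; whether that is the right approach for core-cpp is an open question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Declared types only give each column an "affinity", not a hard type.
conn.execute("CREATE TABLE t (n INTEGER, s TEXT)")
conn.execute("INSERT INTO t VALUES (?, ?)", ("42", 7))     # numeric-looking text, integer
conn.execute("INSERT INTO t VALUES (?, ?)", ("abc", 1.5))  # non-numeric text, real

rows = conn.execute(
    "SELECT n, typeof(n), s, typeof(s) FROM t ORDER BY rowid"
).fetchall()
for row in rows:
    print(row)
# '42' is converted to the integer 42, but 'abc' is stored as text
# in the INTEGER column without any error. Numbers in the TEXT
# column are converted to text.

# One possible guard: reject values whose storage class is not integer.
conn.execute("CREATE TABLE strict_t (n INTEGER CHECK (typeof(n) = 'integer'))")
try:
    conn.execute("INSERT INTO strict_t VALUES (?)", ("abc",))
    rejected = False
except sqlite3.IntegrityError:
    rejected = True
print("non-integer rejected:", rejected)
```

Code that was written against a database with strict column types may therefore silently accept values in SQLite that it would otherwise have rejected, which is the kind of difference that would need checking before moving from the tests to the main code.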

jerkerdelsing commented 3 years ago

I think we need to ask which type of database will serve our purpose best. For the core systems (ServiceRegistry, Orchestration, Authorisation, SystemRegistry, DeviceRegistry, CertificateAuthority, Configuration), a list of requirements could be:

Small databases, < 10 MB of data (100 devices and systems in a Local Cloud), connected to the micro-system paradigm.
Robust (low risk of erroneous writes to and reads from the database).
Possibility of redundancy/backup of the database in the local cloud and in a backup local cloud.
Data protection, e.g. for the authorisation policy database.

For other core systems, e.g. DataManager, some other requirements will be more important:

Large databases, gigabytes to terabytes.

Jerker


emanuelpalm commented 3 years ago

@jerkerdelsing SQLite can theoretically handle 281 terabytes of data. But as all that data is stored in a single file, the actual maximum can be lower, depending on how large a disk you can buy and what file system you use. If you use ext4, which is popular on Linux, you are limited to 16 terabytes per file. If you use btrfs, you can use all 281 terabytes, if you can find a logical disk that large.

Will the DataManager process data from more than the hundreds of devices in that local cloud? Even if you have 1000 sensors spewing out a combined 10 megabytes of readings per minute, 16 terabytes is enough to last you 3 years. Today you can buy 8 TB SSDs for a somewhat reasonable price. If you put 5 of those in a RAID 6 configuration, you get 24 TB of logical single-disk space, which would last you about 4.5 years in the same scenario. That being said, looking at the DataManager documentation, it seems to me that it should use some kind of append-only database with support for channels (one per system service) to avoid performance degradation as the stored data volume grows large. I cannot seem to find any, however.
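
For anyone who wants to check or vary the numbers above, the arithmetic can be sketched in a few lines of Python. The helper names are made up for this example, decimal units (1 TB = 10^12 bytes) are assumed throughout, and the 10 MB per minute is read as the combined rate of all sensors. The SQLite figure is the maximum page count times the maximum page size, as I recall them from the SQLite limits documentation.

```python
MB = 10**6
TB = 10**12

MINUTES_PER_YEAR = 60 * 24 * 365


def retention_years(capacity_bytes, bytes_per_minute):
    """Years until a store of the given capacity fills up at a constant write rate."""
    return capacity_bytes / (bytes_per_minute * MINUTES_PER_YEAR)


def raid6_usable_bytes(disks, disk_bytes):
    """RAID 6 spends the equivalent of two disks on parity."""
    if disks < 4:
        raise ValueError("RAID 6 needs at least 4 disks")
    return (disks - 2) * disk_bytes


rate = 10 * MB  # combined write rate of all sensors, per minute

ext4_years = retention_years(16 * TB, rate)
raid_capacity = raid6_usable_bytes(5, 8 * TB)
raid_years = retention_years(raid_capacity, rate)

# Theoretical SQLite limit: max page count * max page size (assumed values).
sqlite_max_bytes = 4294967294 * 65536

print(f"16 TB file limit lasts {ext4_years:.2f} years")
print(f"RAID 6 array: {raid_capacity / TB:.0f} TB, lasts {raid_years:.2f} years")
print(f"SQLite theoretical maximum: {sqlite_max_bytes / TB:.0f} TB")
```

Changing `rate` or the disk count shows how quickly the margin shrinks if each sensor, rather than all sensors together, produces 10 MB per minute.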

jerkerdelsing commented 3 years ago

Let's raise this point at the Wednesday Roadmap session of the Vaccine WS next week.

arrowhead-f / core-cpp

Use embedded database #71
