CondensationDS / Condensation

Condensation is an open-source data system for building cloud applications while keeping the ownership of data.
https://condensation.io
Apache License 2.0

Project Status #6

Closed: ansarizafar closed this issue 3 years ago

ansarizafar commented 3 years ago

Is this project dead?

AlexikM commented 3 years ago

Hello, thanks for your message. It is not: after a short break, we are preparing a new setup.

We are currently delivering a few projects with Condensation, which help us prepare the JavaScript version so that we can present demos and tutorials for web applications. We are also investigating new business opportunities.

We will share an update on the status of the project soon.

ansarizafar commented 3 years ago

Thanks for the reply. I am happy to know that this project is not dead. Developers like me are looking for a better database to replace decades-old RDBMSs.

I could not find any information about a query language in the documentation. I suggest a Datalog-like query language (https://terminusdb.com/docs/terminushub/reference/woql or https://docs.flur.ee/guides/1.0.0/analytical-queries/inner-joins-in-fluree). I also suggest a Discord server for community building.

d28b commented 3 years ago

Thank you for the suggestions.

Condensation does not currently have a query language in the conventional sense, and this is not even a goal. Condensation offers the following structures:

The concept of a query language is tightly linked with the structure of relational databases (whether SQL or NoSQL). Condensation is more like a network of documents linked with each other, and you are navigating through that network.
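To make "a network of documents linked with each other" a bit more concrete, here is a minimal TypeScript sketch. It is not Condensation's API or storage format: the `Doc` shape, the `DocStore` class, and the use of FNV-1a as a stand-in for a real cryptographic hash (such as SHA-256) are assumptions for illustration. Documents reference each other by hash, and "navigating" just means following those references.

```typescript
// Hypothetical document shape: some data plus named links to other documents.
interface Doc {
  data: Record<string, string>;
  links: Record<string, string>; // link name -> hash of the target document
}

// FNV-1a (32-bit), used here only as a simple stand-in for a cryptographic hash.
function fnv1a(text: string): string {
  let h = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    h ^= text.charCodeAt(i);
    h = Math.imul(h, 0x01000193) >>> 0;
  }
  return h.toString(16).padStart(8, "0");
}

// Hypothetical content-addressed store: documents are kept under their hash.
class DocStore {
  private docs = new Map<string, Doc>();

  // Stores a document and returns its hash, which other documents use as a reference.
  put(doc: Doc): string {
    const hash = fnv1a(JSON.stringify(doc));
    this.docs.set(hash, doc);
    return hash;
  }

  get(hash: string): Doc | undefined {
    return this.docs.get(hash);
  }

  // Follows a sequence of named links, starting from a root document.
  follow(rootHash: string, path: string[]): Doc | undefined {
    let doc = this.get(rootHash);
    for (const name of path) {
      if (doc === undefined) return undefined;
      const next: string | undefined = doc.links[name];
      doc = next === undefined ? undefined : this.get(next);
    }
    return doc;
  }
}

// Usage: a root document that links to a profile document.
const store = new DocStore();
const profile = store.put({ data: { name: "Alice" }, links: {} });
const root = store.put({ data: {}, links: { profile } });
const profileName = store.follow(root, ["profile"])?.data.name; // "Alice"
```

Because a reference is the hash of the target's content, a link always points to one exact version of a document; changing a document produces a new hash.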

ansarizafar commented 3 years ago

Condensation is more like a network of documents linked with each other, and you are navigating through that network.

Condensation is like a graph database and all graph databases have a query language. It would be difficult to build apps without a proper query language.

d28b commented 3 years ago

No, Condensation is not a graph database. Just like an animal with pointy ears isn't necessarily a cat.

Query languages are of interest if the computation is carried out on another computer. With Condensation, data is kept locally anyway. Your app can just read and process the data. You don't need a query language. It's much simpler than that. Once you've opened the data, you navigate through a tree (a bit like a filesystem), and gather the information you need.
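As a rough illustration of "navigate through a tree and gather the information you need", the following TypeScript sketch resolves paths like a filesystem and collects values with an ordinary loop instead of a query. The `TreeNode` shape and the `contacts` layout are hypothetical; this only shows the style of access, not how Condensation actually stores trees.

```typescript
// Hypothetical local tree: each node may hold a value and named children.
interface TreeNode {
  value?: string;
  children: Map<string, TreeNode>;
}

function node(value?: string, children: [string, TreeNode][] = []): TreeNode {
  return { value, children: new Map(children) };
}

// Resolves a path such as ["contacts", "bob", "email"], much like a file path.
function resolve(root: TreeNode, path: string[]): TreeNode | undefined {
  let current: TreeNode | undefined = root;
  for (const name of path) {
    current = current?.children.get(name);
  }
  return current;
}

// Gathers one field from every child of a directory-like node.
function collect(root: TreeNode, dir: string[], field: string): string[] {
  const parent = resolve(root, dir);
  if (parent === undefined) return [];
  const result: string[] = [];
  parent.children.forEach((child) => {
    const v = child.children.get(field)?.value;
    if (v !== undefined) result.push(v);
  });
  return result;
}

const root = node(undefined, [
  ["contacts", node(undefined, [
    ["alice", node(undefined, [["email", node("alice@example.com")]])],
    ["bob", node(undefined, [["email", node("bob@example.com")]])],
  ])],
]);

const emails = collect(root, ["contacts"], "email"); // ["alice@example.com", "bob@example.com"]
```

Since the data is already local, "querying" reduces to a function call over in-memory structures.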

ansarizafar commented 3 years ago

How is it possible to store a whole database (multiple GB) on a user's device, especially on a mobile phone?

d28b commented 3 years ago

In case you are looking for a traditional centralized database with a query language, then Condensation isn't for you, I'm afraid. There are plenty of centralized cloud solutions out there. We have no intention to compete in that market.

Regarding user data:

When using Condensation, you design your system in such a way that the users' data can reside on the users' mobile phones. The approach we take is very different from a centralized cloud solution.

Since users only keep their own data on the device, the device memory is usually big enough. Take a typical messenger app, such as WhatsApp: the users' data is stored on the device, sometimes taking several GB (with photos and media). Many other apps require far less data.

There are projects where the device memory is not enough, and there are a few simple solutions for that:

ansarizafar commented 3 years ago

So an index server is like a traditional centralized database without a query language, right? I am very much interested in a new database, and I'm just trying to understand how Condensation works.

d28b commented 3 years ago

We are talking about a distributed system. An index server is comparable to an expert in our society. If you have a question regarding a specific topic, you contact this expert.

You may have an index server knowing all employees by name, for example. If you are an employee of the company, you'd register yourself with that index server, and this index server adds you to the list. If you're looking for another employee, you can send a query to the index server.
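The employee example could look roughly like this in TypeScript. This is a hedged sketch, not Condensation code: the `EmployeeIndex` class, the case-insensitive name key, and the actor-hash strings are hypothetical, and a real index server would receive registrations and queries as messages rather than direct method calls.

```typescript
// Hypothetical in-memory index of employees, keyed by lowercased name.
class EmployeeIndex {
  // name (lowercased) -> actor hashes registered under that name
  private byName = new Map<string, Set<string>>();

  // Called when an employee registers themselves with the index server.
  register(name: string, actorHash: string): void {
    const key = name.toLowerCase();
    let set = this.byName.get(key);
    if (set === undefined) {
      set = new Set<string>();
      this.byName.set(key, set);
    }
    set.add(actorHash);
  }

  // Called when someone looks for another employee by name (case-insensitive).
  lookup(name: string): string[] {
    const set = this.byName.get(name.toLowerCase());
    return set === undefined ? [] : Array.from(set);
  }
}

const index = new EmployeeIndex();
index.register("Alice Smith", "actor-a1");
index.register("Bob Jones", "actor-b2");
const hits = index.lookup("alice smith"); // ["actor-a1"]
```

The index only maps names to actors; to learn more about Alice, you would then contact her actor directly.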

An index server infrastructure actually consists of two parts:

For performance reasons, an index query server typically keeps the whole index in memory (or on high speed disks). Queries are often made using HTTP/REST requests.

Index masters/query servers contain some application-specific code. The index master may have to identify you as an employee, for example. The index master may also aggregate or compute data. An index query server may limit the number of results to prevent misuse.

For maximum scalability, every index master/server manages a single index only. For small systems with little load, they may all run on the same physical/virtual server, however.

For maximum security, the index master runs in a secure location outside of the datacenter, and reads messages asynchronously. Index query servers run inside the datacenter and are publicly accessible, of course.

Unlike with classical databases, indices are only eventually consistent, and sometimes deliberately not consistent at all. For example, you may be able to participate in the system without registering with the index of employees. These things are application-specific.
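The split between index master and index query server, and the resulting eventual consistency, can be sketched in TypeScript as follows. Everything here is an assumption for illustration: the message shape, the `isEmployee` flag standing in for the master's application-specific check, batch processing of an inbox, and a snapshot handed to the query server. It is not how Condensation implements index servers.

```typescript
// Hypothetical registration message sent to the index master.
interface RegistrationMessage {
  name: string;
  actorHash: string;
  isEmployee: boolean; // stands in for the master's application-specific check
}

class IndexMaster {
  private inbox: RegistrationMessage[] = [];
  private index = new Map<string, string>();

  // Messages are only queued here; queries do not see them yet.
  receive(msg: RegistrationMessage): void {
    this.inbox.push(msg);
  }

  // Processes queued messages later, in a batch, and returns a snapshot for the query server.
  processInbox(): Map<string, string> {
    for (const msg of this.inbox) {
      if (msg.isEmployee) this.index.set(msg.name.toLowerCase(), msg.actorHash);
    }
    this.inbox = [];
    return new Map(this.index);
  }
}

class IndexQueryServer {
  private snapshot = new Map<string, string>();
  private readonly maxResults: number;

  constructor(maxResults: number) {
    this.maxResults = maxResults;
  }

  update(snapshot: Map<string, string>): void {
    this.snapshot = snapshot;
  }

  // Prefix search over the in-memory snapshot, capped to limit misuse.
  search(prefix: string): string[] {
    const results: string[] = [];
    const p = prefix.toLowerCase();
    this.snapshot.forEach((actorHash, name) => {
      if (results.length < this.maxResults && name.startsWith(p)) results.push(actorHash);
    });
    return results;
  }
}

const master = new IndexMaster();
const queryServer = new IndexQueryServer(2);
master.receive({ name: "Ann", actorHash: "a1", isEmployee: true });
master.receive({ name: "Andy", actorHash: "a2", isEmployee: true });
master.receive({ name: "Anton", actorHash: "a3", isEmployee: true });
master.receive({ name: "Anna", actorHash: "x9", isEmployee: false });

const before = queryServer.search("an"); // [] because the master has not processed its inbox yet
queryServer.update(master.processInbox());
const after = queryServer.search("an"); // ["a1", "a2"], capped at maxResults
```

The window between `receive` and `update` is where the index is stale, which is acceptable as long as the application is designed for it.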

As you can see, with Condensation you are building a (distributed) data system with multiple "actors" (as we call them) that communicate with each other. For small systems, we usually have all actors running on the same virtual machine in some datacenter. For larger systems, you can easily scale up.

ansarizafar commented 3 years ago

Thanks for the detailed explanation.

AlexikM commented 3 years ago

Thank you for your questions, Ansar. They are very helpful as we prepare the next step for Condensation: a more comprehensive, step-by-step introduction. For that, we are preparing further materials, which will be published on a dedicated website this autumn. If you are interested in contributing and doing a deep dive into the core, please contact Thomas directly by email.

ansarizafar commented 3 years ago

Yes I am interested. Could you please share Thomas's email address?

AlexikM commented 3 years ago

Sure, you can find his details here (https://viereck.ch/thomas/)

ansarizafar commented 3 years ago

I am about to start a new project and I would love to use Condensation. Is it possible to play with the JavaScript version?

ansarizafar commented 3 years ago

every index master/server manages a single index only.

If we have separate index servers for customers, products and orders, then how can we query all customers who bought a particular product in the last six months?

d28b commented 3 years ago

Answer 1

In a typical SQL database, you'd have three tables: customers, products, and orders. With Condensation, you would think about this differently:

With this structure, your query would be executed as follows:

  1. Get a slice of the last few months of orders.
  2. Loop through these orders.
  3. If the order contains the product we are looking for, add the customer (actor hash) to the result set.
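The three steps above translate almost line by line into code. In this hedged TypeScript sketch, the `Order` shape (a date, the customer's actor hash, and a list of product ids) is an assumption, and a simple filter stands in for taking the slice of recent orders.

```typescript
// Hypothetical order record.
interface Order {
  date: Date;
  customerActorHash: string;
  productIds: string[];
}

function customersWhoBought(orders: Order[], productId: string, since: Date): Set<string> {
  // 1. Get a slice of the last few months of orders.
  const recent = orders.filter((o) => o.date.getTime() >= since.getTime());
  // 2. Loop through these orders.
  const customers = new Set<string>();
  for (const order of recent) {
    // 3. If the order contains the product, add the customer (actor hash) to the result set.
    if (order.productIds.includes(productId)) customers.add(order.customerActorHash);
  }
  return customers;
}

const orders: Order[] = [
  { date: new Date("2024-01-10"), customerActorHash: "c1", productIds: ["p1"] },
  { date: new Date("2024-05-02"), customerActorHash: "c2", productIds: ["p1", "p2"] },
  { date: new Date("2024-06-15"), customerActorHash: "c3", productIds: ["p3"] },
  { date: new Date("2024-06-20"), customerActorHash: "c2", productIds: ["p1"] },
];
const buyers = customersWhoBought(orders, "p1", new Date("2024-03-01")); // Set {"c2"}
```

Using a set means a customer who ordered the product several times appears only once.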

Answer 2

Let's assume you have a similar problem, for which you really have three "tables" with three indices.

You would create an analytics actor. This actor would load all three indices directly (not query the index servers, but get the indices from the respective index masters), and process that data. Implementing this is not as elegant as writing a SQL statement, but it's just a couple of loops, nothing complicated.
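A minimal TypeScript sketch of such an analytics actor's "couple of loops" might look like this. It assumes the three indices have already been loaded from their index masters into the hypothetical in-memory shapes below; loading them is out of scope here.

```typescript
// Hypothetical shapes of the three loaded indices.
interface Customer { actorHash: string; name: string }
interface Product { id: string; name: string }
interface Order { customerActorHash: string; productIds: string[]; date: Date }

function buyerNames(
  customers: Customer[],
  products: Product[],
  orders: Order[],
  productName: string,
  since: Date,
): string[] {
  // Loop 1: find the ids of products with the requested name.
  const productIds = new Set<string>();
  for (const p of products) if (p.name === productName) productIds.add(p.id);

  // Loop 2: collect customers whose recent orders contain one of those products.
  const buyers = new Set<string>();
  for (const o of orders) {
    if (o.date.getTime() < since.getTime()) continue;
    if (o.productIds.some((id) => productIds.has(id))) buyers.add(o.customerActorHash);
  }

  // Loop 3: resolve actor hashes to names using the customer index.
  const names: string[] = [];
  for (const c of customers) if (buyers.has(c.actorHash)) names.push(c.name);
  return names;
}

const names = buyerNames(
  [{ actorHash: "c1", name: "Alice" }, { actorHash: "c2", name: "Bob" }],
  [{ id: "p1", name: "Kettle" }, { id: "p2", name: "Toaster" }],
  [
    { customerActorHash: "c1", productIds: ["p2"], date: new Date("2024-05-01") },
    { customerActorHash: "c2", productIds: ["p1"], date: new Date("2024-05-03") },
  ],
  "Kettle",
  new Date("2024-01-01"),
); // ["Bob"]
```

Each loop plays the role of one part of a SQL join, and the sets keep each lookup cheap.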

If you do such queries a lot, you'd build a new index with exactly this data, organized in an efficient way, and set up an index server to respond to queries.

For small systems, you could just create a single index master + index server, create all indices there, and keep everything in memory.

ansarizafar commented 3 years ago

Answer 1 looks viable, but the question is where all the customer/product/order data will be stored and where the computation will be performed. If we are building an eCommerce platform for a big retailer with thousands of customers, products and orders, surely it can't be done on the user's device. We also need customer/product names and other properties, not just hashes.

I am asking these questions because we need to show developers that CondensationDB is better than other available solutions.

d28b commented 3 years ago

Every actor stores all data they need. More precisely:

An order contains/references everything necessary to fully process the order. It would contain a shipping address and a payment method, for instance.

Some eCommerce platforms let users store more than one delivery address and pick one when ordering. The list of delivery addresses is user-private data. The order contains the chosen delivery address.

When submitting an order, the user actor sends a message with the order to the eCommerce actor. The eCommerce actor verifies the order and replies with an appropriate message.
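Here is a hedged TypeScript sketch of that flow. The message shapes, the `handleOrder` function, and the catalogue check are hypothetical and do not reflect Condensation's actual message format. The point is that the order message is self-contained (items, chosen shipping address, payment method), so the eCommerce actor can verify it and reply without reading the user's private data.

```typescript
// Hypothetical, self-contained order message.
interface Address { name: string; street: string; city: string; country: string }

interface OrderMessage {
  orderId: string;
  items: { productId: string; quantity: number }[];
  shippingAddress: Address; // the address chosen from the user's private list
  paymentMethod: string;
}

type Reply =
  | { type: "accepted"; orderId: string }
  | { type: "rejected"; orderId: string; reason: string };

// The eCommerce actor verifies the order using only the message and its own catalogue.
function handleOrder(order: OrderMessage, catalogue: Set<string>): Reply {
  if (order.items.length === 0) {
    return { type: "rejected", orderId: order.orderId, reason: "empty order" };
  }
  for (const item of order.items) {
    if (!catalogue.has(item.productId) || item.quantity <= 0) {
      return { type: "rejected", orderId: order.orderId, reason: `invalid item ${item.productId}` };
    }
  }
  return { type: "accepted", orderId: order.orderId };
}

const reply = handleOrder(
  {
    orderId: "o-1",
    items: [{ productId: "p1", quantity: 2 }],
    shippingAddress: { name: "Alice", street: "1 Main St", city: "Springfield", country: "US" },
    paymentMethod: "card",
  },
  new Set(["p1", "p2"]),
); // { type: "accepted", orderId: "o-1" }
```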

Every product is an object (or small tree) containing all information about the product (id, name, description, image references³, ...).

When searching for products, the user queries the product index, and then retrieves the found products. A user doesn't generally store products locally, except perhaps for products in his cart / wishlist, or recently visited products.

Footnotes:

  1. If you provide a login server, users can log in using a username and password, and an encrypted copy of the user data will be stored on your Condensation store. From the user's perspective, a login server effectively turns Condensation back into a cloud solution. You still get all the advantages of Condensation. Using a login server is optional, and even if you provide one, users can still opt out and manage their data on their own, without using the login server.
  2. The product actor and the eCommerce actor may use the same Condensation store, so that the data is not physically duplicated.
  3. Images, technical datasheets, ... are stored as Condensation objects. No separate storage for that is necessary.

ansarizafar commented 3 years ago

So the CondensationDB design is very flexible and can be used for different use cases.

Messages are like bidirectional streaming RPCs via WebSockets, right?

d28b commented 3 years ago

Yes, you can look at messages as RPCs.

Protocols:

Note three things regarding messages:

ansarizafar commented 3 years ago

We are about to have a working implementation, and I'll publish/document this within the next few months.

I can work on the documentation site if you need a helping hand.

AlexikM commented 3 years ago

I will close this issue. Ansar, to continue the conversation, I invite you to use the Discussions section on GitHub HERE.