Scalability - Githubissues

cinghiopinghio commented 5 years ago

This is a great idea, thanks for sharing. I'm mostly worried about the size of new-entries which, according to my understanding, is always growing, containing the whole history of the database. Is there a plan to introduce some mitigation of this?

39aldo39 commented 5 years ago

Indeed, new-entries contains the whole history of the database. However, since the number of read bytes are stored, larger files shouldn't slow it down too much. And by using .decsync-sequence files, unchanged directories are skipped. But it does always grow.

I don't have any concrete plan to mitigate this. But a way to do it is to delete old entries. This can be entries which are read by everyone, overwritten by a new value, or have aged (for example, very old RSS articles statuses are irrelevant). We then have to store the datetime instead of the number of bytes, but that makes skipping files less efficient. Maybe there is a way to still use bytes?

Backfighter commented 4 years ago

The size impact could be reduced by only storing diffs inside new-entries. This means updating entries in stored-entries is simply applying a diff to the existing entry.

Since all values of the store are most likely text based, diffs should work well.

Backfighter commented 4 years ago

In terms of deleting entries you would only delete entries at the top of the file and store how many bytes have been deleted. When other apps read the files they simply substract this number from the stored read-bytes and start at this new offset.

Obviously only entries can be deleted that have already been read by all other apps otherwise information is lost.

The only caveat is that this is incompatible with diffs.

39aldo39 commented 4 years ago

I have thought a bit more about reducing the size of the files. Note that I have introduced versioning for DecSync, so incompatible changes are now possible. A few ideas:

read-bytes can be removed. Skipping some content of a file is not that much faster, and being able to remove values is definitely worth it. Its main advantage was skipping whole files, but using .decsync-sequence files (or similar) also achieves this.
new-entries can be removed as well. We only need to store stored-entries and just read the whole file when it changes. Multiple occurrences should only be executed once.
To reduce the size of the values (e.g. calendar events), we might even use references inside stored-entries that point to the original app that introduced the value. This does not introduce problems when that original value is removed, as that means it is either updated or has aged.

39aldo39 commented 4 years ago

I have now actually released a draft of version 2!

The main motivation is that Android is moving to SAF, which is very, very slow. The main slowdown is in having a lot of different files inside a lot of directories. This is mostly solved by making the structure a lot flatter. In addition, it is simplified using the first 2 points given above, the entries are duplicated less and it is easier to remove old entries.

Backfighter commented 3 years ago

The 1 byte hash systems seems a little to limiting in my opinion. With only 256 possible path hashes wouldn't there be a high probability of collisions? Let's say you have only 20 paths, then the probability of at least one collision should already be at arround 50%. Have I overlooked something here?

39aldo39 / DecSync

Scalability #2