Open cinghiopinghio opened 5 years ago
Indeed, new-entries
contains the whole history of the database. However, since the number of read bytes are stored, larger files shouldn't slow it down too much. And by using .decsync-sequence
files, unchanged directories are skipped. But it does always grow.
I don't have any concrete plan to mitigate this. But a way to do it is to delete old entries. This can be entries which are read by everyone, overwritten by a new value, or have aged (for example, very old RSS articles statuses are irrelevant). We then have to store the datetime instead of the number of bytes, but that makes skipping files less efficient. Maybe there is a way to still use bytes?
The size impact could be reduced by only storing diffs inside new-entries. This means updating entries in stored-entries is simply applying a diff to the existing entry.
Since all values of the store are most likely text based, diffs should work well.
In terms of deleting entries you would only delete entries at the top of the file and store how many bytes have been deleted. When other apps read the files they simply substract this number from the stored read-bytes and start at this new offset.
Obviously only entries can be deleted that have already been read by all other apps otherwise information is lost.
The only caveat is that this is incompatible with diffs.
I have thought a bit more about reducing the size of the files. Note that I have introduced versioning for DecSync, so incompatible changes are now possible. A few ideas:
read-bytes
can be removed. Skipping some content of a file is not that much faster, and being able to remove values is definitely worth it. Its main advantage was skipping whole files, but using .decsync-sequence
files (or similar) also achieves this.new-entries
can be removed as well. We only need to store stored-entries
and just read the whole file when it changes. Multiple occurrences should only be executed once.stored-entries
that point to the original app that introduced the value. This does not introduce problems when that original value is removed, as that means it is either updated or has aged.I have now actually released a draft of version 2!
The main motivation is that Android is moving to SAF, which is very, very slow. The main slowdown is in having a lot of different files inside a lot of directories. This is mostly solved by making the structure a lot flatter. In addition, it is simplified using the first 2 points given above, the entries are duplicated less and it is easier to remove old entries.
The 1 byte hash systems seems a little to limiting in my opinion. With only 256 possible path hashes wouldn't there be a high probability of collisions? Let's say you have only 20 paths, then the probability of at least one collision should already be at arround 50%. Have I overlooked something here?
This is a great idea, thanks for sharing. I'm mostly worried about the size of
new-entries
which, according to my understanding, is always growing, containing the whole history of the database. Is there a plan to introduce some mitigation of this?