ARK-Builders / arklib

Core of the programs in ARK family
MIT License
1 stars 10 forks source link

#48: Atomic files #50

Closed gwendalF closed 7 months ago

gwendalF commented 9 months ago

Use atomic files whenever we want to use files. It ensure there are written atomicaly. Also we have the possibility to manage the case when the file is already present and do something with the data. Currently it's just overwritten but something better can be done Fix #48

kirillt commented 9 months ago

@gwendalF have you implemented this, too?

If several versions with same number are found, take the version with local device id and log warning about conflicted versions. These conflicts can come only from remote devices, locally we can't produce conflicts.

mberry commented 9 months ago

Adding a Clippy CI check is good, but I'm getting 1 error here:

https://github.com/ARK-Builders/arklib/blob/4dc4709b1eece7c12112ccb7a90803cad4379785/src/index.rs#L78

It wants the borrow removed

gwendalF commented 9 months ago

Adding a Clippy CI check is good, but I'm getting 1 error here:

https://github.com/ARK-Builders/arklib/blob/4dc4709b1eece7c12112ccb7a90803cad4379785/src/index.rs#L78

It wants the borrow removed

We can remove the borrow since it does not appears to be used somewhere else so we can move the path on the open function

gwendalF commented 9 months ago

I've updated the case to not override properties. May need check because there is a bunch of clone and staff, so we can surely do it better.

May need a visitor pattern to merge two serde_json::Value

kirillt commented 9 months ago

@gwendalF Thanks, good changes. Could you tell more about visitor pattern here?

May need a visitor pattern to merge two serde_json::Value

kirillt commented 8 months ago

I've realized that we should use lightweight version of atomic files for resources.

In most cases, there will be no other resource with same id. There could be rare hash collisions but they should not be treated as versions. Probably simple "write in temporary location, move to the final path" approach should be implemented, too? @gwendalF what do you think?

gwendalF commented 8 months ago

@kirillt writing to temp and move is kind of the same of what atomic file is doing. But without versioning moving would erase previous file. So it will only solve half written files but not synchronisation. Half written could only happens if the process is killed during writing.

Can be done but I don't know when half written files happens

kirillt commented 8 months ago

@gwendalF yeah, the idea behind atomics for resources is that we just want to protect from half-written files because it should be considered a unique case when we write a new resource. Of course, there are hash collisions sometimes, but we can use some nonce (maybe just timestamp) in the filename to protect from that. The scenario with editing properties is much more prone to dirty writes because it can be done from different devices multiple times. If we want to create same resource again, we just check if it exists already or not.

gwendalF commented 8 months ago

@kirillt But the resource (link) has some properties. So writing properties is needed when we create a link

kirillt commented 8 months ago

@gwendalF yes sure, we just should use different means to write metadata and content itself. Because property and other storages are shared between multiple resources. If something goes wrong during resource write, only this resource will be lost. If something foes wrong while writing a storage, then we risk losing metadata for other resources too.

For the properties storage it's not critical though because it's folder-based, but the tag storage is a single file containing multiple entries. We also might support batching for folder-based storages in the future, e.g. writing by 20 jsons into individual files—this should improve performance, but if we lose a batch we lose up to all 20 jsons.

kirillt commented 8 months ago

Just a bit about terminology: "resource" means some user's content. We don't call it file because if we change its path, the resource still remains the same.

We have various "metadata" associated with resources: 1) user-defined, e.g. tags, properties, scores etc. 2) app-generated, i.e. cache: metadata/kinds and previews/thumbnails

gwendalF commented 8 months ago

@kirillt It's not clear for me what you wish. I've realized that we should use lightweight version of atomic files for resources. -> What resources? Link? For example tag storage in never used in the current arklib, only the path is defined. If you wish to write Link without atomic, it's possible. But then what about properties? And what about metadata ? And previews?

kirillt commented 8 months ago

@gwendalF a "resource" is any user's file. In Shelf app, we work only with Link resources, but Navigator works with any resources: images, videos, documents. We store resources as files in a folder selected by the user.

As I said before, properties and other user-defined metadata (tags, scores) should use atomic files you've implemented, i.e. it shouldn't be changed.

Resources, i.e. any file outside .ark subfolder, should use simple write-tmp-and-move approach.

Previews, thumbnails and generated metadata don't really require atomic writing, because we always can regenerate them. But we can also use write-tmp-and-move approach.

gwendalF commented 8 months ago

@kirillt I've updated the Link to use your solution of tempfile and moving to destination