Do not lock cache folder for complete database - Githubissues

audeering / audb

Manage audio and video databases

https://audeering.github.io/audb/

Other

23 stars 1 forks source link

Do not lock cache folder for complete database #197

Open frankenjoe opened 2 years ago

frankenjoe commented 2 years ago

Once a database is completely loaded there is actually no need to lock it. But currently we need to get access to the db.yaml file to get this information and there we could already run into a race condition. Instead we could create a .complete file in the cache folder to signal that a databases is complete. And we only acquire the lock if that file does not exist.

frankenjoe commented 2 years ago

Is this something we should tackle before a new release or should we simply stick with the current solution for now?

hagenw commented 2 years ago

I think it doesn't matter as it is very unlikely to occur. If you have the time to implement it now, please go ahead and we wait with the release.

frankenjoe commented 2 years ago

Ok, then let's move on with a new version and tackle this some other time.

hagenw commented 1 year ago

The solution with the .complete file might work as there should no reason it need to be removed afterwards (otherwise we would have the same problem as with the .lock files).

There could be the case that a user manually deletes file in the folder, so I think we should make sure that if audb.load() realizes at the end that the database is not complete, it will then remove the .complete file. Even though we officially don't support that users mess around with the cache.

frankenjoe commented 1 year ago

so I think we should make sure that if audb.load() realizes at the end that the database is not complete,

That means we have to check for every media file if it exists but the idea of the .complete file is to avoid such checks. So I don't think we should consider the case that a user manually changes files in the cache folder. E.g. if a user deletes files from an attached folder we would not even notice it.

hagenw commented 1 year ago

I would indeed also not check for single files. I would just apply our current mechanism to check if a database is complete and if this disagrees with the presence of the .complete file remove it. But maybe this case can never happen anyway.

frankenjoe commented 1 year ago

But isn't our current mechanism checking if every media file exists? I don't see another way to find out if a database is complete.

hagenw commented 1 year ago

Yes, you are right: https://github.com/audeering/audb/blob/7d71d8f14a9d4f654329ae720b27aad50b5cbbf3/audb/core/load.py#L139-L159

And once a database was marked once as complete we never check again. If we implement the same behavior using a .complete file, we never have to remove the file once created.

frankenjoe commented 1 year ago

In the end, it's just a different way of storing the information if the database is complete. So far we have it as flag in the header of the database. Using a .complete file is more elegant, though, as we don't even have to access the header to get this information, i.e. there is no locking involved at all.

I would even argue we can simply skip saving the information in the header. In the worst case someone loading with an older version of audb will run _database_check_complete again, but it will not break any code.

hagenw commented 1 year ago

I would even argue we can simply skip saving the information in the header. In the worst case someone loading with an older version of audb will run _database_check_complete again, but it will not break any code.

I'm not sure about this one. It would indeed more elegant to no longer store it in the header at all, but this would mean loading a big database with an older version of audb could then be (at least one time as it will then store the info in the header) very slow.

frankenjoe commented 1 year ago

I'm not sure about this one. It would indeed more elegant to no longer store it in the header at all, but this would mean loading a big database with an older version of audb could then be (at least one time as it will then store the info in the header) very slow.

Just once, because the older version will then store this information in the header :)

But on the long run we will get rid of the header entry.

frankenjoe commented 1 year ago

Ah sorry, you already pointed out that it will be only once slow. To me this is fine. Otherwise we will carry around the redundant information forever.

frankenjoe commented 1 year ago

And don't forget - soon users will have to update audb (due to the changes to audbackend) anyway :)