Open frankenjoe opened 2 years ago
Is this something we should tackle before a new release or should we simply stick with the current solution for now?
I think it doesn't matter as it is very unlikely to occur. If you have the time to implement it now, please go ahead and we wait with the release.
Ok, then let's move on with a new version and tackle this some other time.
The solution with the .complete file might work, as there should be no reason it needs to be removed afterwards (otherwise we would have the same problem as with the .lock files).
There could be the case that a user manually deletes files in the folder, so I think we should make sure that if audb.load() realizes at the end that the database is not complete, it will then remove the .complete file. Even though we officially don't support users messing around with the cache.
> so I think we should make sure that if audb.load() realizes at the end that the database is not complete,
That means we have to check whether every media file exists, but the idea of the .complete file is to avoid such checks. So I don't think we should consider the case that a user manually changes files in the cache folder. E.g. if a user deletes files from an attached folder, we would not even notice it.
I would indeed also not check for single files. I would just apply our current mechanism to check if a database is complete and, if this disagrees with the presence of the .complete file, remove it. But maybe this case can never happen anyway.
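The reconciliation described above could look roughly like the following sketch. It uses only the standard library; `reconcile_marker` and `expected_files` are hypothetical names for illustration and not part of audb. Note that it has to touch every expected file, which is exactly the cost the .complete file is meant to avoid.

```python
import pathlib
import tempfile


def reconcile_marker(db_root, expected_files):
    """Drop the .complete marker if any expected file is missing.

    `expected_files` is a hypothetical list of paths relative to `db_root`.
    Returns True if the marker is present after reconciliation.
    """
    root = pathlib.Path(db_root)
    marker = root / ".complete"
    # Full scan: every expected file is checked for existence.
    complete = all((root / name).exists() for name in expected_files)
    if marker.exists() and not complete:
        marker.unlink()
    return marker.exists()


with tempfile.TemporaryDirectory() as tmp:
    root = pathlib.Path(tmp)
    (root / "a.wav").touch()
    (root / ".complete").touch()
    # "b.wav" was never created, so the marker is stale and gets removed.
    still_marked = reconcile_marker(root, ["a.wav", "b.wav"])
```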
But isn't our current mechanism checking if every media file exists? I don't see another way to find out if a database is complete.
Yes, you are right: https://github.com/audeering/audb/blob/7d71d8f14a9d4f654329ae720b27aad50b5cbbf3/audb/core/load.py#L139-L159
And once a database has been marked as complete, we never check again. If we implement the same behavior using a .complete file, we never have to remove the file once created.
In the end, it's just a different way of storing the information whether the database is complete. So far we have it as a flag in the header of the database. Using a .complete file is more elegant, though, as we don't even have to access the header to get this information, i.e. there is no locking involved at all.
I would even argue we can simply skip saving the information in the header. In the worst case someone loading with an older version of audb will run _database_check_complete again, but it will not break any code.
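One possible way to handle the transition is sketched below: the marker file is checked first, and the header is only read when the marker is missing. This is an assumption for illustration, not audb's implementation. `is_complete` and `read_header` are hypothetical names, and the header is represented by a plain dict with a hypothetical `"complete"` key rather than the real db.yaml schema.

```python
import pathlib
import tempfile


def is_complete(db_root, read_header):
    """Check the marker file first, then fall back to a legacy header flag.

    `read_header` is a hypothetical callable returning a dict; it is only
    invoked when the marker is absent, so the common case needs no header access.
    """
    marker = pathlib.Path(db_root) / ".complete"
    if marker.exists():
        return True
    header = read_header()
    if header.get("complete", False):
        # Remember the result as a marker so later calls skip the header.
        marker.touch()
        return True
    return False


with tempfile.TemporaryDirectory() as tmp:
    calls = []

    def read_header():
        calls.append(1)
        return {"complete": True}

    first = is_complete(tmp, read_header)   # reads the header, writes the marker
    second = is_complete(tmp, read_header)  # marker found, header not read
    header_reads = len(calls)
```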
> I would even argue we can simply skip saving the information in the header. In the worst case someone loading with an older version of audb will run _database_check_complete again, but it will not break any code.
I'm not sure about this one. It would indeed be more elegant to no longer store it in the header at all, but this would mean that loading a big database with an older version of audb could then be very slow (at least one time, as it will then store the info in the header).
> I'm not sure about this one. It would indeed be more elegant to no longer store it in the header at all, but this would mean that loading a big database with an older version of audb could then be very slow (at least one time, as it will then store the info in the header).
Just once, because the older version will then store this information in the header :)
But in the long run we will get rid of the header entry.
Ah sorry, you already pointed out that it will be slow only once. To me this is fine. Otherwise we will carry around the redundant information forever.
And don't forget - soon users will have to update audb (due to the changes to audbackend) anyway :)
Once a database is completely loaded, there is actually no need to lock it. But currently we need to access the db.yaml file to get this information, and there we could already run into a race condition. Instead, we could create a .complete file in the cache folder to signal that a database is complete, and only acquire the lock if that file does not exist.
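A minimal sketch of the proposed flow, using only the standard library. The function `load_with_marker`, its `lock` argument (any context manager) and the `fetch_missing` callback are hypothetical and not part of audb; only the marker name .complete comes from the proposal.

```python
import contextlib
import pathlib
import tempfile

COMPLETE_MARKER = ".complete"  # marker name taken from the proposal


def load_with_marker(db_root, lock, fetch_missing):
    """Return True if the lock was acquired, False if it was skipped.

    All names here are hypothetical; this is not audb's implementation.
    `fetch_missing(db_root)` is assumed to return True once the database
    is complete.
    """
    marker = pathlib.Path(db_root) / COMPLETE_MARKER
    if marker.exists():
        # Fast path: database already complete, no lock and no header access.
        return False
    with lock:
        # Re-check under the lock, another process may have finished meanwhile.
        if not marker.exists():
            if fetch_missing(db_root):
                marker.touch()
    return True


with tempfile.TemporaryDirectory() as root:
    # contextlib.nullcontext() stands in for a real inter-process lock.
    first = load_with_marker(root, contextlib.nullcontext(), lambda r: True)
    second = load_with_marker(root, contextlib.nullcontext(), lambda r: True)
```

In this sketch the first call takes the lock and writes the marker, while the second call returns immediately without locking. A real implementation would need a lock that works across processes.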