audeering / audbackend

Manage file storage on different backends
https://audeering.github.io/audbackend/
Other
3 stars 0 forks source link

Validate download / upload of files #189

Closed frankenjoe closed 3 months ago

frankenjoe commented 4 months ago

To validate the upload of file to the backend I currently do:

backend.put_file(src_file, dst_file)
if audeer.md5(src_file) != backend.checksum(dst_file):
    # upload failed
    backend.remove_file(dst_file)
    ...

Likewise, we can validate the download of a file:

backend.get_file(src_file, dst_file)
if audeer.md5(dst_file) != backend.checksum(src_file):
    # download failed
    os.remove_file(dst_file)
    ...

I think it makes sense to move such a check into the functions. Since it comes with an additional cost of calculating the checksum for source and destination file (in case of put_file() we already already calculate the checksum of the source file) we could make it an optional step.

In case of a mismatch of the checksum, it makes sense to remove the file. But what should we do next:

  1. silently leave the function -> probably not a good idea, since the user assumes the file is there now.
  2. notify the user of the failure via return value -> put_file() does not have a return value yet, so we could e.g. return a boolean; get_file() currently returns the full path to the local file, so here we have to return None. To avoid different return types, we could return the full path of the source file in case of put_file() and also None in case of a failure.
  3. Raise an error that informs the user that the operation was not successful.

A fourth option would be to retry the operation first and only if it fails again fall back to one of 1.-3.

hagenw commented 4 months ago

Maybe we can check what other applications do if the checksum does not match.

As we plan to make the check optional, the third option of raising an error seems a good solution to me. When a user opts in for checking MD5 sums, it should not silently continue if a sum does not match. If the user wants to add custom handling of those cases, she can do it with a try-except statement. This seems easier to me, than having to deal with different return types.

frankenjoe commented 4 months ago

Option 3. makes most sense to me, too. To not overcomplicate we can leave it to the user to catch the error and possibly retry the operation. Regarding the error type we could raise a InterruptedError or do you think we should introduce a custom error type?

hagenw commented 4 months ago

No, using an existing type seems to me the better option.

frankenjoe commented 3 months ago

Implemented in #190