datalad / datalad-next

DataLad extension for new functionality and improved user experience
https://datalad.org
Other
7 stars 8 forks source link

add `delete`-method to `UrlOperations` #733

Closed christian-monch closed 2 months ago

christian-monch commented 3 months ago

Fixes #533

christian-monch commented 3 months ago

The remaining error in datalad-core 4 is due to a typo in datalad. I opened PR datalad #7628 to fix it in datalad.

An offending test, i.e. datalad.locals.tests.test_wtf.test_wtf, has been disabled for now in datalad-core 4.

codecov[bot] commented 3 months ago

Codecov Report

Attention: Patch coverage is 94.73684% with 5 lines in your changes missing coverage. Please review.

Project coverage is 92.44%. Comparing base (f00cfdb) to head (80bc38c).

Files Patch % Lines
datalad_next/url_operations/tests/test_http.py 75.00% 3 Missing :warning:
datalad_next/url_operations/file.py 84.61% 1 Missing and 1 partial :warning:
Additional details and impacted files ```diff @@ Coverage Diff @@ ## main #733 +/- ## ========================================== - Coverage 92.47% 92.44% -0.03% ========================================== Files 195 195 Lines 14301 14356 +55 Branches 2162 2164 +2 ========================================== + Hits 13225 13272 +47 - Misses 812 818 +6 - Partials 264 266 +2 ```

:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.

christian-monch commented 2 months ago

Thanks for this PR!

A delete operation here is conceptualized as "the resource is gone afterwards" -- and the test implementations confirm that.

This makes sense to me. When we send a DELETE via HTTP to some API it will perform whatever makes sense, without necessarily making a client aware of related complexities. Likewise, here, for a deletion of a directory via SSH, the deletion is automatically performed recursively, without something like a "directory-must-be-empty" safety net. It feels rough, but it makes sense.

I feel like it should be made clearer in the docs that this is what is implemented. Right now it is subject to speculation.

Thanks for the comment. It got me thinking about the automated recursive delete. I opted for not deleting a non-empty directory in SshUrlOperations.delete. The reasons were:

  1. consistency with FileUrlOperations.delete
  2. reduce the chance of unintended data removal.

The doc-string of SshUrlOperations.delete has been amended by a description of the behavior of SshUrlOperations.delete on non-empty directories.

mih commented 2 months ago

I opted for not deleting a non-empty directory in SshUrlOperations.delete.

Hmm, ok. I wonder how a client implementation could manage to delete a non-empty directory now?

AFAIK, there is no operation that would allow a client to discover the content of a directory. This seems to rule out a (expensive) client-side recursion. How would it be done?

christian-monch commented 2 months ago

[...] Hmm, ok. I wonder how a client implementation could manage to delete a non-empty directory now?

AFAIK, there is no operation that would allow a client to discover the content of a directory. This seems to rule out a (expensive) client-side recursion. How would it be done?

It could not be done in the context of SshUrlOperations (I think the same holds for FileUrlOperations). So, removing a non-empty directory would require a different communication channel. I can see that this is easier in the FileUrlOperations context.

I think this is a good argument to keep the removal of non-empty directories in SshUrlOperations.delete. It might even be an argument to modify write permissions to ensure the deletion of a remote directory (if we own it). I brought back the non-empty directory delete and added deletion of write-protected targets.

The only thing that irritates me slightly is the different semantics of FileUrlOperations.delete and SshUrlOperations.delete.

mih commented 2 months ago

I think it makes sense to keep the semantics of all implementations aligned and also to document the intent behind that.

Even if it is possible to work around limitations in a filesystem context, it would still defeat the purpose of the handlers. The basic idea was to be able to code around/with defined operations and achieve some protocol abstraction through that. If individual implementations come with individual semantics, this feature is lost (or never achieved) -- which makes them just arbitrary standalone implementations, with little advantage of adopting them over any other solution.

I'd vote to align the semantics of all implementations, incl. the filesystem one.

christian-monch commented 2 months ago

FileUrlOperations.delete and SshUrlOperations.delete now have matching semantics. Both delete a file or directory (including its content). Both can delete write-protected targets if the effective user can modify the permissions of the targets.

They do not try to set write permissions on the containing directory. That means they cannot delete targets from write-protected directories.

The semantics of HttpUrlOperations.delete is not discussed since it is server-dependent.