cs3org / reva

WebDAV/gRPC/HTTP high performance server to link high level clients to storage backends
https://reva.link
Apache License 2.0

ocis needs an API to list all files that a user has marked as favorite or that are tagged with a certain tag. #1394

Open butonic opened 3 years ago

butonic commented 3 years ago

AFAICT this should be a dedicated service. Similar to shares, we should persist the data in the storage, but we need to cache the information so that queries that would otherwise have to scan all files can be answered from that cache.

This is similar to tags ... which are currently shared by files. They are also persisted in the storage and need a cache, or rather an index, to look them up.

For ocis accounts @IljaN @refs and @kulmann built an index that we might use for this. It uses symlinks / small files on disk, so it has no special service dependency and can even use a cs3 storage provider for persistence.

For favorites this is the curl request:

curl 'https://cloud.ocis.test/remote.php/dav/files/einstein/' \
  -X 'REPORT' \
  -H 'authorization: Bearer ...' \
  --data-binary $'<?xml version="1.0"?>\n<oc:filter-files  xmlns:d="DAV:" xmlns:oc="http://owncloud.org/ns">\n  <d:prop>\n    <oc:permissions />\n    <oc:favorite />\n    <oc:fileid />\n    <oc:owner-id />\n    <oc:owner-display-name />\n    <oc:share-types />\n    <oc:privatelink />\n    <d:getcontentlength />\n    <oc:size />\n    <d:getlastmodified />\n    <d:getetag />\n    <d:resourcetype />\n  </d:prop>\n<oc:filter-rules>\n<oc:favorite>1</oc:favorite>\n</oc:filter-rules>\n</oc:filter-files>' \
  --compressed

It does a REPORT, which is currently not implemented: https://github.com/cs3org/reva/blob/master/internal/http/services/owncloud/ocdav/report.go#L45
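
As a rough idea of what a REPORT handler would have to extract from that body, the following Go sketch parses an `oc:filter-files` document with `encoding/xml`. The types and the `parseFilterFiles` function are hypothetical and not taken from reva's `report.go`; only the element names and namespaces come from the request above.

```go
package main

import (
	"encoding/xml"
	"fmt"
	"strings"
)

// filterFiles mirrors the oc:filter-files REPORT body (hypothetical type).
type filterFiles struct {
	XMLName xml.Name    `xml:"http://owncloud.org/ns filter-files"`
	Prop    propList    `xml:"DAV: prop"`
	Rules   filterRules `xml:"http://owncloud.org/ns filter-rules"`
}

// propList collects every requested property, whatever its name.
type propList struct {
	Items []propName `xml:",any"`
}

// propName keeps only the qualified element name of a requested property.
type propName struct {
	XMLName xml.Name
}

// filterRules holds the rules this sketch understands: only oc:favorite.
type filterRules struct {
	Favorite string `xml:"http://owncloud.org/ns favorite"`
}

// parseFilterFiles reports whether the client asked for favorites only and
// which properties it wants back for each matching resource.
func parseFilterFiles(body []byte) (bool, []xml.Name, error) {
	var ff filterFiles
	if err := xml.Unmarshal(body, &ff); err != nil {
		return false, nil, err
	}
	props := make([]xml.Name, 0, len(ff.Prop.Items))
	for _, p := range ff.Prop.Items {
		props = append(props, p.XMLName)
	}
	return strings.TrimSpace(ff.Rules.Favorite) == "1", props, nil
}

func main() {
	body := []byte(`<?xml version="1.0"?>
<oc:filter-files xmlns:d="DAV:" xmlns:oc="http://owncloud.org/ns">
  <d:prop>
    <oc:favorite />
    <oc:fileid />
    <d:getetag />
  </d:prop>
  <oc:filter-rules>
    <oc:favorite>1</oc:favorite>
  </oc:filter-rules>
</oc:filter-files>`)

	favoritesOnly, props, err := parseFilterFiles(body)
	if err != nil {
		panic(err)
	}
	fmt.Println("favorites only:", favoritesOnly)
	for _, p := range props {
		fmt.Printf("requested property %s in namespace %s\n", p.Local, p.Space)
	}
}
```

Parsing is the easy part; the handler still needs something that can answer "which resources did this user mark as favorite" without walking the whole tree, which is what the rest of this issue is about.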

butonic commented 3 years ago

@ishank011 @labkode let me know what you think. How should we add this to CS3?

phil-davis commented 3 years ago

In a way, "favourites" is just a "special pre-defined tag". I imagine that tags have some sort of unique id in the back-end (UUID?), which is the thing that is stored with the resource that is tagged. (That allows the tag name/text to be an easily changed attribute of the tag.)

"favourite" could be implemented in the back-end with a "well-known tag UUID". And so an API request to "favourite" a resource can use the "tags" service(s) to remember the "favourite". Then if the tags service does some useful caching, "favourites" will automagically get the same benefit.

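The "well-known tag UUID" idea can be sketched in a few lines of Go. Everything here is an assumption for illustration: the `TagStore` type, its methods, and in particular the value of `FavoriteTagID`, which is an invented placeholder and not an identifier defined by reva, ocis, or the CS3 APIs. The store is in-memory only.

```go
package main

import (
	"fmt"
	"sort"
)

// FavoriteTagID is an invented placeholder for a well-known tag identifier.
const FavoriteTagID = "00000000-0000-0000-0000-000000000001"

// assignment identifies one user's use of one tag.
type assignment struct{ user, tagID string }

// TagStore is a hypothetical in-memory tag service. Resources reference tags
// by ID, so renaming a tag never touches the tagged resources.
type TagStore struct {
	names  map[string]string                  // tag ID -> display name
	tagged map[assignment]map[string]struct{} // (user, tag ID) -> resource IDs
}

// NewTagStore returns a store with the predefined favorite tag registered.
func NewTagStore() *TagStore {
	return &TagStore{
		names:  map[string]string{FavoriteTagID: "favorite"},
		tagged: map[assignment]map[string]struct{}{},
	}
}

// Rename changes only the display name of a tag.
func (s *TagStore) Rename(tagID, name string) { s.names[tagID] = name }

// Name returns the display name of a tag, or "" if the tag is unknown.
func (s *TagStore) Name(tagID string) string { return s.names[tagID] }

// Tag attaches a tag to a resource on behalf of a user.
func (s *TagStore) Tag(user, tagID, resourceID string) {
	key := assignment{user, tagID}
	if s.tagged[key] == nil {
		s.tagged[key] = map[string]struct{}{}
	}
	s.tagged[key][resourceID] = struct{}{}
}

// Untag removes a tag from a resource; unknown combinations are ignored.
func (s *TagStore) Untag(user, tagID, resourceID string) {
	delete(s.tagged[assignment{user, tagID}], resourceID)
}

// Resources lists, in sorted order, everything the user tagged with tagID.
func (s *TagStore) Resources(user, tagID string) []string {
	set := s.tagged[assignment{user, tagID}]
	ids := make([]string, 0, len(set))
	for id := range set {
		ids = append(ids, id)
	}
	sort.Strings(ids)
	return ids
}

// SetFavorite and Favorites are thin wrappers over the generic tag calls,
// so favorites benefit from whatever the tag store does for lookups.
func (s *TagStore) SetFavorite(user, resourceID string) {
	s.Tag(user, FavoriteTagID, resourceID)
}

func (s *TagStore) Favorites(user string) []string {
	return s.Resources(user, FavoriteTagID)
}

func main() {
	store := NewTagStore()
	store.SetFavorite("einstein", "file-42")
	store.Rename(FavoriteTagID, "starred") // the tagged resources are untouched
	fmt.Println(store.Name(FavoriteTagID), store.Favorites("einstein"))
}
```
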
ishank011 commented 3 years ago

@butonic we already have the functionality to favourite resources using the SetArbitraryMetadata method, so we can use that. Adding a separate service doesn’t make a lot of sense to me. The CS3APIs ‘List’ method has a filter parameter which can be extended to consider tags as well.

IMO we can add tag based caches to the fs layer. Sure, we can extend the definition of tags by adding mutable attributes, as @phil-davis suggested, but this can be easily accommodated in the storage provider service.

ishank011 commented 3 years ago

Apologies. ListContainer doesn't have filters yet but ListSharesRequest does. We can extend it the same way.

butonic commented 3 years ago

ListSharesRequest is an rpc on the Share Manager.

ListContainer lists files for a Container / Collection / Directory. But favorites can be spread everywhere.

I did add setting favorites in https://github.com/cs3org/reva/pull/1393 but now I need a way to list all the files a user has marked as favorite.

But this is a broader topic because finding all files tagged with a certain tag is similar.

Or ... finding all files with a certain name?

Or listing the most recently edited (or read) files?

Or finding by indexed file content?

Or finding by author?

These are all queries that a traditional filesystem is unsuited to answer. It should be a separate service. In the backend it might adapt to an existing storage system like eos. But I think this should be a dedicated service, more like the share manager.

ishank011 commented 3 years ago

Okay yes, I didn't take this into account. Agreed, a separate service with a pluggable file system and multiple indices backed by caches would make a lot of sense.
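
A separate service with several indices could look roughly like the Go sketch below. None of these types exist in reva or the CS3 APIs: `Resource`, `Index`, and `QueryService` are hypothetical, the indices live in memory, and persistence through a pluggable file system is left out.

```go
package main

import (
	"fmt"
	"sort"
)

// Resource is a hypothetical subset of the metadata the service would see.
type Resource struct {
	ID          string
	Author      string
	FavoritedBy []string // users who marked the resource as favorite
}

// Index maps lookup keys (a user, an author, ...) to resource IDs.
type Index struct {
	keysOf  func(Resource) []string
	byKey   map[string]map[string]struct{} // key -> resource IDs
	current map[string][]string            // resource ID -> keys it is filed under
}

// NewIndex builds an index from a function that derives keys from a resource.
func NewIndex(keysOf func(Resource) []string) *Index {
	return &Index{
		keysOf:  keysOf,
		byKey:   map[string]map[string]struct{}{},
		current: map[string][]string{},
	}
}

// Put refiles the resource: old keys are dropped first, so un-favoriting a
// file removes it from the favorite index on the next update.
func (ix *Index) Put(r Resource) {
	for _, k := range ix.current[r.ID] {
		delete(ix.byKey[k], r.ID)
	}
	keys := ix.keysOf(r)
	for _, k := range keys {
		if ix.byKey[k] == nil {
			ix.byKey[k] = map[string]struct{}{}
		}
		ix.byKey[k][r.ID] = struct{}{}
	}
	ix.current[r.ID] = keys
}

// Lookup returns the sorted IDs of all resources filed under key.
func (ix *Index) Lookup(key string) []string {
	ids := make([]string, 0, len(ix.byKey[key]))
	for id := range ix.byKey[key] {
		ids = append(ids, id)
	}
	sort.Strings(ids)
	return ids
}

// QueryService fans metadata updates out to every registered index.
type QueryService struct {
	indices map[string]*Index
}

func NewQueryService() *QueryService {
	return &QueryService{indices: map[string]*Index{}}
}

func (q *QueryService) Register(name string, ix *Index) { q.indices[name] = ix }

// Update must be called whenever a resource's metadata changes.
func (q *QueryService) Update(r Resource) {
	for _, ix := range q.indices {
		ix.Put(r)
	}
}

// Find answers a query from one index without touching the storage.
func (q *QueryService) Find(index, key string) ([]string, error) {
	ix, ok := q.indices[index]
	if !ok {
		return nil, fmt.Errorf("unknown index %q", index)
	}
	return ix.Lookup(key), nil
}

func main() {
	svc := NewQueryService()
	svc.Register("favorite", NewIndex(func(r Resource) []string { return r.FavoritedBy }))
	svc.Register("author", NewIndex(func(r Resource) []string { return []string{r.Author} }))

	svc.Update(Resource{ID: "file-1", Author: "marie", FavoritedBy: []string{"einstein"}})
	svc.Update(Resource{ID: "file-2", Author: "marie"})

	favorites, _ := svc.Find("favorite", "einstein")
	byAuthor, _ := svc.Find("author", "marie")
	fmt.Println(favorites, byAuthor)
}
```

Each new kind of query (by name, by recent edits, by content) would then be another registered index rather than another scan of the storage.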