Arlodotexe / OwlCore.Storage

The most flexible file system abstraction, ever. Built in partnership with the UWP Community.
16 stars 4 forks source link

StorageTransferManager: Customizing, managing and resuming recursive copy #61

Open itsWindows11 opened 3 months ago

itsWindows11 commented 3 months ago

Instead of IModifiableFolder.CreateCopyOfAsync() and IModifiableFolder.MoveFromAsync(), why not have an interface like ICanTransfer (naming can be debated here) that can be implemented in files and folders, and methods like:

// For files.
Task<IChildFile> CopyAsync(IModifiableFolder destination, ...);

Task<IChildFile> MoveAsync(IModifiableFolder destination, ...);

// For folders.
Task<IChildFolder> CopyAsync(IModifiableFolder destination, ...);

Task<IChildFolder> MoveAsync(IModifiableFolder destination, ...);

// For both files and folders.
Task DeleteAsync(...);

... and leave IModifiableFolder for file/folder creation and distinguishing from read-only folders that can be known at compile time, basically like Windows.Storage.

We can do one method (i.e. CopyAsync and MoveAsync) for storables too, not just files and folders separately.

Arlodotexe commented 3 months ago

... and leave IModifiableFolder for file/folder creation and distinguishing from read-only folders that can be known at compile time, basically like Windows.Storage.

This is what we're already doing by using extension methods + extension method interfaces. You aren't required to implement IMoveFrom or ICreateCopyOf-- you could leave it unimplemented and the consumer could still use the extension methods on IModifiableFolder.

"Copy" and "Move" aren't considered primitive storage operations, since they can be built using Create, Read/Write, and Delete. This is what allows us to make the IMoveFrom and ICreateCopyOf interfaces optional; there's a fallback implementation which only uses the methods available on IFile, IFolder, IModifiableFolder, etc.

As for trying to consolidate or move these methods elsewhere, it doesn't make a lot of sense to. They're optional to the implementor and the consumer has access to them regardless, and the extension method trick only works because they're on the container where the resources are located, rather than the resources themselves.

This seems more similar to how you might have multiple implementations of an IFolderScanner (depth vs breadth), which is a level of complexity that we don't need to accommodate for since we're using extension methods + extension method interfaces on our own storage primitives.

Is there something more specific about the setup here that you're trying to address? If this is about recursive move/copy, we could just swap to issue https://github.com/Arlodotexe/OwlCore.Storage/issues/60?

itsWindows11 commented 3 months ago

It's not about recursive item operations, but I thought this way it'd make sense since on a file manager for example you right click and choose "Copy", then go to the destination and choose "Paste", not open the destination and choose the file to bring it into.

Arlodotexe commented 3 months ago

I think there might be a misunderstanding here 🤔

The library we've built provides a storage abstraction; storage primitives (interfaces), various extensions, and various implementations for apps and other libraries to utilize and build on.

It's not meant to be an SDK of Application Models that you build user-facing apps directly on. You can always build these; such models are application specific. Examples would be SecureFolderFS, Strix Music and Fluent Store, which have SDKs that use OwlCore.Storage in different places and extend it in different ways. Surely the Files app has similar libraries where application-level logic is implemented instead of placing it in the UI?

That said, you can easily accommodate the specific case you've described:

You have everything you need.

Assuming you held onto the file they selected (and for move, the source folder it was in), just pass it into the CreateCopyOf/MoveFrom methods on the destination. If you can't get a reference to the containing folder for some reason, you can easily get it by calling GetParentAsync on the file and checking/casting to IModifiableFolder, so the original resource can be removed on move.

Arlodotexe commented 3 months ago

Let's revisit this from the perspective of a transfer manager, as detailed in the thread https://discord.com/channels/372137812037730304/1255950330588434512/1255950330588434512.

The copied conversation below gives us a realistic starting point for properly tackling recursive folder operations. Linking #60.

@Arlodotexe: I haven't used a download manager app in ages, but I've built download managers for use within applications themselves. Having a generalized version which could then be implemented in an app, whether for a "download manager app" or a download manager embedded inside another app like ⁠strix-music, this a fascinating and potentially useful scenario that we could use to help us solve some of our problems with recursion in the storage abstraction. It gives us a playground to try things, at least.

First thing I'd want us to note-- working in a storage abstraction, "download" is just "copy".

@yoshiask: Can confirm, this is how plugin downloads/installs work in :fluentstorebeta:

@Arlodotexe So you're looking for a way to manage transfers. linking https://github.com/Arlodotexe/OwlCore.Storage/issues/61.

One key difference between a simple "copy" method and a fully-featured transfer manager would be incremental (resumable) progress and progress reporting.

@itsWindows11: In a download manager context it's more involved than just a copy, multithreading needs to be handled so we could just stick to opening a stream and handling chunks & copying ourselves. Sequential downloads (e.g. if the server doesn't provide content length/file size) can use copying directly from the respective OC.Storage implementation.

@Arlodotexe: Right, there will be scenarios where a Stream doesn't know the length, you can only check when the bytes stop. such as certain HTTP servers or a mic stream

This will be difficult to generalize for all applications, but it gives us a place to start. We're at a point where I can share the main scenarios I've had to accomodate in my own projects.

We're at a point where I can share the main scenarios I've had to accomodate in my own projects. These extension methods were built while testing OwlCore.Storage.Nomad. There are two methods here which do recursive things:

@itsWindows11: Strix-style cores can be used here as well for downloader cores, each with different individual configs when needed by the user [one for HTTP, one for OneDrive etc.]

@Arlodotexe: The point of the storage abstraction is to remove the need for such things. Instead, if you had an IStorageTransferManager, implementations would have different strategies for transfer. One such implementation would be the UWP BackgroundDownloader API

@itsWindows11: Download queue can still be represented as a list of IStorables, it's just the cores that'll be used here to integrate with the app's download queue (so it'll be centralized) and have their own config, for example thread/speed limit.

@Arlodotexe: Sure, all options related to threading, parallelism, depth/breadth, etc., would be decided by the implementation you pick. This is worth noting, because it's the same way we solved recursive folder scanning in #strix-music. We created an interface IFolderScanner and implemented it with DepthFirstFolderScanner, which was fine for our purposes. The difficult part is trying to build something that can accommodate all possible applications. In the example above, the recursive CopyToAsync method:

The behavior for anything recursive can be very application specific, so we need to be careful here. For the generalized functionality of "use these fallbacks if you catch these errors while recursing", how could you build folder recursion in a way that allows for this, while also using gitignore/lastupdatetime to skip copying?

Or could that functionality be built separately and composed somehow, and if so, what would that interface look like?

The model plugin system from #strix-music comes to mind :hmm: http://strixmusic.com/docs/index.html#model-plugins Not directly useful (that'd be overkill) but maybe a pattern we could borrow.

Maybe it'd be possible to build a delegating StorageTransferHandler, not unlike the pattern for DelegatingHttpHandler? Not sure that could accommodate everything...

@itsWindows11: I don't think we'll ever need a "fallback to another implementation" here. As for copying, handling gitignore isn't necessary IMO in this scenario, no idea for the last write part but generally file collisions should be handled by the end user or whatever the set option is in the app.

@Arlodotexe: isn't necessary IMO in this scenario One of the primary goals of the storage abstraction is to make sure we can accommodate as many scenarios as possible, and keep the code manageable while doing it. I would like something a little more flexible/modular than implementing a new IFolderScanner or ITransferManager for every app. If someone builds a transfer handler that adds .gitignore support, that functionality being reusable is a big help. This might become a game of conditional graph crawling by passing delegates... I'll think and try to draft something reasonable. Give me a few days.

itsWindows11 commented 3 months ago

Draft as decided in the UWP Community Discord thread:

public interface IStorageTransferManager
{
    IAsyncEnumerable<ITransferItem> GetAllTransfersAsync(CancellationToken cancellationToken = default);

    // Both downloads and uploads are technically considered copy operations, how it's exactly done depends on the implementation.
    Task<ITransferItem> CreateDownloadAsync(IStorable source, IModifiableFolder destination, bool startTransferAfterCreating = true);
    Task<ITransferItem> CreateUploadAsync(IStorable source, IModifiableFolder destination, bool startTransferAfterCreating = true);

    // Throws if the item is indeterminate (i.e. doesn't support progress reporting)
    // TODO: Determine progress type.
    Task<[progress type]> GetProgressAsync(ITransferItem item);

    // Events
    event ProgressChangedEventHandler TransferProgressChanged;
    event TransferEventHandler TransferFaulted;
    event TransferEventHandler TransferErrored;
    event TransferEventHandler TransferStateChanged;
}

Extension interface to support retrieving a single transfer item:

public interface IStorageTransferGetItem
{
    // Get transfer item by either the source storable or its ID.
    Task<ITransferItem?> GetTransferAsync(IStorable storable, CancellationToken cancellationToken = default);
    Task<ITransferItem?> GetTransferAsync(string storableId, CancellationToken cancellationToken = default);
}

Extension interface to support retrieving transfer items with a specific destination:

public interface IStorageTransferGetItemsByDestination
{
    // Filters transfers by destination folder or its ID.
    IAsyncEnumerable<ITransferItem> GetTransfersByDestinationAsync(IModifiableFolder destination, CancellationToken cancellationToken = default);
    IAsyncEnumerable<ITransferItem> GetTransfersByDestinationIdAsync(string destinationId, CancellationToken cancellationToken = default);
}

Extension interface to support operations on a transfer item:

public interface IStorageTransferItemOperations
{
    Task PauseTransferAsync(ITransferItem item);
    Task CancelTransferAsync(ITransferItem item);
    Task DeleteTransferAsync(ITransferItem item);
}

We can additionally define an ITransferItem here:

public interface ITransferItem
{
    bool IsIndeterminate { get; }
    TransferState State { get; }

    IStorable Source { get; }
    IModifiableFolder Destination { get; }
}

public enum TransferState
{
    Cancelled,
    Pending, // Queued or yet to be manually started.
    InProgress,
    Paused, // Paused by the consumer through user action or something else.
    Error,
    Completed
}

An ITransferItem can have different implementations for different operations, for example DownloadTransferItem for download operations, UploadTransferItem for upload operations.

IStorageTransferManager should also have an extension method to get pending (i.e. items which are paused, cancelled but not deleted, marked as pending) transfers. Initially this was proposed as a standalone method but it'll mostly just piggyback on GetAllTransfersAsync() with async LINQ filtering.

Few questions we should answer:

  1. Should ITransferItem be internal and we just provide DownloadTransferItem & the upload counterpart to the consumer or just expose everything?
  2. What's the type we should use for holding progress data? This will also be used in ProgressChangedEventHandler.
  3. How to determine whether the transfer item is indeterminate on creation in a way that doesn't render the abstraction useless?
Arlodotexe commented 3 months ago

Worth noting that we still haven't tackled recursive / graph crawling strategies or parallelism.

Adding from the discussion:

@Arlodotexe: We should maybe do what we did in the original CommunityToolkit.Storage proposal and build something generic and minimal, by comparing existing solutions that we'd want to be able to implement. That would be a good way to reduce the complexity so we can design the functionality in layers. @itsWindows11 The UWP background downloader that only handles IStorageFile would be implementation-specific for WindowsStorageFile and WindowsStorageFolder as the destination (for downloading) or source (for uploading). Even if it implements our common IStorageTransferManager :thinking: Bringing this back for framing:

I think the uwp background downloader also covers concurrency and parallelism on top of basic transfer functionality, which we haven't touched on yet. I suppose concurrency and parallelism functionality can be overloads on more derived interfaces? e.g. IConcurrentTransferManager : IStorageTransferManager? Not sure if that's an approach that can be generalized. :hmm:

@itsWindows11: What if you need to use parallelism for specific transfer items and others without it? That's also something the configuration approach I did earlier doesn't handle.

@Arlodotexe: Well, that's why we need to figure out composable graph crawling. It's something we can separate from other features, and provide via implementation/consumer/inbox. I had mentioned earlier that it might become a game of passing delegates, that might be a place to start. At least to get a picture for what we need to design here. This might be useful to study for ideas: https://ipld.io/specs/selectors/ Keeping all of the above in mind, it may end up looking like this but with more delegates and with functionality composed over a few interfaces: https://github.com/Arlodotexe/OwlCore.Storage/blob/424d557952d2b2378c3600d716cb9877a4b8e1be/src/Extensions/CopyAndMoveExtensions.cs#L66C105-L66C130