Scille / parsec-cloud

Open source Dropbox-like file sharing with full client encryption !
https://parsec.cloud

Implement `WorkspaceOps::copy_entry` for file&folder copy in libparsec #7309

Open touilleMan opened 4 months ago

touilleMan commented 4 months ago

depends on PR #7107

File

Copying a file is a trivial operation since we now allow multiple manifests to reference the same blocks (hence copying a file manifest is just a matter of changing its ID and resetting its version to 0).
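
As a rough illustration of the manifest copy described above, here is a minimal sketch. `FileManifest` and `BlockAccess` are simplified, hypothetical stand-ins (plain integers instead of the real ID types), not the actual libparsec structures:

```rust
// Hypothetical, simplified types: the real manifests carry more fields.
#[derive(Debug, Clone, PartialEq)]
struct BlockAccess {
    id: u64, // stand-in for a block ID
    offset: u64,
    size: u64,
}

#[derive(Debug, Clone, PartialEq)]
struct FileManifest {
    id: u64, // stand-in for a VlobID
    version: u32,
    size: u64,
    blocks: Vec<BlockAccess>,
}

/// Copy a fully synchronized file manifest: the block list is reused as-is,
/// only the ID and the version change.
fn copy_file_manifest(src: &FileManifest, new_id: u64) -> FileManifest {
    FileManifest {
        id: new_id,
        version: 0,
        ..src.clone()
    }
}
```

No block data is read or written here, which is why the operation is cheap for synchronized files.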

EDIT: there is still a gotcha here! The chunks of data that haven't been uploaded must be duplicated in the local storage when copying the file. Otherwise, one file may remove some of them (typically when doing a reshape) while the other still references them!

This means copying the file can be an arbitrarily long process (imagine creating a multi-GB file, then copying it before it has been synchronized), and hence we cannot do it in an atomic way.

Instead the approach would be:

  1. Copy the file manifest and filter it to strip out every non-synchronized part. This operation can be atomic and is enough for fully synchronized files. For a file that is not fully synchronized, the copy ends up with the correct size but with "holes" of zeroed data.
  2. For each hole, read the data from the original file and write it into the new file (hence the new file remains corrupted if this step doesn't complete).
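
The two steps above could be sketched as follows. This is only an illustration under stated assumptions: `Chunk`, `LocalFileManifest`, `strip_unsynced` and `fill_holes` are hypothetical names, and file contents are modeled as in-memory byte slices rather than local storage:

```rust
use std::ops::Range;

// Hypothetical model: a chunk covers a byte range and is either backed by
// an uploaded block (`synced`) or only present in local storage.
#[derive(Debug, Clone, PartialEq)]
struct Chunk {
    range: Range<u64>,
    synced: bool,
}

#[derive(Debug, Clone, PartialEq)]
struct LocalFileManifest {
    size: u64,
    chunks: Vec<Chunk>,
}

/// Step 1 (can be atomic): keep only the synchronized chunks and report the
/// ranges that were stripped. The copy keeps the original size, so the
/// stripped ranges read as zeroed holes until step 2 runs.
fn strip_unsynced(src: &LocalFileManifest) -> (LocalFileManifest, Vec<Range<u64>>) {
    let (kept, dropped): (Vec<Chunk>, Vec<Chunk>) =
        src.chunks.iter().cloned().partition(|c| c.synced);
    let holes: Vec<Range<u64>> = dropped.into_iter().map(|c| c.range).collect();
    (LocalFileManifest { size: src.size, chunks: kept }, holes)
}

/// Step 2 (arbitrarily long): fill each hole with the data read from the source.
fn fill_holes(src_data: &[u8], dst_data: &mut [u8], holes: &[Range<u64>]) {
    for hole in holes {
        let (start, end) = (hole.start as usize, hole.end as usize);
        dst_data[start..end].copy_from_slice(&src_data[start..end]);
    }
}
```

If step 2 is interrupted, the remaining holes still hold zeroes, which is the corruption risk mentioned in the list.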

Another approach would be to implement a kind of reference counting for the chunks in the local storage, but this is a more complicated approach for what seems to be a niche case. On top of that, the gain is only local: starting from the same chunk, each file will have its own reshape operation, which could lead to different blocks to synchronize (given that a block may span an area with other chunks that are specific to each file, e.g. when copying a file then modifying a single byte in it).
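
For comparison, the reference-counting alternative might look roughly like the sketch below. `ChunkRefCounts` is a hypothetical in-memory structure with integer chunk IDs. A real version would have to be persisted alongside the local storage, which is part of the extra complexity:

```rust
use std::collections::HashMap;

// Hypothetical reference counter for local chunks, keyed by a stand-in chunk ID.
#[derive(Default)]
struct ChunkRefCounts {
    counts: HashMap<u64, usize>,
}

impl ChunkRefCounts {
    /// Record that one more manifest references the chunk.
    fn acquire(&mut self, chunk_id: u64) {
        *self.counts.entry(chunk_id).or_insert(0) += 1;
    }

    /// Drop one reference. Returns `true` only when the last reference is
    /// gone, i.e. when the chunk data can actually be removed.
    fn release(&mut self, chunk_id: u64) -> bool {
        match self.counts.get(&chunk_id).copied() {
            None | Some(0) => false,
            Some(1) => {
                self.counts.remove(&chunk_id);
                true
            }
            Some(n) => {
                self.counts.insert(chunk_id, n - 1);
                false
            }
        }
    }
}
```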

Finally, this copy file feature seems similar to the UNIX `copy_file_range`, which is also present in FUSE. So we should also have a `WorkspaceOps::copy_file_range` to benefit from this optimization in FUSE as well \o/
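
To make the range-copy semantics concrete, here is a small in-memory sketch. `copy_range` is a hypothetical helper working on byte buffers. It is neither the system call nor a proposed signature for `WorkspaceOps::copy_file_range`, and only mimics the "copy up to `len` bytes between two offsets, return the number copied" behavior:

```rust
/// Copy up to `len` bytes from `src` at `src_offset` into `dst` at `dst_offset`.
/// The count is clamped at the end of the source, and the destination is
/// extended with zeroes when needed. Returns the number of bytes copied.
fn copy_range(
    src: &[u8],
    src_offset: usize,
    dst: &mut Vec<u8>,
    dst_offset: usize,
    len: usize,
) -> usize {
    let available = src.len().saturating_sub(src_offset);
    let count = len.min(available);
    if count == 0 {
        return 0;
    }
    let end = dst_offset + count;
    if dst.len() < end {
        dst.resize(end, 0);
    }
    dst[dst_offset..end].copy_from_slice(&src[src_offset..src_offset + count]);
    count
}
```

In the manifest-based model, the interesting case is when the requested range lines up with synchronized blocks, since those could be referenced instead of copied byte by byte.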

Folder

Copying a folder is a bit more complex since all children must be recursively copied. Note we don't want to do this in an atomic way (as it would mean blocking a lot of entries).
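
A non-atomic recursive copy could be sketched like this. `Manifest` and `Store` are hypothetical in-memory stand-ins for the manifests and their storage. Each copied entry is inserted on its own, so no global lock is held for the whole tree:

```rust
use std::collections::{BTreeMap, HashMap};

// Hypothetical simplified manifests: a folder maps child names to entry IDs.
#[derive(Debug, Clone, PartialEq)]
enum Manifest {
    File { size: u64 },
    Folder { children: BTreeMap<String, u64> },
}

struct Store {
    manifests: HashMap<u64, Manifest>,
    next_id: u64,
}

impl Store {
    /// Insert a manifest under a freshly allocated ID (one small step).
    fn insert_new(&mut self, manifest: Manifest) -> u64 {
        let id = self.next_id;
        self.next_id += 1;
        self.manifests.insert(id, manifest);
        id
    }

    /// Copy an entry and, for folders, all of its children, one entry at a time.
    /// Returns the ID of the copy, or `None` if an entry is missing.
    fn copy_recursive(&mut self, src_id: u64) -> Option<u64> {
        let src = self.manifests.get(&src_id)?.clone();
        let copy = match src {
            Manifest::File { size } => Manifest::File { size },
            Manifest::Folder { children } => {
                let mut new_children = BTreeMap::new();
                for (name, child_id) in children {
                    let new_child_id = self.copy_recursive(child_id)?;
                    new_children.insert(name, new_child_id);
                }
                Manifest::Folder { children: new_children }
            }
        };
        Some(self.insert_new(copy))
    }
}
```

In this sketch children are copied before their parent, so a partially completed copy leaves orphaned entries rather than a folder pointing at missing children. That ordering is a design choice of the sketch, not something the issue specifies.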

Possible API

#[derive(Debug, thiserror::Error)]
pub enum WorkspaceCopyEntryError {
    #[error("Cannot reach the server")]
    Offline,
    #[error("Component has stopped")]
    Stopped,
    #[error("Source doesn't exist")]
    SourceNotFound,
    #[error("Source cannot be a parent of destination")]
    SourceIsDestinationParent,
    #[error("Only have read access on this workspace")]
    ReadOnlyRealm,
    #[error("Not allowed to access this realm")]
    NoRealmAccess,
    #[error("Destination already exists")]
    DestinationExists { entry_id: VlobID },
    #[error(transparent)]
    InvalidKeysBundle(#[from] Box<InvalidKeysBundleError>),
    #[error(transparent)]
    InvalidCertificate(#[from] Box<InvalidCertificateError>),
    #[error(transparent)]
    InvalidManifest(#[from] Box<InvalidManifestError>),
    #[error(transparent)]
    Internal(#[from] anyhow::Error),
}

#[derive(Debug, Clone, Copy)]
pub enum CopyEntryMode {
    /// Destination may or may not exist.
    CanReplace,
    /// Destination must not exist so that source can be copied without overwriting anything.
    NoReplace,
}

impl WorkspaceOps {
    ...

    pub async fn copy_entry(
        &self,
        src: FsPath,
        dst: FsPath,
        mode: CopyEntryMode,
    ) -> Result<(), WorkspaceCopyEntryError> {
        ...
    }
}

mmmarcos commented 4 months ago

Related to: #5821