abraunegg / onedrive

OneDrive Client for Linux
https://abraunegg.github.io
GNU General Public License v3.0

Feature Request: On-Demand Files #757

Open IncPlusPlus opened 4 years ago

IncPlusPlus commented 4 years ago

Currently on-demand file downloading is not implemented. #105 was opened and closed due to inactivity. While I'd love to contribute this and make a PR instead of a feature request, this is a bit too daunting of a task for me. When "download files on-demand" is checked in the client on Windows, the file behavior is as follows.

abraunegg commented 4 years ago

@IncPlusPlus

Currently on-demand file downloading is not implemented

This is because the OneDrive API does not provide any capability to advertise what file has been tagged via OneDrive as 'on-demand' - thus there is zero mechanism to determine which file has this flag or does not.

For each OneDrive drive item the following is exposed: https://docs.microsoft.com/en-us/onedrive/developer/rest-api/resources/driveitem?view=odsp-graph-online

Until a driveItem property exists that represents that an item is 'on-demand', this cannot be supported by this client.

  • If a file is to be opened or otherwise used (such as copying or moving) by any program and it is not yet downloaded, it will be downloaded before it is used. It will be downloaded in-place (meaning it is not stored in a temporary location, to my knowledge). After it has been downloaded like this, the file stays downloaded and can be viewed/modified as many times as you like up until "Free up space" is clicked

As this program is CLI based, to do this, you're talking about a whole lot of GUI development to add this sort of functionality to a desktop manager (gnome/kde etc)

  • (This one could be a stretch goal for a later addition) When right clicking on a folder in OneDrive, there is an option listed to "Free up space". What this does is removes any files that were downloaded (with the exception of files/folders which have "Always keep on this device" checked off)

By using the OneDrive API, there is no concept of 'a device' - thus, no way to track that this 'instance' is the instance where the file must always be kept. As per above, there is no driveItem resource property that represents even this option.

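In short, the OneDrive API is missing this:

  • a driveItem property to handle 'on-demand'
  • a driveItem property to handle 'Always keep on this device'
  • The Windows OneDrive client integrates with Windows & the NTFS file system to provide some additional capabilities - which all the graphical desktop managers & Unix file systems currently do not with this client.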

As per #105, it might be worth asking the question here: https://github.com/OneDrive/onedrive-api-docs to see if there are any plans to support the missing driveItem properties that would potentially allow some sort of development in this area. Without these properties being exposed, this is a non-starter.

IncPlusPlus commented 4 years ago

This is because the OneDrive API does not provide any capability to advertise what file has been tagged via OneDrive as 'on-demand' - thus there is zero mechanism to determine which file has this flag or does not.

Like what was mentioned on #105, no API needs to exist for the 'on-demand' flag. It is more of a state of the program than a per-file flag. As for the 'Always keep on this device' flag, these are stored and acted on locally. There is no need to access any API to know which files to always store a copy of.

  • ... After it has been downloaded like this, the file stays downloaded and can be viewed/modified as many times as you like up until "Free up space" is clicked

As this program is CLI based, to do this, you're talking about a whole lot of GUI development to add this sort of functionality to a desktop manager (gnome/kde etc)

That's a fair critique. To avoid needing any GUI development, --free-up-space could be a switch for the onedrive command.

  • (This one could be a stretch goal for a later addition) When right clicking on a folder in OneDrive, there is an option listed to "Free up space". What this does is removes any files that were downloaded (with the exception of files/folders which have "Always keep on this device" checked off)

By using the OneDrive API, there is no concept of 'a device' - thus, no way to track that this 'instance' is the instance where the file must always be kept. As per above, there is no driveItem resource property that represents even this option.

When managing driveItem instances, would it be possible to know which files should always be kept locally by matching the pattern with the contents of a keep_local_list similar to the sync_list?

In short, the OneDrive API is missing this:

  • a driveItem property to handle 'on-demand'
  • a driveItem property to handle 'Always keep on this device'
  • The Windows OneDrive client integrates with Windows & the NTFS file system to provide some additional capabilities - which all the graphical desktop managers & Unix file systems currently do not with this client.

To summarize: Again, syncing files only on-demand is a property not of files or folders but of the local installation of OneDrive itself. Additionally, the Always keep on this device flag is something that is kept track of by the local installation of OneDrive. Since implementing the Always keep on this device flag through a GUI is impractical, would it be practical to instead use a keep_local_list similar to the sync_list? As for the integrations/hooks into Windows and the NTFS file system, I'm sure we could come up with similarly functioning workarounds.
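To make the keep_local_list idea a bit more concrete, here is a minimal sketch. It assumes the list holds one glob pattern per line, in the same spirit as sync_list. The file format, the function names and the simple glob matching are all hypothetical; nothing like this exists in the client today, and the real sync_list rules are more involved.

```python
from fnmatch import fnmatch


def load_patterns(text):
    """Parse a hypothetical keep_local_list: one glob per line, '#' starts a comment."""
    patterns = []
    for line in text.splitlines():
        line = line.strip()
        if line and not line.startswith("#"):
            patterns.append(line)
    return patterns


def keep_local(path, patterns):
    """Return True if the drive-relative path matches any keep-local pattern."""
    return any(fnmatch(path, pattern) for pattern in patterns)


def files_to_free(downloaded_paths, patterns):
    """Paths a hypothetical --free-up-space switch could turn back into placeholders."""
    return [p for p in downloaded_paths if not keep_local(p, patterns)]
```

With something like this, the 'Always keep on this device' decision stays entirely local: the client would only need to consult the pattern list before freeing space, with no driveItem property involved.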

abraunegg commented 4 years ago

@IncPlusPlus OK .. so let's say this 'works' - you have the CLI checking files on OneDrive, and, if say a client feature flag is set, it will not download the file, but create a zero byte file + all folder structures so that everything exists locally.

You open 'vi' or OpenOffice or whatever in your GUI, or CLI ... how does the GUI or application or CLI now tell the onedrive client to go download the file (as this is now on demand) ? What if you are 'offline' with no working Internet access?

There would have to be major integrations into a number of places to make this work / simulate the Microsoft client on Windows.

IncPlusPlus commented 4 years ago

@IncPlusPlus OK .. so let's say this 'works' - you have the CLI checking files on OneDrive, and, if say a client feature flag is set, it will not download the file, but create a zero byte file + all folder structures so that everything exists locally.

Correct. That's what the Windows client usually does.

You open 'vi' or OpenOffice or whatever in your GUI, or CLI ... how does the GUI or application or CLI now tell the onedrive client to go download the file (as this is now on demand) ?

That's the trouble... That's likely one of the NTFS/Windows hooks that OneDrive utilizes to perform this. I can help look into whether such a filesystem hook is possible with UNIX/Linux.

What if you are 'offline' with no working Internet access?

That's the purpose of the 'Always keep on this device' feature. It allows you to keep only the files you need around such that you have what you need when you're offline/on the go.

There would have to be major integrations into a number of places to make this work / simulate the Microsoft client on Windows.

Could you elaborate a bit?

abraunegg commented 4 years ago

@IncPlusPlus

Could you elaborate a bit?

I have not looked into it, but to recognise some sort of application FS access request would in my mind require some sort of kernel level FS hook.

Best for you to ask on the kernel developers mailing list for some pointers.

At the moment, I won't be looking into this any further than what I already have .. back to beers & bbq & holiday for me.

IncPlusPlus commented 4 years ago

Thanks for hearing me out. Sounds like a plan. I'll look into this as I can and post updates here (if any). While I do research for this, would you mind keeping this issue open?

Happy holidays, Ryan

pdvrieze commented 4 years ago

@abraunegg The approach I would see here would be to use the FUSE interface (probably through libfuse - https://github.com/libfuse/libfuse). In that context the most straightforward approach would be to treat things as a caching solution - not unlike what the Android client does. So in Linux you don't even need empty files at all. Obviously this could be enhanced by allowing "pinning" of files to keep them from being evicted from the cache.
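A rough sketch of the cache-with-pinning bookkeeping, leaving the FUSE part out entirely. It assumes a least-recently-used eviction policy and a byte budget; the class and method names are hypothetical and only track sizes, they do not touch any files.

```python
from collections import OrderedDict


class PinnedLruCache:
    """Tracks cached file sizes; evicts least recently used unpinned entries."""

    def __init__(self, capacity_bytes):
        self.capacity_bytes = capacity_bytes
        self.entries = OrderedDict()  # path -> size in bytes, oldest first
        self.pinned = set()

    def used_bytes(self):
        return sum(self.entries.values())

    def pin(self, path):
        """Mark a path as never evictable ("always keep" in cache terms)."""
        self.pinned.add(path)

    def touch(self, path, size):
        """Record that `path` was downloaded or read; return the paths evicted."""
        self.entries[path] = size
        self.entries.move_to_end(path)
        evicted = []
        for candidate in list(self.entries):
            if self.used_bytes() <= self.capacity_bytes:
                break
            if candidate in self.pinned or candidate == path:
                continue  # pinned files and the file just used stay cached
            del self.entries[candidate]
            evicted.append(candidate)
        return evicted
```

A file system layer would call touch() on every read and delete the cached copies of whatever paths come back; pinned files can push the cache over budget, which is a deliberate choice in this sketch.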

abraunegg commented 4 years ago

@pdvrieze Thanks for the suggestion, however I already have a working prototype that I am ironing out some issues with - and this does not use FUSE, but empty files are used.

landall commented 4 years ago

@IncPlusPlus

Currently on-demand file downloading is not implemented

This is because the OneDrive API does not provide any capability to advertise what file has been tagged via OneDrive as 'on-demand' - thus there is zero mechanism to determine which file has this flag or does not.

For each OneDrive drive item the following is exposed: https://docs.microsoft.com/en-us/onedrive/developer/rest-api/resources/driveitem?view=odsp-graph-online

Until a driveItem property exists that represents that an item is 'on-demand', this cannot be supported by this client.

  • If a file is to be opened or otherwise used (such as copying or moving) by any program and it is not yet downloaded, it will be downloaded before it is used. It will be downloaded in-place (meaning it is not stored in a temporary location, to my knowledge). After it has been downloaded like this, the file stays downloaded and can be viewed/modified as many times as you like up until "Free up space" is clicked

As this program is CLI based, to do this, you're talking about a whole lot of GUI development to add this sort of functionality to a desktop manager (gnome/kde etc)

  • (This one could be a stretch goal for a later addition) When right clicking on a folder in OneDrive, there is an option listed to "Free up space". What this does is removes any files that were downloaded (with the exception of files/folders which have "Always keep on this device" checked off)

By using the OneDrive API, there is no concept of 'a device' - thus, no way to track that this 'instance' is the instance where the file must always be kept. As per above, there is no driveItem resource property that represents even this option.

In short, the OneDrive API is missing this:

  • a driveItem property to handle 'on-demand'
  • a driveItem property to handle 'Always keep on this device'
  • The Windows OneDrive client integrates with Windows & the NTFS file system to provide some additional capabilities - which all the graphical desktop managers & Unix file systems currently do not with this client.

As per #105, it might be worth asking the question here: https://github.com/OneDrive/onedrive-api-docs to see if there are any plans to support the missing driveItem properties that would potentially allow some sort of development in this area. Without these properties being exposed, this is a non-starter.

File-on-demand is not a feature of OneDrive but a feature of Windows 10...

It creates a lot of placeholder files on the NT file system that look like local files. It then works in two directions:

  1. You watch the local OneDrive folder to detect a newly created file/folder. Then you transform it into a placeholder file and upload it.
  2. You receive the file system operation notifications for the placeholders, to know when to download the content or the metadata, when to move the file...

In Windows 10 1709, Windows provides cfapi to make these things easier (but still not very easy).

https://docs.microsoft.com/en-us/windows/win32/cfapi/cloud-files-api-portal

So, is there any way on Linux to do the same thing as on the NT file system? FUSE cannot do this directly. Downloading a file to local storage is too slow; the OS and apps need to know how to wait for the download. At present, FUSE is more similar to ProjFS (ProjFS is designed for use with high-speed backing data stores): https://docs.microsoft.com/en-us/windows/win32/projfs/projected-file-system

abraunegg commented 4 years ago

@landall I refer you to this comment please: https://github.com/abraunegg/onedrive/issues/757#issuecomment-578294917

landall commented 4 years ago

@landall I refer you to this comment please: #757 (comment)

I have a similar idea: mount OneDrive both via WebDAV and as a local cache, then create a lot of soft links that redirect access to either the local cache or the network file. The job of the File-On-Demand module is then to switch a link from the network file to the local cache.
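The link-switching step could look roughly like this. It is only a sketch under a few assumptions: the WebDAV mount and the cache are ordinary directories, the function names are hypothetical, and nothing here handles uploads or conflicts.

```python
import os
import shutil


def point_link(link_path, target):
    """Atomically make link_path a symlink to target (create or replace)."""
    tmp = link_path + ".tmp"
    if os.path.lexists(tmp):
        os.remove(tmp)
    os.symlink(target, tmp)
    os.replace(tmp, link_path)  # rename swaps the link itself in one step


def make_local(link_path, cache_dir):
    """Copy the file behind the link into the cache, then repoint the link."""
    os.makedirs(cache_dir, exist_ok=True)
    cached = os.path.join(cache_dir, os.path.basename(link_path))
    shutil.copyfile(link_path, cached)  # follows the link to the network copy
    point_link(link_path, cached)
    return cached
```

Building the new link under a temporary name and renaming it over the old one means a program never sees the path missing while the switch happens.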

marcown commented 4 years ago

@landall I refer you to this comment please: #757 (comment)

Hey @abraunegg, keep up the great work! I am really looking forward to the on-demand option. Can you give some kind of time estimate? And is there any way to support you?

abraunegg commented 4 years ago

@marcown The main way to support is when the feature drops as a PR - help test extensively.

Example: Business Shared Folders ... been a PR for a LONG time because not many people are providing feedback to assist with fixing any remaining issue.

landall commented 4 years ago

@marcown The main way to support is when the feature drops as a PR - help test extensively.

Example: Business Shared Folders ... been a PR for a LONG time because not many people are providing feedback to assist with fixing any remaining issue.

In some sense, not many people know the D language, and this is a barrier to providing a PR.

pdvrieze commented 4 years ago

I'm seriously considering forking (and porting) the project/starting afresh in C++ with a UI as a key element (the lack of interaction in the current system makes it hard to know whether the system is working). In my use case I need to support SharePoint with multiple teams, which is also used to store very large files (like videos) that I cannot download on each system. Running multiple instances of the client isn't really scalable. It would use FUSE to provide access to the drive(s) and would probably be backed by a file system based cache. That cache would also hold properties for availability, which would be a per-device property.

abraunegg commented 4 years ago

@pdvrieze Thanks for your feedback, however there is UI integration .. perhaps read the docs on how to configure this?

abraunegg commented 4 years ago

@pdvrieze, @landall, @marcown, @IncPlusPlus

The following PR has been raised for this feature:

git clone https://github.com/abraunegg/onedrive.git
cd onedrive
git fetch origin pull/921/head:pr921
git checkout pr921
./configure; make clean; make;

WARNING: DO NOT USE IN PRODUCTION OR AGAINST DATA YOU CARE ABOUT

To activate this feature, edit your 'config' file and 'add':

on_demand = "true"

This will ONLY work / activate if you are using monitor mode (--monitor). This code is what I have been sitting on locally for a while, and unable to get to this state due to other commitments.

Known Issues:

  1. New files, as uploaded to OneDrive via webui, will be downloaded and replaced by a 23 byte file indicating that this is a 'on-demand' file. Existing local files that are in-sync 'should be' untouched
  2. If you 'cat' an 'on-demand' file, the real file will download. 'cat' the file again, and the right file is now available locally
  3. If you try and edit the 'on-demand' file, expecting the file to download before opening, this will fail & crash the application. This is due to ZERO file system capability under Linux to set a lock / delay file access whilst the file is being downloaded.
  4. There is ZERO tracking of any sort of 'I only want this type of file to be on-demand' - it is all new files or nothing.
  5. Unknown unknowns ...

Treat this PR as PRE-ALPHA quality. Use this as a base - and expand / assist to improve on. Do not use this against data you care about.

Just so that this is crystal clear - DO NOT USE THIS AGAINST DATA YOU CARE ABOUT

marcown commented 4 years ago

@abraunegg

Thanks, I will look into it.

pdvrieze commented 4 years ago

The way to handle the wait to download problem on Linux is to be a file system. The easiest way to do that is using fuse/libfuse. You can have the file system just pass through requests to the local file system that is used to back it, except of course for on-demand files. For those files it can even show the file size without actually using that much space (or being sparse)

As another feature, when syncing all on-demand files stored locally that are modified should just be reset to on-demand for a fast sync. Files requested to always be available locally can then be added to a background download queue.

landall commented 4 years ago

The way to handle the wait to download problem on Linux is to be a file system. The easiest way to do that is using fuse/libfuse. You can have the file system just pass through requests to the local file system that is used to back it, except of course for on-demand files. For those files it can even show the file size without actually using that much space (or being sparse)

As another feature, when syncing all on-demand files stored locally that are modified should just be reset to on-demand for a fast sync. Files requested to always be available locally can then be added to a background download queue.

The trouble is that OneDrive is indeed too slow. Each request takes about 10-30 seconds.

Another problem is that the current client doesn't listen to the sync notifications of OneDrive. So the folder structure in the local file system may be wrong.

abraunegg commented 4 years ago

@pdvrieze

The way to handle the wait to download problem on Linux is to be a file system. The easiest way to do that is using fuse/libfuse. You can have the file system just pass through requests to the local file system that is used to back it, except of course for on-demand files. For those files it can even show the file size without actually using that much space (or being sparse)

As another feature, when syncing all on-demand files stored locally that are modified should just be reset to on-demand for a fast sync. Files requested to always be available locally can then be added to a background download queue.

Feel free to learn and contribute

abraunegg commented 4 years ago

@landall

Another problem is that the current client doesn't listen to the sync notifications of OneDrive. So the folder structure in the local file system may be wrong.

Whilst the client, strictly speaking, does not listen for events, every sync cycle it pulls in all the events since the last delta - see the API regarding Tracking Changes:

The current client conforms with these practices
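For anyone unfamiliar with the delta approach, the polling loop looks roughly like the sketch below. This is not the client's D code. It assumes the Microsoft Graph delta endpoint for the root of the signed-in user's drive and an already obtained bearer token; the function names are made up for illustration.

```python
import json
import urllib.request

# Assumed starting point: delta query for the signed-in user's drive root.
GRAPH_DELTA_URL = "https://graph.microsoft.com/v1.0/me/drive/root/delta"


def fetch_json(url, token):
    """GET one page from Microsoft Graph using a bearer token."""
    request = urllib.request.Request(url, headers={"Authorization": "Bearer " + token})
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def collect_changes(start_url, fetch_page):
    """Follow @odata.nextLink pages until @odata.deltaLink appears.

    Returns (changed_items, delta_link); pass delta_link as start_url next cycle.
    """
    items = []
    url = start_url
    while True:
        page = fetch_page(url)
        items.extend(page.get("value", []))
        if "@odata.deltaLink" in page:
            return items, page["@odata.deltaLink"]
        url = page["@odata.nextLink"]
```

A sync cycle would call collect_changes(GRAPH_DELTA_URL, lambda url: fetch_json(url, token)) the first time, store the returned delta link, and start from that link on the next cycle so only changes since the last call come back.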

landall commented 4 years ago

@landall

Another problem is that the current client doesn't listen to the sync notifications of OneDrive. So the folder structure in the local file system may be wrong.

Whilst the client, strictly speaking, does not listen for events, every sync cycle it pulls in all the events since the last delta - see the API regarding Tracking Changes:

The current client conforms with these practices

Something can happen between your delta requests, which can cause collisions on the server side of OneDrive. I pointed my webroot at the folder synced by this software to fast-sync my PHP code to AWS, and it causes a lot of collisions.

Listening to the API can reduce the chance of collisions. https://docs.microsoft.com/en-us/onedrive/developer/rest-api/concepts/using-webhooks?view=odsp-graph-online

abraunegg commented 4 years ago

@landall Did you ever report a problem? Don't think so ... sync issues are very common with the Skilion codebase .. not with this code base. If you are seeing an issue like that - then why are you not raising an issue ticket for it to be fixed?

abraunegg commented 3 years ago

For folks looking to use the OneDrive files 'on-demand' capability in Linux, please investigate the following GitHub project: https://github.com/jstaf/onedriver

schmitch commented 3 years ago

@abraunegg you can use fanotify for on-demand files; that's basically what I would be doing if I found the time to create a client (unfortunately I have no idea about the "D" language)

fanotify is basically like inotify, except that it can block the open syscall, which lets you download the file first. Basically it works like this:

  1. create a fake file (0 byte file) (and track it as fake, in a database)
  2. user clicks on the file, fanotify will block the open until the file has been downloaded

The only thing that you can't support (which Windows does) is showing how big the files are on the remote storage, but that is a minor detail.
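The bookkeeping side of those two steps might look something like this. It is a sketch only: it does not call fanotify at all (that needs elevated privileges and a permission-event listener, e.g. on FAN_OPEN_PERM), and the table layout, function names and the download callable are all hypothetical.

```python
import os
import sqlite3


def open_db(path=":memory:"):
    """Open the tracking database that remembers which files are fake."""
    db = sqlite3.connect(path)
    db.execute(
        "CREATE TABLE IF NOT EXISTS placeholders (path TEXT PRIMARY KEY, remote_id TEXT)"
    )
    return db


def create_placeholder(db, path, remote_id):
    """Step 1: create a 0-byte file and record it as fake."""
    open(path, "wb").close()
    db.execute("INSERT OR REPLACE INTO placeholders VALUES (?, ?)", (path, remote_id))
    db.commit()


def hydrate_if_placeholder(db, path, download):
    """Step 2: what a blocking open handler would run before allowing access.

    `download(remote_id)` is a hypothetical callable returning the file bytes.
    Returns True if the file had to be downloaded.
    """
    row = db.execute(
        "SELECT remote_id FROM placeholders WHERE path = ?", (path,)
    ).fetchone()
    if row is None:
        return False  # already a real file, the open can proceed immediately
    # Written in place so the file the caller is opening gets the content.
    with open(path, "wb") as handle:
        handle.write(download(row[0]))
    db.execute("DELETE FROM placeholders WHERE path = ?", (path,))
    db.commit()
    return True
```

The listener would call hydrate_if_placeholder() for each blocked open and only then allow the open to continue; how it avoids reacting to its own file accesses is left out here.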

DanGough commented 3 years ago

For folks looking to use the OneDrive files 'on-demand' capability in Linux, please investigate the following GitHub project: https://github.com/jstaf/onedriver

Thanks for the link. I've been playing with RClone which works pretty well apart from issues resuming uploads after a reboot and such!