EricTheMagician / DriveFS

A google drive fuse filesystem implemented in C++
Mozilla Public License 2.0
63 stars · 7 forks

Project goals #1

Open EricTheMagician opened 6 years ago

EricTheMagician commented 6 years ago

The goal of this project is to provide a quality FUSE filesystem experience with Google Drive: one that supports both upload and download, supports Team Drives, works well for home users with asymmetric internet connections, and, most importantly, bears the seal of approval from the Wife.

What currently works: file upload / download (writing to / reading from GDrive).

Currently, the list of features that I plan on implementing (probably in order):

Long term goals (no particular order, but rclone support is higher priority):

somerandom48 commented 6 years ago

This is awesome! I am currently still running your GDriveF4JS as my main mount for GDrive. In my opinion it is still the best available option for mounting GDrive on Linux.

I love the idea of native rclone encryption, but could I request you also look into native encfs support, as that is the encryption method I am using, and I know others do as well.

Thanks for all your work.

EricTheMagician commented 6 years ago

Good to know that people still use my node-gdrive project. I found it had way too many issues for me and was unstable, so this should be much more stable and run much more smoothly.

For encfs, it should be much simpler. The current incarnation of DriveFS is based on my acdfs + encfs, which I never released because it was also unstable: I had a bug in the encfs portion. The filename encoding/decoding worked, but the reading/writing was slightly different, which made it incompatible.

I was going to say I'd just commit the encfs code as-is at some point, and if someone wanted to look into it, a pull request would be more than appreciated. Actually, that code is lost. It would have been a good reference for both encfs and rclone. So I might do encfs first, since that's already in C/C++, but man, it was not fun working through the SSL library.

hjone72 commented 6 years ago

Really looking forward to this! I also found that your GDriveF4JS has been by far the best, plus the easiest for me to make changes to; after a few small adjustments it is stable as a rock! My only problem with your old solution was the ls speeds.

I assume that is because of the FUSE libraries you had to use? Will this version be faster in that regard?

EricTheMagician commented 6 years ago

Thanks @hjone72!

Can you elaborate on the speed of ls? I haven't used it in a while.

I don't recall it being slow, so if you could give a more concrete example, I can compare on my current setup. I'm using rclone to decrypt, and that slows it down, but natively it's quite fast. I'm also running it in debug mode under gdb.

Finally, since people are still using the node version: what write throughput are you getting when copying a file to node-gdrive? On my test laptop it's super fast since I have an SSD, but on my VPS, which has a mechanical disk and runs unRAID, it's super slow, around 10 MB/s.


hjone72 commented 6 years ago

Hmmm, not sure how best to explain it. I have a fairly large GDrive with lots of files and folders (200,000+). When software scans over those files, it can take quite a significant amount of time, and running ls -R can halt for various periods. This could also be down to the nature of node (being single threaded). I guess the slowness mostly occurs when trying to perform many actions at once...

Download speed is the most important thing for me, and the download speed with the current node version has been fantastic! I don't use the upload feature, so I can't comment on that.

EricTheMagician commented 6 years ago

@hjone72 OK, I see. I don't have such a large library on GDrive: ~10k files. With gdb attached for debugging and the code compiled in debug mode (not RelWithDebInfo), it takes about 3 seconds for find to list all the files, and ~0.5 seconds if I pipe the output to wc (word count).

In the past, I had ~100k files on ACD, and using a similar filesystem I got close to ~5 seconds to go through all those files with code compiled in release. I think I got it down to about 1 second when I stopped using smart pointers.

So I'll go right ahead and assume that when you say a significant amount of time, it's way more than 10 seconds. Maybe 10-15 minutes, if not more?

hjone72 commented 6 years ago

I have never actually let it finish, because it has taken so long. The very fast times you are talking about sound amazing! Happy to run tests for you on my larger drive when you get to that stage.

As I said before, the most important things for me are having a cache and high download speed.

Really looking forward to trying this out!

Gawthorne commented 6 years ago

I found that when using the high-level libfuse API within node, listing files is orders of magnitude faster than when using the low-level API. I put it down to the fact that the high-level API only calls into node a few times, whereas the low-level API is constantly calling hooks within node. However, the high-level API being synchronous is terrible for reads/writes due to all the IO blocking... So I found neither really ideal. :(

Just my two cents.

EricTheMagician commented 6 years ago

It's not using node. It's pure C++, so there shouldn't be any performance problems with the low-level API here.
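For context, this is the style of API in question: the low-level interface works on inodes and replies explicitly from each callback, with no interpreter in between. A minimal libfuse 3 skeleton (an illustrative sketch, not DriveFS's actual code):

```cpp
// Minimal libfuse 3 low-level skeleton (illustration only, not DriveFS code).
#define FUSE_USE_VERSION 34
#include <fuse_lowlevel.h>
#include <sys/stat.h>
#include <cstdlib>

static void fs_getattr(fuse_req_t req, fuse_ino_t ino, fuse_file_info *) {
    struct stat st {};
    st.st_ino = ino;
    st.st_mode = S_IFDIR | 0755;      // toy example: report everything as a directory
    st.st_nlink = 2;
    fuse_reply_attr(req, &st, 1.0);   // let the kernel cache attributes for 1 second
}

int main(int argc, char *argv[]) {
    fuse_args args = FUSE_ARGS_INIT(argc, argv);
    fuse_cmdline_opts opts {};
    if (fuse_parse_cmdline(&args, &opts) != 0) return 1;

    fuse_lowlevel_ops ops {};
    ops.getattr = fs_getattr;  // a real fs also sets lookup, readdir, read, write, ...

    fuse_session *se = fuse_session_new(&args, &ops, sizeof(ops), nullptr);
    if (!se || fuse_session_mount(se, opts.mountpoint) != 0) return 1;
    int ret = fuse_session_loop(se);  // dispatch kernel requests to the callbacks
    fuse_session_unmount(se);
    fuse_session_destroy(se);
    free(opts.mountpoint);
    fuse_opt_free_args(&args);
    return ret;
}
```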

I wrote some build instructions on the wiki: https://github.com/thejinx0r/DriveFS/wiki/Compiling

I'm still testing it out. Since I'm mostly testing uploads right now, I'm not sure how it will behave once the cache is full while downloading.

It should clear it, but I haven't tested that yet. Just note that it will show a message that the cache is full and that it is starting to delete, though it will only update the size once it has cleared up ~10% of the cache. So if you download or upload while the cache is clearing, it will look like the cache is increasing when it really isn't.
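A minimal sketch of the eviction behavior just described, with made-up names rather than DriveFS's actual internals:

```cpp
// Hypothetical sketch: evict oldest cache entries when full, but only
// publish the new size after ~10% of capacity has been reclaimed.
#include <atomic>
#include <cstdint>
#include <cstdio>
#include <deque>
#include <mutex>
#include <string>

struct CacheEntry { std::string path; uint64_t size; };

class DiskCache {
    std::deque<CacheEntry> lru_;               // front = oldest entry
    std::atomic<uint64_t> reported_size_{0};   // the size other threads observe
    uint64_t actual_size_ = 0;
    uint64_t capacity_;
    std::mutex m_;
public:
    explicit DiskCache(uint64_t capacity) : capacity_(capacity) {}

    void add(CacheEntry e) {
        std::lock_guard<std::mutex> lk(m_);
        actual_size_ += e.size;
        lru_.push_back(std::move(e));
        if (actual_size_ > capacity_) evictLocked();
        reported_size_ = actual_size_;  // only refreshed once eviction finishes
    }

private:
    void evictLocked() {
        std::printf("cache is full, deleting old entries\n");
        const uint64_t target = capacity_ - capacity_ / 10;  // free ~10%
        while (actual_size_ > target && !lru_.empty()) {
            actual_size_ -= lru_.front().size;  // unlink of the file omitted
            lru_.pop_front();
        }
        // Because reported_size_ stays stale until the caller updates it,
        // concurrent transfers can make the cache *look* like it is still
        // growing while space is actually being reclaimed.
    }
};
```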

Gawthorne commented 6 years ago

I was referring to the slow-downs hjone72 was facing when using your node-gdrive project.

But awesome! I attempted to get this to build a little while ago but couldn't find the special sauce to get it to compile. Thanks for the instructions.


hjone72 commented 6 years ago

Wondering if there is any chance you could post a sample config file?

EricTheMagician commented 6 years ago

@hjone72 https://github.com/thejinx0r/DriveFS/wiki/Sample-Config

BerriJ commented 6 years ago

Maybe you could add "Adding DriveFS to the AUR" as a long term goal?

mcadam commented 6 years ago

Hello, looking forward to using this project. I used your node version before switching to rclone a while back, but it was not the best regarding stability. I just had a quick question that I could not find answered in the README or in the code: does this mount refresh the list of dirs/files automatically if some files were uploaded to GDrive from a different machine?

Thanks

EricTheMagician commented 6 years ago

Yes. It's every five minutes.


mcadam commented 6 years ago

Hello, another quick question about design and implementation. I am currently running a cluster and mounting GDrive on each node. Could I use the same MongoDB for all of them, or might that pose issues, for example if the cache information is also in the database? Or could the way updates work leave the nodes out of sync?

Thanks

EricTheMagician commented 6 years ago

@mcadam It's mostly the cache. I put the location of the cached files in the db and only check that, since I was worried that scanning disks would be slow. You could just edit the code to have each node store its cache in a different table.

Everything else should be OK, since the values used for updating are stored in memory. Each node will do about the same work updating the database, but I don't think they should go out of sync beyond the 5 minute refresh rate.
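A hypothetical sketch of the per-node table idea using the mongocxx driver (the database and collection names here are invented for illustration, not the ones DriveFS actually uses):

```cpp
// Hypothetical sketch: share one MongoDB across cluster nodes, but give
// each node its own cache collection keyed by hostname.
#include <mongocxx/client.hpp>
#include <mongocxx/instance.hpp>
#include <mongocxx/uri.hpp>
#include <unistd.h>
#include <string>

int main() {
    mongocxx::instance inst{};  // one driver instance per process
    mongocxx::client client{mongocxx::uri{"mongodb://localhost:27017"}};

    char host[256] = {};
    gethostname(host, sizeof(host) - 1);

    auto db = client["DriveFS"];                        // assumed database name
    auto metadata = db["files"];                        // shared file tree (assumed name)
    auto cache = db[std::string("cache_") + host];      // per-node cache table

    // Each mount updates the shared metadata on its 5-minute refresh,
    // but reads/writes cached-file locations only from its own table.
    (void)metadata; (void)cache;
    return 0;
}
```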


mcadam commented 6 years ago

Thanks for the quick answer :)

mcadam commented 6 years ago

I am trying to ship the executable without having to install everything, so I am trying to statically link the libs, but I am getting some errors. I don't usually work in C++, so any help would be appreciated :)

I tried something like `cmake -DUSE_FUSE3=1 -DCMAKE_EXE_LINKER_FLAGS=-static .` and got these errors:


```
[ 58%] Linking CXX executable DriveFS
/usr/sbin/ld: cannot find -lfuse3
/usr/sbin/ld: attempted static link of dynamic object `/usr/lib/libssl.so'
collect2: error: ld returned 1 exit status
make[2]: *** [CMakeFiles/DriveFS.dir/build.make:262: DriveFS] Error 1
make[1]: *** [CMakeFiles/Makefile2:106: CMakeFiles/DriveFS.dir/all] Error 2
make: *** [Makefile:84: all] Error 2
```

EricTheMagician commented 6 years ago

@mcadam Do you have fuse3 installed, or just fuse2? Most distributions only ship fuse2 in general.

If you do have it, it's not being found for some reason. Open a new issue and post the "CMakeCache.txt" file from your build folder.

mcadam commented 6 years ago

I installed fuse3 specifically. If I compile with

```
cmake -DUSE_FUSE3=1 .
make -j 8
```

then it compiles fine. It only fails when I try the option to statically link the libs, so I can move the executable around without reinstalling everything:

```
cmake -DUSE_FUSE3=1 -DCMAKE_EXE_LINKER_FLAGS=-static .
make -j 8
```

I am a pretty big noob in C++, so I'm fairly sure I am doing something wrong, or something is missing and it's not that easy :)

But this can be done later or added to your todo list. For now I got a Docker image to work; it could be lighter, but I can work on that in the future.

mcadam commented 6 years ago

Hello, maybe as one of the future goals: to speed up the first startup time when generating the drive tree, instead of fetching changes from day one of the drive (which for an old drive can take quite a while), you could, on first startup, build the list using the files API to get only the files present at that time, then call getStartPageToken and switch to the changes API so that only new changes are retrieved from then on?
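In code, the suggested flow would look roughly like this sketch, where the three helpers are hypothetical stand-ins for the Drive API calls (files.list, changes.getStartPageToken, changes.list), not real client code:

```cpp
// Sketch of the proposed startup flow; the three functions below are
// hypothetical stand-ins for authenticated Drive API requests.
#include <iostream>
#include <string>
#include <vector>

struct File { std::string id, name; };

std::vector<File> listAllFiles() { return {}; }            // files.list, paged
std::string getStartPageToken() { return "token-0"; }      // changes.getStartPageToken
std::vector<File> listChanges(const std::string &) { return {}; }  // changes.list

int main() {
    // Grab the change token *before* the full listing, so changes that
    // happen during the snapshot are not missed.
    std::string token = getStartPageToken();

    // First startup: snapshot the current tree with files.list instead
    // of replaying every change since day one of the drive.
    for (const File &f : listAllFiles())
        std::cout << "indexing " << f.name << '\n';

    // Subsequent 5-minute refreshes: only fetch changes newer than the token.
    for (const File &f : listChanges(token))
        std::cout << "updating " << f.name << '\n';
}
```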

Still trying it out for now, but it looks like I will be adopting this one. Thanks for sharing your work 👍