TurboGit / hubicfuse

Support for mounting HubiC drive in GNU/Linux
MIT License

Segmented files are not handled properly #54

Closed maki-chan closed 8 years ago

maki-chan commented 9 years ago

Segmented files (files with a size greater than segment_above) are not handled properly by hubicfuse. I have not yet tried copying such a file from hubiC to my local hard drive (i.e. downloading it), but ls -sh shows those files as 0 bytes in size (maybe because they are just links to the segmented files?).

maki-chan commented 9 years ago

Well, they are not handled properly when showing their size, but such a file is copied completely via cp, for example, and it is also downloadable from the WebUI of hubiC.

When uploading such a huge file, a folder is created that is named after the folder the file is in, with _segments appended. For example, a file in /hubic_root/default/folder1/folder2/hugeFileWith6G is a link to the merged segments in /hubic_root/default/folder1/folder2_segments/hugeFileWith6G/TIMESTAMP/FILE_SIZE_IN_BYTES/SEGMENT_SIZE/

The segments are numbered sequentially, like 00000000, 00000001, ...

Small legend to the folder path above:

TIMESTAMP = Unix timestamp in microseconds with a decimal point between seconds and parts of a second (like PHP microtime(true))

FILE_SIZE_IN_BYTES = The file size in bytes of the complete unseparated file which was uploaded

SEGMENT_SIZE = Size of one segment in bytes (the last one obviously can be smaller than that)
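To make the layout above concrete, here is a minimal Python sketch of how such a segment path could be built and taken apart again. It is only an illustration of the naming scheme described in this comment, not code from hubicfuse; the helper names and the sample timestamp and sizes are made up.

```python
import posixpath


def segment_prefix(file_path, timestamp, file_size, segment_size):
    """Build the folder that holds the segments of file_path (hypothetical helper)."""
    parent, name = posixpath.split(file_path)
    # The segments folder is named after the file's parent folder plus "_segments".
    return "{}_segments/{}/{}/{}/{}/".format(
        parent, name, timestamp, file_size, segment_size)


def segment_name(index):
    """Segments are zero-padded to eight digits: 00000000, 00000001, ..."""
    return "{:08d}".format(index)


def parse_segment_prefix(prefix):
    """Recover the legend fields from a segment folder path."""
    parts = prefix.rstrip("/").split("/")
    name, timestamp, file_size, segment_size = parts[-4:]
    return {
        "name": name,
        "timestamp": timestamp,  # kept as text, e.g. "1436811720.1234"
        "file_size": int(file_size),
        "segment_size": int(segment_size),
    }


# Sample values below are invented for illustration.
prefix = segment_prefix(
    "/hubic_root/default/folder1/folder2/hugeFileWith6G",
    "1436811720.1234", 6400000000, 1073741824)
print(prefix + segment_name(0))
```

Note that the complete file size is already encoded in the path, so in principle it could be read from there without touching the segments themselves.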

Maybe retrieving the size of a file has to be handled differently then, or should we just leave them to show as 0-byte files?

TurboGit commented 9 years ago

It would be nice to be able to display them properly. But I do not have such a big file uploaded on my side (nor would my connection speed allow this to be done in an acceptable time). So please, by all means, if you can contribute this it would be really good.

maki-chan commented 9 years ago

Unfortunately, I am not really into C, C++ or the FUSE file systems. I hope somebody else can use the info I provided to contribute a patch.

romanrm commented 9 years ago

One workaround is to raise the segment_above value. In my tests it accepted a file of up to about 4.5 GB without segments (I did not try any larger). However, uploading such large files often fails, so I just avoid uploading files larger than 2-3 GB now.

maki-chan commented 9 years ago

My segment_above value is set to 4GB and works great. But if I want to upload a 6.4GB file, it has to be split. You cannot go over 5GB as hubiC does not allow files that exceed 5GB, that's why I use 4GB in segment_above (1GB less for safety reasons).
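As a rough illustration of the arithmetic involved, the following Python sketch works out how many segments a file would be split into for a given segment_above threshold. It is not hubicfuse code: the function name is hypothetical, the 1 GB segment size is an assumed value, and decimal gigabytes are assumed for simplicity.

```python
GB = 10**9  # assumption: decimal gigabytes, for illustration only


def segments_needed(file_size, segment_above, segment_size):
    """Return 0 if the file is uploaded whole, else the number of segments."""
    if file_size <= segment_above:
        return 0  # at or below the threshold: uploaded as a single object
    # Integer ceiling division; the last segment may be smaller than the rest.
    return (file_size + segment_size - 1) // segment_size


# A 6.4 GB file with segment_above = 4 GB and an assumed 1 GB segment size.
print(segments_needed(6400000000, 4 * GB, 1 * GB))  # 7
```

With these numbers a 6.4 GB file ends up as six full segments plus one smaller final segment, while anything up to 4 GB is uploaded whole.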

Issam2204 commented 9 years ago

Forgive my ignorance, but why then does hubiC state that there is no limit on file size? Is it a shortcoming of the FUSE file system?

TurboGit commented 9 years ago

Probably so, yes, and maybe also due to connection stability: uploading some 5 GB here takes me 14 hours or so.

Issam2204 commented 9 years ago

Thank you for your answer TurboGit!

maki-chan commented 9 years ago

Yes, they have changed it since the Beta (I was in the hubiC Beta, where there was a file size limit of 5GB). They now state:

No, there's no size limit for files deposited on hubiC, except, of course, the amount of free space available on your hubiC account. However, the size of a file deposited via the web application cannot exceed 1GB (soon to be 4GB) and/or 10,000 different files. This limitation is actually inherent in the web browser capacities.

rejoc commented 9 years ago

@TurboGit : To help fix this issue and upload big files, I can provide a small OVH/Kimsufi server for some time with some big files on it. There should be no bandwidth issue within OVH datacenters.

rejoc commented 9 years ago

I made some uploads with the hubic backup utility. Some files were over 10GB. When I look at them through hubicfuse, the size and creation dates are displayed correctly with the actual values.

With large files (segmented) uploaded through hubicfuse, the size is 0 and the date is wrong (time is wrong, earlier than the actual transfer time in all the tests I made). Also, when you look at those files through the hubic web interface, there is no size displayed.

rejoc commented 9 years ago

One main difference I can see in the headers of files uploaded via hubicfuse and via hubic backup is that hubicfuse creates a "dynamic large object" manifest, whereas hubic backup creates a "static large object" manifest. This may be why the size is 0 when you get the container items list.
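If that is the cause, the real size could still be recovered from the segments. In OpenStack Swift terms, a dynamic large object manifest is a zero-byte object whose X-Object-Manifest header names a container and a name prefix, and the file content is the concatenation of all objects matching that prefix. A minimal Python sketch of summing the segment sizes, assuming a listing in the shape of Swift's JSON container listing (entries with "name" and "bytes"); the helper names and sample data are hypothetical and this is not how hubicfuse is confirmed to work:

```python
def split_manifest(header_value):
    """Split an X-Object-Manifest value of the form 'container/prefix'."""
    container, _, prefix = header_value.partition("/")
    return container, prefix


def dlo_total_size(segment_listing, prefix):
    """Sum the sizes of all listed objects whose name starts with prefix."""
    return sum(entry["bytes"] for entry in segment_listing
               if entry["name"].startswith(prefix))


# Invented sample data: two segments of one file plus an unrelated object.
listing = [
    {"name": "dir/big/1436811720.1234/1500/1000/00000000", "bytes": 1000},
    {"name": "dir/big/1436811720.1234/1500/1000/00000001", "bytes": 500},
    {"name": "dir/other/1436811999.0000/10/1000/00000000", "bytes": 10},
]
container, prefix = split_manifest(
    "default_segments/dir/big/1436811720.1234/1500/1000/")
print(container, dlo_total_size(listing, prefix))
```

The manifest itself is reported as 0 bytes in a container listing because it has no content of its own, which matches the symptom described in this issue.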

For the dates, it may be a GMT/local time conversion issue.
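To illustrate the kind of conversion issue suspected here: assuming the storage listing reports modification times as UTC strings without a timezone suffix (an assumption about the format, not something confirmed in this thread), reading the same string as local time shifts the result by the local UTC offset. A small Python sketch with hypothetical helper names:

```python
import time
from datetime import datetime, timezone

STAMP_FORMAT = "%Y-%m-%dT%H:%M:%S.%f"  # assumed listing format


def epoch_from_utc(stamp):
    """Interpret the timestamp string as UTC."""
    naive = datetime.strptime(stamp, STAMP_FORMAT)
    return naive.replace(tzinfo=timezone.utc).timestamp()


def epoch_from_local(stamp):
    """Interpret the same string as local time, for comparison."""
    naive = datetime.strptime(stamp, STAMP_FORMAT)
    return time.mktime(naive.timetuple())


stamp = "2015-07-13T19:22:00.000000"
# Seconds by which the local-time reading is off; 0 on a machine set to UTC.
print(epoch_from_local(stamp) - epoch_from_utc(stamp))
```

In C the same distinction shows up as mktime (local time) versus timegm (UTC).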

TurboGit commented 9 years ago

@rejoc, thanks for the offer but at the moment I have really no time to look at this. Too many things to do... Help would be appreciated. I'm still the only maintainer for hubicfuse :(

GreenKudu commented 9 years ago

I'm curious about the files uploaded using the 'hubic backup' utility. I'm just uploading a batch of huge files myself, and through hubicfuse I see the files with normal dates and sizes, and then I see a whole bunch of file segments in a parallel 'dirname_segments' directory.

Is it safe to manipulate such huge files through hubic fuse? I'm particularly interested in being able to safely delete them through hubic fuse.

TurboGit commented 9 years ago

I cannot comment as I do not handle large files myself, but users here have reported good success.

rejoc commented 9 years ago

I did not have any problem (as long as you don't try to manipulate the files in 'dirname_segments' directly). You'll find the same ..._segments directory for the original hubic folders. It is the way the storage is actually done in hubic: a large file is split into small chunks and stored as a collection of those chunks.

hubicfuse could perhaps hide those ..._segments directories to prevent errors.

GreenKudu commented 9 years ago

Thanks for the feedback! I'll experiment with this, I could imagine that manipulating the 'dirname_segments' directory was not a good idea.

I would also suggest hiding those directories if manipulating them directly is a big no-no.

kurisuD commented 9 years ago

Hello all,

I think those _segments folders should all be stored in default_segments folder as seen from hubicfuse mount point. It would be hidden from the web UI and also would probably solve issue 41 and part of 68.

The change shouldn't be difficult to do. We just need to make sure the paths to the segments are properly read from existing manifests.

As for large files, I have had problems myself under x86 (compared to x86_64).

I have a patch for this (and other things) available, but have not made a pull request to Pascal yet, as I wanted to test it more and Pascal did not have much time to review it either. So far, I've had no (functional or technical) issues, so I'm confident enough to let you have a look. Some feedback would be more than welcome before doing the pull request.

You can pull from https://github.com/kurisuD/hubicfuse

Have a good one !

dan-cristian commented 8 years ago

I have implemented a fix for the 0-size segmented file issue, along with several speed and reliability improvements. You can find it here: https://github.com/dan-cristian/hubicfuse.

thias commented 8 years ago

@dan-cristian : I just had a quick look at your fork... eeeek! You should really re-implement your changes with a limited number of clean, clear and relevant commits, then open a pull request here :-)

romanrm commented 8 years ago

@dan-cristian does your version add proper support for files larger than 5 GB (i.e. segmented)?

dan-cristian commented 8 years ago

@thias Sure, will do. This is my first fork and merge request on GitHub, so please be kind ... :)

dan-cristian commented 8 years ago

@romanrm I'll test and confirm; I think it should support that.

rejoc commented 8 years ago

Yes it does. I have files larger than 25G and the value in bytes is correct now when you "ls -l" the hubic directory.

kurisuD commented 8 years ago

Large file support is broken when running on x86. I have a fork that deals with this and a few other little things, but it's still under testing. I'll try to branch my fork for a version including Christian's exciting changes, but it won't happen until mid-December; I'm too busy at my day job, unfortunately.

TurboGit commented 8 years ago

Closing as this is now fixed.

@kurisuD, please open a new ticket for x86 or just a merge request when done.