Azure / azure-storage-fuse

A virtual file system adapter for Azure Blob storage

Lacking performances in upload #127

Closed fedexist closed 6 years ago

fedexist commented 6 years ago

I've been trying out blobfuse because I'd like to use it as a means of transferring large files from a Linux server to Azure without AzCopy.

From my tests with a 5 GB file, I can't get above 30 MB/s on upload, and I'm beginning to think something is wrong.

The following are the blobfuse values I've tried changing to improve performance and reach higher throughput:

const int defaultMaxConnections = 20; // changed to 50, 200 and 1000 with no change in throughput

// in az_init
conn->max_background = 128; // changed to 256

Maybe I'm missing something. Is there any way to increase the number of concurrent threads used for uploads, as AzCopy allows?

seguler commented 6 years ago

I just tested on a DS8 VM and saw a throughput of 104 MB/s on a 5 GB random file. This is without any modifications. For the download, I am seeing about 300-400 MB/s throughput.

Are your VM and storage account located in the same region? What is your underlying disk? Can you switch to a ramdisk instead?

We do our best to upload a file concurrently in blobfuse; however, the upload starts only after close is called. During write operations, all calls go to the underlying cache disk you chose when you mounted blobfuse. From my tests, the time spent writing to the cache disk seems to be the main bottleneck.
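Since writes land on the cache disk first, one way to check whether that disk is the limiting factor is to time a plain sequential write into the cache directory. The following is a minimal sketch, assuming GNU coreutils on Linux; `measure_write_mbps` is a hypothetical helper name, not part of blobfuse, and a single `dd` run is only a rough indicator.

```shell
#!/bin/bash
# Hypothetical helper (not part of blobfuse): write SIZE_MB mebibytes of zeros
# into DIR, force them to the device, and print the observed throughput in MB/s.
measure_write_mbps() {
    local dir="$1" size_mb="${2:-1024}"
    local tmp start end
    tmp="$(mktemp "$dir/writetest.XXXXXX")" || return 1
    start="$(date +%s.%N)"   # GNU date: seconds with nanoseconds
    # conv=fsync flushes the data before dd exits, so the page cache
    # does not inflate the result.
    dd if=/dev/zero of="$tmp" bs=1M count="$size_mb" conv=fsync status=none \
        || { rm -f "$tmp"; return 1; }
    end="$(date +%s.%N)"
    rm -f "$tmp"
    awk -v s="$start" -v e="$end" -v mb="$size_mb" \
        'BEGIN { dt = e - s; if (dt <= 0) exit 1; printf "%.1f\n", mb / dt }'
}
```

For example, `measure_write_mbps /mnt/fusecache-tmpfs 1024` would time a 1 GiB write into a cache directory. If the number is far above the upload rate you observe, the cache disk is probably not the bottleneck.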

seguler commented 6 years ago

I do think your bottleneck could be the distance to the storage account, your VM size, or the speed of the underlying cache disk.

fedexist commented 6 years ago

Hi @seguler, our test environment is an on-premises VM (4 cores, 16 GB) running on a Hyper-V cluster with IBM V7000 auto-tiering storage, since our use case is file transfer from on-premises to a specific storage account.

Beforehand, I tried AzCopy from the same machine and reached an average upload speed of 65 MB/s, so I expected blobfuse to reach about the same performance, and I didn't expect the distance from the storage account to be a significant factor.

I'll try using a ramdisk as the cache and report back. Thank you!

fedexist commented 6 years ago

So, I tried using a 10 GB ramdisk as the cache to upload the same 5 GB file:

mount -t tmpfs -o size=10G tmpfs /mnt/fusecache-tmpfs

#!/bin/bash
# mount.sh: mount blobfuse at the mount point passed as $1, using the tmpfs cache
BLOBFS_DIR="$( cd "$( dirname "${BASH_SOURCE[0]}" )" && pwd )"
cd "$BLOBFS_DIR/build" || exit 1
./blobfuse "$1" --tmp-path=/mnt/fusecache-tmpfs -o attr_timeout=240 -o entry_timeout=240 -o negative_timeout=120 --config-file=/root/connection.cfg

As far as I can see, though, there's no difference compared with the regular disk I used before.

seguler commented 6 years ago

Thanks for the details. I have to admit that we have problems with upload performance. I see similar behavior in my test environment, where upload throughput is about half of what I normally get with AzCopy. This comes from the blobfuse implementation: to handle random I/O, we buffer the entire file on disk before uploading it to Blob storage. Nothing can be done about this in the short term.
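To see why buffering the whole file first can roughly halve throughput, here is a simplified model, assuming the cache write and the upload run strictly one after the other with no overlap. The effective rate is then 1 / (1/write + 1/upload). The function name `effective_mbps` and the input rates are illustrative assumptions, not measurements from this thread.

```shell
#!/bin/bash
# Simplified model (an assumption, not blobfuse's actual behavior):
# the file is first written to the cache at WRITE_MBPS, then uploaded at
# UPLOAD_MBPS, with no overlap between the two phases.
# Prints the effective end-to-end throughput in MB/s.
effective_mbps() {
    local write_mbps="$1" upload_mbps="$2"
    awk -v w="$write_mbps" -v u="$upload_mbps" \
        'BEGIN { printf "%.1f\n", 1 / (1 / w + 1 / u) }'
}
```

With both phases at the same rate, for example `effective_mbps 65 65`, the model gives 32.5 MB/s, which is half of either rate. With a much faster cache the model predicts a result close to the upload rate, so it does not by itself explain a case where a ramdisk makes no difference.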

seguler commented 6 years ago

@fedexist Just FYI, we now provide a tool that does multithreaded copy through blobfuse. It is called blobcp: https://github.com/Azure/azure-storage-fuse/tree/master/tools In my experience, it improves throughput by 4-10x.

Blobcp is installed along with blobfuse. You can run it like this:

blobcp /mylocaldirectory /myblobdirectory
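For readers who want to see the general idea of copying several files at once, the following is a rough sketch using `find` and GNU `xargs -P`. It is not blobcp's implementation, and `parallel_copy` is a hypothetical name. It parallelizes across files only, so it would not speed up a single large file.

```shell
#!/bin/bash
# Hypothetical sketch (not blobcp): copy every file under SRC to DST,
# running up to JOBS cp processes at the same time.
parallel_copy() {
    local src="$1" dst="$2" jobs="${3:-8}"
    mkdir -p "$dst" || return 1
    # Recreate the directory tree first so the parallel copies cannot race on mkdir.
    (cd "$src" && find . -type d -print0) | (cd "$dst" && xargs -0 mkdir -p)
    # Copy files in parallel; -print0 / -0 keep names with spaces intact.
    (cd "$src" && find . -type f -print0) \
        | xargs -0 -P "$jobs" -I{} cp "$src/{}" "$dst/{}"
}
```

Something like `parallel_copy /mylocaldirectory /myblobdirectory 16` would then run up to 16 copies concurrently into the mounted directory. The best job count depends on the machine and network, so treat 16 as a starting guess.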

fedexist commented 6 years ago

That's great! Thank you @seguler.

seguler commented 6 years ago

Closing this, as the helper tool we published mitigates the problem.