Azure / azure-storage-fuse

A virtual file system adapter for Azure Blob storage

I want to display the latest BLOB data using streaming method. #1390

Closed Shunya-Seki closed 6 months ago

Shunya-Seki commented 6 months ago

Which version of blobfuse was used?

Blobfuse2 version: 2.3.0~preview.1

Which OS distribution and version are you using?

Rhel8.8

If relevant, please share your mount command.

◆mount command

blobfuse2 mount /blobmount --config-file=/etc/blobfuse2config.yaml -o allow_other

◆config.yaml

# Refer ./setup/baseConfig.yaml for full set of config parameters

logging:
  type: syslog
  level: log_debug

components:

libfuse:
  attribute-expiration-sec: 0
  entry-expiration-sec: 0
  negative-entry-expiration-sec: 0
  direct-io: true

stream:
  block-size-mb: 0
  max-buffers: 0
  buffer-size-mb: 0

attr_cache:
  timeout-sec: 7200

azstorage:
  type: block
  account-name: xxxxx
  objid: xxxxxxxxx
  endpoint: xxxxxx
  mode: msi
  container: xxxx

What was the issue encountered?

I want to use the Stream method and ensure that the data in Blob Storage is always up to date on the OS side where the mount is performed. Is the above configuration okay? I want to confirm just in case.

Have you found a mitigation/solution?

The configuration seems to be working fine, and the latest BLOB data is being displayed without any issues.

Please share logs if available.

vibhansa-msft commented 6 months ago

If you want the local contents to refresh whenever they are updated in the container, this configuration will not work. You need to pass the '-o direct_io' CLI parameter. `stream` is not a stable component, so you can migrate to `block-cache` instead. Sample command and config below:

blobfuse2 mount /blobmount --config-file=/etc/blobfuse2config.yaml -o allow_other -o direct_io

logging:
  type: syslog
  level: log_debug

components:
  - libfuse
  - block_cache
  - attr_cache
  - azstorage

libfuse:
  attribute-expiration-sec: 0
  entry-expiration-sec: 0
  negative-entry-expiration-sec: 0

block_cache:
  block-size-mb: 8
  mem-size-mb: 2048
  prefetch: 12
  parallelism: 64

attr_cache:
  timeout-sec: 7200

azstorage:
  account-name: xxxxx
  objid: xxxxxxxxx
  mode: msi
  container: xxxx

vibhansa-msft commented 6 months ago

I see you are using objid for MSI-based authentication. It's advised to switch to appid-based authentication, as objid-based authentication is not natively supported and also requires azcli to be installed. If you are using an Azure VM, you can assign the identity to the VM itself and then skip providing any appid/objid in the config file.

vibhansa-msft commented 6 months ago

Closing this as there is no action item on blobfuse here. Feel free to post your questions/queries here.

Shunya-Seki commented 6 months ago

Thank you for the information. I've implemented the provided config and mount, but it's not updating. I'm checking the content with the following steps:

① Configuration settings (received config file)

② Mounting

blobfuse2 mount /blobmount --config-file=/etc/blobfuse2config.yaml -o allow_other -o direct_io

③ Confirming the content with the following command

cat /blobmount/contents

④ Updating the content (from Azure Portal)

⑤ Reconfirming the content with the following command

cat /blobmount/contents

Additionally, I was able to mount it without any issues, skipping the "objid". (I'm using Azure VM)
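The check in steps ③ to ⑤ can be made less error-prone by comparing digests instead of reading the `cat` output by eye. This is only a minimal sketch: the helper names `file_digest` and `has_changed` are hypothetical (not part of blobfuse2), and it assumes GNU `sha256sum` is available on the VM.

```shell
#!/bin/sh
# Hypothetical helpers for checking whether a mounted file was refreshed.

# Print the SHA-256 digest of a file (first field of sha256sum output).
file_digest() {
    sha256sum "$1" | cut -d' ' -f1
}

# Succeed (exit 0) if the file's current digest differs from the given one.
has_changed() {
    [ "$(file_digest "$1")" != "$2" ]
}

# Usage against the mount from this thread:
#   before=$(file_digest /blobmount/contents)
#   ... update the blob from the Azure Portal ...
#   has_changed /blobmount/contents "$before" && echo "refreshed" || echo "stale"
```

If `has_changed` still reports no change after the blob was updated, the mount is serving old content from somewhere on the read path.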

vibhansa-msft commented 6 months ago

Remove "attr_cache" from the "components" section in your config file and remount. As you have enabled debug logging, you can check the logs when you issue the cat command for the second time. You should see a file open call for it, followed by some downloads. If that's not happening, you can share the log files with us.
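Before remounting, it may help to confirm that "attr_cache" is really gone from the "components" list and not just from its own section further down. A rough sketch, assuming the list is written one `- name` item per line as in the sample config; `has_component` is a hypothetical helper and does a simple line-based scan, not a full YAML parse.

```shell
#!/bin/sh
# Succeed if COMPONENT appears as a "- name" item under "components:"
# in the given config file. Usage: has_component CONFIG COMPONENT
has_component() {
    awk -v want="$2" '
        /^components:/ { in_list = 1; next }     # start of the components list
        in_list && /^[^ \t#-]/ { in_list = 0 }    # next top-level key ends the list
        in_list && $1 == "-" && $2 == want { found = 1 }
        END { exit found ? 0 : 1 }
    ' "$1"
}

# Usage with the config path from this thread:
#   has_component /etc/blobfuse2config.yaml attr_cache \
#       && echo "attr_cache still enabled" || echo "attr_cache not in pipeline"
```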

Shunya-Seki commented 6 months ago

I removed 'attr_cache' from the config and remounted. Now the latest content is being displayed. Thank you for your assistance. Could you provide additional information? Would it be okay for all settings of 'block_cache' to be set to '0' when displaying always-updated content, as in this question?

block-size-mb:
mem-size-mb:
prefetch:
parallelism:

vibhansa-msft commented 6 months ago

No. In the block-cache model you cannot set all of these parameters to 0, as that would mean no memory is allocated to hold the incoming data. You can tune these parameters based on your available memory and average file size.

Shunya-Seki commented 6 months ago

Thank you. The memory of the Azure VM and the average of the read files are as follows. Are there any recommended values for these parameters in this case?

The memory of the Azure VM: 32GiB
Average of the read files (read only): 0.8GB

◆Parameters of Block_cache

block-size-mb:
mem-size-mb:
prefetch:
parallelism:

Is there a workload to determine the parameters for "block-cache"?

vibhansa-msft commented 5 months ago

Keep "block-size-mb" at 16 and allocate "mem-size-mb" based on the available memory. I see you have 32GB of memory in your VM, so you can set this value to 20GB (if you are mounting only one instance of blobfuse and no other memory-hungry applications are running on the same node). You can set "prefetch" to 50, as the average file size is not too large. You can set "parallelism" to 50 as well.
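Put together, the suggested values could be turned into a `block_cache` section roughly like this. It is a sketch under stated assumptions: the helper names are hypothetical, 1 GB is treated as 1024 MB when converting the 20GB budget, and "prefetch" is taken to be a count of blocks.

```shell
#!/bin/sh
# Print a block_cache section for a blobfuse2 config.
# Args: block size (MB), memory budget (GB), prefetch count, parallelism.
# Assumption: 1 GB is converted as 1024 MB.
block_cache_section() {
    block_mb=$1; mem_gb=$2; prefetch=$3; parallelism=$4
    mem_mb=$((mem_gb * 1024))
    printf 'block_cache:\n'
    printf '  block-size-mb: %d\n' "$block_mb"
    printf '  mem-size-mb: %d\n' "$mem_mb"
    printf '  prefetch: %d\n' "$prefetch"
    printf '  parallelism: %d\n' "$parallelism"
}

# Data covered by one prefetch window, in MB (prefetch count x block size).
prefetch_window_mb() {
    echo $(( $1 * $2 ))
}

# Values suggested in this thread for a 32GB VM with a single mount:
block_cache_section 16 20 50 50
```

With these numbers, `prefetch_window_mb 50 16` gives 800, so under the block-count assumption one prefetch window is about the size of the 0.8GB average file mentioned above.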