Azure / azure-storage-fuse

A virtual file system adapter for Azure Blob storage

No space left on device #1560

Open sandip094 opened 1 week ago

sandip094 commented 1 week ago

Which version of blobfuse was used?

Which OS distribution and version are you using?

What was the issue encountered?

Getting the error below after the RMAN backup has been running for a few minutes:

released channel: C1
RMAN-00571: ===========================================================
RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============
RMAN-00571: ===========================================================
RMAN-03002: failure of backup plus archivelog command at 11/07/2024 13:09:32
ORA-19502: write error on file "/rman-backup/step/1/2024-10-20_0115/STEP_1736_1_m839hd9q_20241107.incr1c", block number 91441152 (block size=8192)
ORA-27061: waiting for async I/Os failed
Linux-x86_64 Error: 28: No space left on device
Additional information: 4294967295
Additional information: 1048576

Configuration file is as below - /etc/blobfuse/blobfuseconfig.yaml

`logging:
  type: base
  level: log_info
  max-file-size-mb: 32
  file-count: 10
  track-time: true
max-concurrency: 40
components:

libfuse:
  default-permission: 0644
  attribute-expiration-sec: 120
  entry-expiration-sec: 120
  negative-entry-expiration-sec: 240
  ignore-open-flags: true

file_cache:
  path: /mnt/blobfusetmp
  timeout-sec: 20
  max-size-mb: 30720
  allow-non-empty-temp: true
  cleanup-on-start: true

azstorage:
  type: block
  account-name: xxxxx
  account-key: xxxxx
  mode: key
  container: xxxxx`

Service file content - /etc/systemd/system/blobfuse2.service

`[Unit]
Description=A virtual file system adapter for Azure Blob storage.
After=network-online.target
Requires=network-online.target

[Service]
User=oracle
Group=dba
Environment=BlobMountingPoint=/rman-backup
Environment=BlobConfigFile=/etc/blobfuse/blobfuseconfig.yaml
Environment=BlobCacheTmpPath=/mnt/blobfusetmp
Environment=BlobLogPath=/var/log/blobfuse
Type=forking
ExecStart=/usr/bin/blobfuse2 mount ${BlobMountingPoint} --config-file=${BlobConfigFile}
ExecStop=/usr/bin/blobfuse2 unmount ${BlobMountingPoint}
ExecStartPre=+/usr/bin/install -d -o oracle -g dba ${BlobCacheTmpPath}
ExecStartPre=+/usr/bin/install -d -o oracle -g dba ${BlobLogPath}
ExecStartPre=+/usr/bin/install -d -o oracle -g dba ${BlobMountingPoint}

[Install]
WantedBy=multi-user.target`

Backup file sizes are as follows:

28M control01.ctl
8.1G stepsysblob_step_1.dbf
743G stepsysdata_step_1.dbf
4.6G sysaux_step_1.dbf
801M system_step_1.dbf
20G temp_step_1.dbf
80G undo_t1_step_1.dbf
101M users_step_1.dbf

vibhansa-msft commented 3 days ago

"ORA-27061: waiting for async I/Os failed Linux-x86_64 Error: 28: No space left on device Additional information: 4294967295 Additional information: 1048576": Please check the disk usage of "/mnt/blobfusetmp". The logs indicate that the disk is running out of space. I see you have configured a 20-second cache timeout and ~30 GB of cache space. If your application (RMAN in your case) generates more data than this limit within that time frame, the disk can fill up.
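As a rough illustration of the disk usage check suggested above, the sketch below reports how full the filesystem holding the cache directory is. The function name `cache_usage` is hypothetical; the default path is the `file_cache` path from the configuration in this issue. The figures can differ slightly from `df` output because of reserved blocks.

```python
import shutil


def cache_usage(path="/mnt/blobfusetmp"):
    """Return (used_gib, total_gib, percent_used) for the filesystem holding `path`.

    The default path is the file_cache path from this issue's config;
    pass a different directory if your cache lives elsewhere.
    """
    usage = shutil.disk_usage(path)
    gib = 1024 ** 3
    percent_used = 100.0 * usage.used / usage.total
    return usage.used / gib, usage.total / gib, percent_used
```

Calling `cache_usage()` periodically while the backup runs (for example once per second in a loop) should show whether the cache filesystem climbs steadily towards 100% before the ORA-27061 error appears.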

sandip094 commented 3 days ago

Hello @vibhansa-msft, I have this much temp space available. What is your recommendation? How are these values calculated, and how should I change "20 seconds as disk timeout and ~30GB disk space"? [Image]

vibhansa-msft commented 2 days ago

timeout-sec: 20
max-size-mb: 30720

The 30 GB of space and the 20-second timeout are values that you have configured in the .yaml file. If you have 600+ GB of disk space available, you can increase the limit from 30 GB to perhaps 100 GB and also reduce the timeout from 20 seconds to 0 or 2 seconds. The timeout is useful only when your application reads the same file again and again. If the process is going to read a file only once, setting the timeout to 0 saves disk space.

Also, Blobfuse deletes a file from the local cache only once all open handles for that file are closed. If your application does not close the handle, the file will remain in the cache until you unmount. In such cases you will also observe the disk filling up. If you suspect this, you can enforce a hard limit, so that file open calls start to fail when the disk reaches the configured capacity.
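To make the sizing concrete, here is a small sketch that compares the file sizes listed earlier in this issue against a given `max-size-mb`. It assumes, based on the explanation above, that a file stays in the cache in full while its handle is open. It also assumes the backup pieces are roughly as large as the listed files, which may not hold for RMAN incremental or compressed backups. The helper names `to_mb` and `files_exceeding_cache` are hypothetical.

```python
# Sizes as reported in this issue (du-style suffixes, binary units assumed).
SIZES = {
    "control01.ctl": "28M",
    "stepsysblob_step_1.dbf": "8.1G",
    "stepsysdata_step_1.dbf": "743G",
    "sysaux_step_1.dbf": "4.6G",
    "system_step_1.dbf": "801M",
    "temp_step_1.dbf": "20G",
    "undo_t1_step_1.dbf": "80G",
    "users_step_1.dbf": "101M",
}

UNITS_MB = {"M": 1, "G": 1024, "T": 1024 * 1024}


def to_mb(size):
    """Convert a du-style size such as '743G' or '8.1G' to megabytes."""
    return float(size[:-1]) * UNITS_MB[size[-1].upper()]


def files_exceeding_cache(sizes, max_size_mb):
    """Return the names of files that are larger than the whole cache."""
    return [name for name, size in sizes.items() if to_mb(size) > max_size_mb]
```

Under these assumptions, `files_exceeding_cache(SIZES, 30720)` flags both the 743G and the 80G file, and raising the limit to 102400 (100 GB) still leaves the 743G file larger than the entire cache.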

sandip094 commented 2 days ago

Hello @vibhansa-msft, after changing the mentioned values the backup still failed with the no space error. Observations:

  1. /mnt reaches 100% in no time
  2. /rman-backup shows 100G, which it shouldn't

[Image] [Image]

vibhansa-msft commented 2 days ago

How big is the backup you are trying to take? The 100G that the 'df' command shows for /rman-backup is not your container size or your data upload size. It only shows the size configured for the temp cache disk and its current usage. According to this, your temp cache is 100% full, which means either RMAN is not closing the files or it is generating too much data in a short span of time. Can you enable debug logs and share the log file with us? That will make it easier to rule out the possibility that files are not being closed.
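While waiting for debug logs, one way to look for handles that are never closed is to list which files under a directory are currently held open, and by which process. The sketch below is Linux-only and reads the `/proc/<pid>/fd` symlinks; the function name `open_files_under` is hypothetical, and processes owned by other users are only visible when it runs as root.

```python
import os


def open_files_under(directory):
    """Map each open file under `directory` to the set of PIDs holding it open.

    Linux-only sketch: walks /proc/<pid>/fd and resolves each descriptor.
    Descriptors that cannot be read (permissions, process exited) are skipped.
    """
    root = os.path.realpath(directory) + os.sep
    holders = {}
    for pid in filter(str.isdigit, os.listdir("/proc")):
        fd_dir = f"/proc/{pid}/fd"
        try:
            fds = os.listdir(fd_dir)
        except OSError:
            continue  # process exited or access denied
        for fd in fds:
            try:
                target = os.readlink(os.path.join(fd_dir, fd))
            except OSError:
                continue  # descriptor closed between listdir and readlink
            if target.startswith(root):
                holders.setdefault(target, set()).add(int(pid))
    return holders
```

Running `open_files_under("/rman-backup")` during the backup would show which backup pieces Oracle processes still hold open on the mount, and `open_files_under("/mnt/blobfusetmp")` would show what is pinned in the cache directory. Entries that persist long after RMAN has moved on to the next piece would point towards unclosed handles.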

sandip094 commented 2 days ago

For some reason the debug log file is not getting generated:

[root@asose2e798c623453573167ad8162-db-1 bin]# cd /var/log/blobfuse/
[root@asose2e798c6273167ad8162-db-1 blobfuse]# ls -ltr
total 0
[root@asose2e798c623453573167ad8162-db-1 blobfuse]# cat /etc/blobfuse/blobfuseconfig.yaml | grep level
level: LOG_DEBUG

vibhansa-msft commented 1 day ago

If you have syslog filters installed, the logs will be in the '/var/log/blobfuse2.log' file; otherwise they go to '/var/log/messages' by default. If you are using AKS, the logs might be directed to the pod directory created on the node.
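A quick way to check both locations at once is sketched below. It scans the two files named above and returns the lines containing a search string. The function name `matching_lines` is hypothetical, and the default search string `blobfuse2` is an assumption about how the entries are tagged in syslog; adjust it if your entries look different. Reading `/var/log/messages` usually requires root.

```python
def matching_lines(paths, needle="blobfuse2"):
    """Return (path, line) pairs for every line in `paths` that contains `needle`.

    Files that are missing or unreadable are skipped silently, so the same
    call works whether or not syslog filters are installed.
    """
    hits = []
    for path in paths:
        try:
            with open(path, errors="replace") as fh:
                for line in fh:
                    if needle in line:
                        hits.append((path, line.rstrip("\n")))
        except OSError:
            continue
    return hits
```

For example, `matching_lines(["/var/log/blobfuse2.log", "/var/log/messages"])` returns an empty list if neither file contains a matching entry, which would suggest that logging is going somewhere else entirely.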