olegshv opened 1 month ago
How big is the file you are trying to upload (the .bak file that failed)? The Azure Storage backend supports a maximum of 50K blocks per blob, while each individual block can range from a few bytes to 4GB in size. As per your config, you have set block-size to 64MB. You can do the math here: if the file size divided by 64MB goes beyond 50K, the upload is bound to fail, and the only workaround is to increase the block size to the point where your file fits in 50K blocks.
Also, you have enabled disk-based caching of blocks. Do you really need this? It is going to consume space on your local disk. If your workflow is to upload the data and then read it back, you should keep it, but if the job is only to upload the data, you can get rid of disk-based caching completely.
Hello! Thank you for your reply. This .bak file is about 3.6 TB, but I have already transferred five 3.6 TB files before and it was successful. Did I understand correctly that I need to change the block-size to more than 64MB?
Regarding the second question: our process is to write database backups directly to the storage via blobfuse and restore them from there. We are still testing the best and fastest way to do this, so I would be very grateful for your advice on how best to proceed in this case. Let me remind you that files can be larger than 4 TB.
With a 64MB block size you can upload at most a 3.05TB file. If your file sizes go beyond this, you need to bump up the block-size. You can try 128MB, which allows you to go beyond 4TB. If backup and restore are done on the same box, then the disk cache will help. But considering the size of your files, I believe it's best to disable the disk cache completely (assuming you do not have 4TB of free disk space to store all of the contents).
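For reference, the arithmetic behind these limits can be sketched as follows (a back-of-the-envelope calculation using binary units; the 50,000-blocks-per-blob cap is the Azure Blob Storage limit discussed above):

```python
# Maximum blob size achievable with a given block size, under Azure Blob
# Storage's 50,000-blocks-per-blob limit.
MAX_BLOCKS = 50_000

def max_blob_size_tib(block_size_mib: int) -> float:
    """Largest file (in TiB) that fits in MAX_BLOCKS blocks of block_size_mib MiB."""
    return block_size_mib * MAX_BLOCKS / (1024 * 1024)

for bs in (64, 128, 256):
    print(f"block-size-mb: {bs:>3} -> max file ~{max_blob_size_tib(bs):.2f} TiB")
# block-size-mb:  64 -> max file ~3.05 TiB
# block-size-mb: 128 -> max file ~6.10 TiB
# block-size-mb: 256 -> max file ~12.21 TiB
```

This is where the 3.05TB figure for 64MB blocks comes from, and why 128MB comfortably covers files beyond 4TB.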
Understood, thank you! Unfortunately, the tests with 128 MB went badly, but I found the article "About block blobs" and now I understand that there is still room for improvement. I will test with a 256+ block size and report back with the results; maybe someone in our community will be interested.
Could you please clarify the following questions?
Did the above config changes help in uploading the file?
Hello, Vikas! Over the past few days, I ran many experiments and found a stable configuration for our workloads.
Here is my configuration (it is used for 15-18 parallel Write/Read operations with files from 250 MB to 2.7 TB):
```yaml
block_cache:
  block-size-mb: 256
  mem-size-mb: 56320
  prefetch: 20
  parallelism: 128
```
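As a rough sanity check on these numbers (my own arithmetic, not an official blobfuse2 sizing formula):

```python
# How many 256 MiB blocks the configured cache memory can hold, and how many
# files can be fully prefetched (20 blocks each) before the pool is exhausted.
block_size_mb = 256
mem_size_mb = 56_320
prefetch = 20

blocks_in_pool = mem_size_mb // block_size_mb
files_fully_prefetched = blocks_in_pool // prefetch
print(blocks_in_pool, files_fully_prefetched)  # 220 11
```

With 15-18 parallel operations, roughly 11 fully prefetched read streams is close to the edge of the pool, which may be why these values took some tuning to stabilise.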
Here is a graph to visualise the test with these settings
I also ran into a strange problem: my jobs failed with the error "The block list may not contain more than 50,000 blocks." This is despite the fact that, according to the formula block-size-mb * 50,000, I did not even come close to the limit. I would like to ask: after restarting the service and starting all the backup jobs again, could that block list have continued to be filled until it reached the 50,000-block limit? (Or maybe the lists remained in RAM and just continued to be written to while new ones were created, or, if the file name is the same, the list continues to be filled?) Please help me clarify the reason for this.
Without going off topic, I would like to ask whether I understand the architecture correctly.
If the structure I described above is correct, then the question arises again: how do we get the error "The block list may not contain more than 50,000 blocks." when transferring a file that is not even larger than 64MB * 50,000 = 3.2TB?
There is also a separate, very important question. When scaling horizontally (adding more VMs with Blobfuse2), can we hit any limits on data transfer to the same Storage Account? From what I found in the "Storage account limits" article, there are default maximum ingress/egress limits (60 Gbps / 120 Gbps) per Storage Account. At first glance, I don't see any more critical restrictions. I would be truly grateful for any additional information.
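To put that 60 Gbps default ingress limit in perspective, a quick estimate (the per-VM throughput here is purely an assumed number, not a measured one):

```python
# Rough estimate of how many VMs uploading in parallel would saturate the
# default storage-account ingress limit mentioned above.
ingress_limit_gbps = 60
per_vm_gbps = 4  # assumption: ~4 Gbps sustained upload per VM

vms_to_saturate = ingress_limit_gbps // per_vm_gbps
print(vms_to_saturate)  # 15
```

Under that assumption, the account-level ingress cap only starts to matter once quite a few VMs upload simultaneously; individual VM NIC limits would likely bite first.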
I apologize for the possibly obvious questions, but it's critical for me to understand this topic for proper use. Thank you in advance!
Thank you for your answers! Today I set up blobfuse2 on one of the production servers. This time the behavior of blobfuse2 was interesting: errors occurred, yet the network share continued to work. After a while, it went down. Please tell me what the reason could be and how to fix it. Blobfuse logs:
Wed Jun 12 22:54:51 EEST 2024 : blobfuse2[1215] : [/opt/blobfuse/fuse_container] LOG_ERR [azstorage.go (458)]: AzStorage::ReadInBuffer : Failed to read storage/backup/site1-Dump.backup.backup [numerical result out of range]
Wed Jun 12 22:54:51 EEST 2024 : blobfuse2[1215] : [/opt/blobfuse/fuse_container] LOG_ERR [block_cache.go (847)]: BlockCache::download : 3 attempts to download a block have failed 378=>storage/backup/site1-Dump.backup.backup (index 0, offset 0)
Wed Jun 12 22:54:51 EEST 2024 : blobfuse2[1215] : [/opt/blobfuse/fuse_container] LOG_ERR [block_cache.go (580)]: BlockCache::getBlock : Failed to download block 378=>storage/backup/site1-Dump.backup.backup (offset 0, index 0)
Wed Jun 12 22:54:51 EEST 2024 : blobfuse2[1215] : [/opt/blobfuse/fuse_container] LOG_ERR [block_cache.go (493)]: BlockCache::ReadInBuffer : Failed to get Block 378=>storage/backup/site1-Dump.backup.backup offset 0 [failed to download block]
Wed Jun 12 22:54:51 EEST 2024 : blobfuse2[1215] : [/opt/blobfuse/fuse_container] LOG_ERR [libfuse_handler.go (764)]: Libfuse::libfuse_read : error reading file storage/backup/site1-Dump.backup.backup, handle: 378 [failed to download block]
Wed Jun 12 22:56:36 EEST 2024 : blobfuse2[1215] : [/opt/blobfuse/fuse_container] LOG_ERR [block_blob.go (1462)]: BlockBlob::CommitBlocks : Failed to commit block list to blob storage/backup/mssql-site-diff.bak [PUT https://backupStorage.blob.core.windows.net/storage/backup/mssql-site-diff.bak
--------------------------------------------------------------------------------
RESPONSE 400: 400 One of the request inputs is not valid.
ERROR CODE: InvalidInput
--------------------------------------------------------------------------------
<?xml version="1.0" encoding="utf-8"?><Error><Code>InvalidInput</Code><Message>One of the request inputs is not valid.
RequestId:e9f99714-001e-0050-3502-bd194d000000
Time:2024-06-12T19:56:36.1653073Z</Message></Error>
--------------------------------------------------------------------------------
Wed Jun 12 22:56:55 EEST 2024 : blobfuse2[1215] : [/opt/blobfuse/fuse_container] LOG_ERR [block_blob.go (1462)]: BlockBlob::CommitBlocks : Failed to commit block list to blob storage/backup/mssql-site-diff.bak [PUT https://backupStorage.blob.core.windows.net/storage/backup/mssql-site-diff.bak
--------------------------------------------------------------------------------
RESPONSE 400: 400 One of the request inputs is not valid.
ERROR CODE: InvalidInput
--------------------------------------------------------------------------------
<?xml version="1.0" encoding="utf-8"?><Error><Code>InvalidInput</Code><Message>One of the request inputs is not valid.
RequestId:c077b8a3-001e-0032-5902-bddb6a000000
Time:2024-06-12T19:56:55.9598214Z</Message></Error>
--------------------------------------------------------------------------------
Wed Jun 12 22:57:19 EEST 2024 : blobfuse2[1215] : [/opt/blobfuse/fuse_container] LOG_ERR [block_cache.go (439)]: BlockCache::CloseFile : failed to flush file storage/backup/mssql-site-diff.bak
Wed Jun 12 22:57:19 EEST 2024 : blobfuse2[1215] : [/opt/blobfuse/fuse_container] LOG_ERR [libfuse_handler.go (867)]: Libfuse::libfuse_release : error closing file storage/backup/mssql-site-diff.bak, handle: 2036 [PUT https://backupStorage.blob.core.windows.net/storage/backup/mssql-site-diff.bak
--------------------------------------------------------------------------------
RESPONSE 400: 400 One of the request inputs is not valid.
ERROR CODE: InvalidInput
--------------------------------------------------------------------------------
<?xml version="1.0" encoding="utf-8"?><Error><Code>InvalidInput</Code><Message>One of the request inputs is not valid.
RequestId:bf187da8-301e-0074-4b02-bdefed000000
Time:2024-06-12T19:57:19.7806094Z</Message></Error>
--------------------------------------------------------------------------------
]
Wed Jun 12 22:57:31 EEST 2024 : blobfuse2[1215] : [/opt/blobfuse/fuse_container] LOG_ERR [block_blob.go (1462)]: BlockBlob::CommitBlocks : Failed to commit block list to blob Cstorage/backup/mssql-site2-diff.bak [PUT https://backupStorage.blob.core.windows.net/storage/backup/mssql-site2-diff.bak
--------------------------------------------------------------------------------
]
Wed Jun 12 22:57:31 EEST 2024 : blobfuse2[1215] : [/opt/blobfuse/fuse_container] LOG_ERR [block_cache.go (409)]: BlockCache::FlushFile : Failed to commit blocks for storage/backup/mssql-site2-diff.bak [PUT https://backupStorage.blob.core.windows.net/storage/backup/mssql-site2-diff.bak
--------------------------------------------------------------------------------
RESPONSE 400: 400 One of the request inputs is not valid.
ERROR CODE: InvalidInput
--------------------------------------------------------------------------------
]
Wed Jun 12 22:57:31 EEST 2024 : blobfuse2[1215] : [/opt/blobfuse/fuse_container] LOG_ERR [block_cache.go (1386)]: BlockCache::SyncFile : failed to flush file storage/backup/mssql-site2-diff.bak
Wed Jun 12 22:57:31 EEST 2024 : blobfuse2[1215] : [/opt/blobfuse/fuse_container] LOG_ERR [libfuse_handler.go (1062)]: Libfuse::libfuse_fsync : error syncing file storage/backup/mssql-site2-diff.bak [PUT https://backupStorage.blob.core.windows.net/storage/backup/mssql-site2-diff.bak
--------------------------------------------------------------------------------
]
Wed Jun 12 22:57:54 EEST 2024 : blobfuse2[1215] : [/opt/blobfuse/fuse_container] LOG_ERR [libfuse_handler.go (815)]: Libfuse::libfuse_flush : error flushing file /storage/backup/mssql-site2-diff.bak, handle: 2066 [PUT https://backupStorage.blob.core.windows.net/storage/backup/mssql-site2-diff.bak
--------------------------------------------------------------------------------
RESPONSE 400: 400 One of the request inputs is not valid.
ERROR CODE: InvalidInput
--------------------------------------------------------------------------------
]
Wed Jun 12 23:00:12 EEST 2024 : blobfuse2[1215] : [/opt/blobfuse/fuse_container] LOG_ERR [block_cache.go (1229)]: BlockCache::commitBlocks : Failed to commit blocks for storage/backup/mssql-site2-diff.bak [PUT https://backupStorage.blob.core.windows.net/storage/backup/mssql-site2-diff.bak
--------------------------------------------------------------------------------
TeamCity logs when it happened:
MSSQL error: The operating system returned the error '1117(The request could not be performed because of an I/O device error.)' while attempting 'FlushFileBuffers' on '\\my-blobfuse\backup\mssql-site2-diff.bak'.
22:56:55 BACKUP DATABASE is terminating abnormally.
23:06:27 pg_dump: error: could not write to output file: Input/output error
23:06:31 Error message: pg_dump: error: could not write to output file: Input/output error
My config (VM: 16 vCPU, 128 GB RAM):
```yaml
allow-other: true
logging:
  type: base
  level: log_warning
  file-path: '/var/log/blobfuse2/blobfuse2.log'
components:
  - libfuse
  - block_cache
  - attr_cache
  - azstorage
libfuse:
  default-permission: 0777
  attribute-expiration-sec: 240
  entry-expiration-sec: 240
  negative-entry-expiration-sec: 240
attr_cache:
  timeout-sec: 3600
azstorage:
  type: adls
  account-name: *****
  account-key: *****
  mode: key
  container: ******
  tier: cool
block_cache:
  block-size-mb: 256
  mem-size-mb: 112640
  prefetch: 20
  parallelism: 128
```
I'm not sure, but it might be useful to see the load graphs. After the failed backup, I interrupted the process and repeated it with different configurations.
Update: I have added a new debug log; maybe it can help: blobfuse2-new.zip
Hello! I've spent all these days testing the problem. Unfortunately, the issues do not disappear with different configurations. To structure everything better, I recorded debug logs with a "minimal" configuration in the hope that this will help you solve the problem.
Three scenarios were recorded, with block-size-mb set to 64, 128, and 256. We ran 8 database backups at the same time. With block-size-mb 64 and 128, the network share crashed. With 256, the network share kept working, but the MSSQL database backups failed.
The operating system returned the error '1117(The request could not be performed because of an I/O device error.)' while attempting 'FlushFileBuffers' on '\\mystorage\mystorage\Client\BackupFolder\site2\site2-db-20240617-175314-812-full.bak'.
BACKUP DATABASE is terminating abnormally.
Sizes of database backups:
site1: 2.3 TB
site2: 166 GB
site3: 1.3 GB
site4: 24 GB
site5: 642 MB
site6: 246 GB
site7: 374 MB
site8: 236 MB
site9: 253 GB
My config (Standard E8d v5 (8 vcpus, 64 GiB memory)):
```yaml
logging:
  type: base
  level: log_debug
  file-path: '/var/log/blobfuse2/blobfuse2.log'
allow-other: true
azstorage:
  type: adls
  account-name: ******
  account-key: ******
  mode: key
  container: ******
  tier: cool
  max-retries: 10
components:
  - libfuse
  - block_cache
  - attr_cache
  - azstorage
health_monitor:
  enable-monitoring: true
  stats-poll-interval-sec: 30
  output-path: /var/log/blobfuse2
  monitor-disable-list:
    - memory_profiler
    - cpu_profiler
    - file_cache_monitor
libfuse:
  attribute-expiration-sec: 240
  entry-expiration-sec: 240
  negative-entry-expiration-sec: 240
block_cache:
  block-size-mb: 64   # 128 and 256 in the other two scenarios
  mem-size-mb: 56320
  prefetch: 12
attr_cache:
  timeout-sec: 7200
```
I am attaching the logs of the three scenarios. If you need other scenarios tested, we are ready to help in any way we can. Thank you!
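For what it's worth, the block counts implied by these scenarios stay well under the 50,000-block cap even for the largest backup (site1, ~2.3 TB), so the failures here are unlikely to be block-list exhaustion (my own arithmetic, treating sizes as binary units):

```python
# Blocks required to upload the largest backup at each tested block size,
# versus the 50,000-blocks-per-blob limit.
largest_file_mib = 2.3 * 1024 * 1024  # site1, ~2.3 TiB expressed in MiB

for block_mib in (64, 128, 256):
    blocks = largest_file_mib / block_mib
    print(f"{block_mib} MiB blocks -> {blocks:,.0f} of 50,000")
# 64 MiB blocks -> 37,683 of 50,000
# 128 MiB blocks -> 18,842 of 50,000
# 256 MiB blocks -> 9,421 of 50,000
```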
Wed Jun 12 22:57:54 EEST 2024 : blobfuse2[1215] : [/opt/blobfuse/fuse_container] LOG_ERR [libfuse_handler.go (815)]: Libfuse::libfuse_flush : error flushing file /storage/backup/mssql-site2-diff.bak, handle: 2066 [PUT https://backupStorage.blob.core.windows.net/storage/backup/mssql-site2-diff.bak
--------------------------------------------------------------------------------
RESPONSE 400: 400 One of the request inputs is not valid.
ERROR CODE: InvalidInput
--------------------------------------------------------------------------------
This is not due to a large block-size or 50K-block exhaustion. It sounds like something different; the backend may be failing for some other reason. Are there any storage-account metrics pointing to a specific error?
Is this the only error you are hitting, or is there something else? Due to the long messages I might have overlooked some part, so I just wanted to understand the real issue you are facing here.
Any updates on this issue ?
Hello! I apologise for the long silence; I was on holiday. Unfortunately, we didn't have Storage Account logging set up.
Today I created a Log Analytics workspace and ran backups for a new test in the configuration (block-size-mb: 128, mem-size-mb: 56320, prefetch: 12).
We see the following picture:
Please tell me what kind of logs you need for analysis.
As for your question about the types of errors in the logs.
Wed Jun 12 22:57:54 EEST 2024 : blobfuse2[1215] : [/opt/blobfuse/fuse_container] LOG_ERR [libfuse_handler.go (815)]: Libfuse::libfuse_flush : error flushing file /storage/backup/mssql-site2-diff.bak, handle: 2066 [PUT https://backupStorage.blob.core.windows.net/storage/backup/mssql-site2-diff.bak -------------------------------------------------------------------------------- RESPONSE 400: 400 One of the request inputs is not valid. ERROR CODE: InvalidInput --------------------------------------------------------------------------------
This is not due to a large block-size or 50K-block exhaustion. It sounds like something different; the backend may be failing for some other reason. Are there any storage-account metrics pointing to a specific error?
Here are more examples of errors that we see in the logs:
Mon Jun 17 13:46:43 EEST 2024 : blobfuse2[5173] : [/opt/blobfuse/fuse_container] LOG_ERR [block_blob.go (1331)]: BlockBlob::StageAndCommit : Failed to stage to blob Client/BackupFolder/site5/site5-db-20240617-104510-809-full.bak with ID OTVhYTgxOGItZjAxMS00YjQ0LTcyYWYtMTcwYzBiYWM3MGE3 at block 654311424 [PUT https://mystorage.blob.core.windows.net/mystorage/Client/BackupFolder/site5/site5-db-20240617-104510-809-full.bak
--------------------------------------------------------------------------------
RESPONSE 400: 400 The specified blob or block content is invalid.
ERROR CODE: InvalidBlobOrBlock
--------------------------------------------------------------------------------
<?xml version="1.0" encoding="utf-8"?><Error><Code>InvalidBlobOrBlock</Code><Message>The specified blob or block content is invalid.
RequestId:c4597294-801e-003c-43a3-c0f2da000000
Time:2024-06-17T10:46:43.8223452Z</Message></Error>
--------------------------------------------------------------------------------
]
Mon Jun 17 13:46:43 EEST 2024 : blobfuse2[5173] : [/opt/blobfuse/fuse_container] LOG_ERR [block_blob.go (1160)]: BlockBlob::TruncateFile : Failed to stage and commit file Client/BackupFolder/site5/site5-db-20240617-104510-809-full.bak%!(EXTRA string=PUT https://mystorage.blob.core.windows.net/mystorage/Client/BackupFolder/site5/site5-db-20240617-104510-809-full.bak
--------------------------------------------------------------------------------
RESPONSE 400: 400 The specified blob or block content is invalid.
ERROR CODE: InvalidBlobOrBlock
--------------------------------------------------------------------------------
Mon Jun 17 13:45:18 EEST 2024 : blobfuse2[5173] : [/opt/blobfuse/fuse_container] LOG_DEBUG [utils.go (151)]: SDK(ResponseError) : HEAD https://mystorage.blob.core.windows.net/mystorage/Client/BackupFolder/site7
--------------------------------------------------------------------------------
RESPONSE 404: 404 The specified blob does not exist.
ERROR CODE: BlobNotFound
--------------------------------------------------------------------------------
Mon Jun 17 16:35:39 EEST 2024 : blobfuse2[17024] : [/opt/blobfuse/fuse_container] LOG_DEBUG [utils.go (151)]: SDK(ResponseError) : GET https://mystorage.blob.core.windows.net/mystorage/Client/BackupFolder/site9/site9-db-20240617-104615-7185-v16-Dump.backup
--------------------------------------------------------------------------------
RESPONSE 416: 416 The range specified is invalid for the current size of the resource.
ERROR CODE: InvalidRange
--------------------------------------------------------------------------------
Hi! We finally got to the bottom of the "FlushFileBuffers" error: it is caused by MSSQL backup COMPRESSION. As far as I understand, it is impossible to combine block_cache with backup compression. We will continue testing. Thank you!
Can you share more details on why backup compression and block-cache cannot be combined? If the data is compressed and then stored as a file using block-cache, it should work just fine. If the data is written to a file and then compression is applied to that data (which now resides in the cloud), then there is a huge cost to downloading the data again, compressing it, and re-uploading the compressed data. What is the exact use-case here?
I'm not ruling out that I could be wrong, but it looks like this...
https://learn.microsoft.com/en-us/sql/relational-databases/backup-restore/backup-compression-sql-server?view=sql-server-2016#Allocation
Truncation of a file to a larger or smaller offset is something we support, provided the file was closed and uploaded previously. If an application tries to truncate a file in transition, it might lead to a failure.
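A minimal sketch of the access pattern that backup compression appears to produce, based on the "Allocation" section linked above (the sizes here are made up for illustration): the server pre-allocates the file at an estimated size, streams the compressed data, then shrinks the file to its real size while the handle is still open. That mid-transition truncate is the case described as potentially failing.

```python
import os
import tempfile

# Pre-allocate at an estimated size, write less, then truncate while the
# handle is still open: the pattern a compressed backup seems to follow.
estimated = 10 * 1024 * 1024  # guessed final size (10 MiB for illustration)
actual = 6 * 1024 * 1024      # compressed data turns out smaller

path = os.path.join(tempfile.gettempdir(), "compression-pattern.bak")
with open(path, "wb") as f:
    f.truncate(estimated)             # reserve space up front
    f.write(b"\x00" * actual)         # stream the (smaller) compressed payload
    f.truncate(actual)                # shrink to the real size, file still open

print(os.path.getsize(path))  # 6291456
os.remove(path)
```

On a local filesystem this is harmless; over a blob-backed mount, the in-flight truncate changes the block layout of a file that is still being staged, which plausibly matches the failures seen here.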
Unfortunately, tests strongly suggest that it does not work with MSSQL compression.
I turned off compression and ran the tests, and again there are problems: MSSQL is OK so far, but PostgreSQL is not :(
pg_dump: error: could not open file "/mnt/DailyClientBackup/site1-20240627-132357/site1/site1-dump.backup": Input/output error
Thu Jun 27 16:25:57 EEST 2024 : blobfuse2[798] : [/opt/blobfuse/fuse_container] LOG_ERR [block_cache.go (357)]: BlockCache::OpenFile : Block size mismatch for Client/SQLDailyBackup/site1/site1-db-20240627-132357-813-v16-Dump.backup [block: JeHoTH2cS8dr294cbrYiIQ==, size: 24395776]
Thu Jun 27 16:25:57 EEST 2024 : blobfuse2[798] : [/opt/blobfuse/fuse_container] LOG_ERR [libfuse_handler.go (710)]: Libfuse::libfuse_open : Failed to open Client/SQLDailyBackup/site1/site1-db-20240627-132357-813-v16-Dump.backup [block size mismatch for Client/SQLDailyBackup/site1/site1-db-20240627-132357-813-v16-Dump.backup]
Debug logs:
blobfuse2log.txt
And now we have some logs from Azure:
Update: yesterday's issue was resolved after changing the ACL rules on a directory in the storage via Azure Storage Explorer. This is very strange, since the rules were already set correctly; we will continue to monitor.
Another critical issue is the network share crash. This problem has been with us since the very beginning of testing, and so far I have not been able to understand what is causing it.
We were running 6 backups in parallel. After about 5 minutes, the network share crashed.
These are the logs we see in the last seconds before the network share became unavailable.
Fri Jun 28 01:56:11 EEST 2024 : blobfuse2[792] : [/opt/blobfuse/fuse_container] LOG_DEBUG [utils.go (151)]: SDK(Retry) : response 201
Fri Jun 28 01:56:11 EEST 2024 : blobfuse2[792] : [/opt/blobfuse/fuse_container] LOG_DEBUG [utils.go (151)]: SDK(Retry) : exit due to non-retriable status code
Fri Jun 28 01:56:11 EEST 2024 : blobfuse2[792] : [/opt/blobfuse/fuse_container] LOG_TRACE [libfuse_handler.go (1142)]: Libfuse::libfuse_utimens : Client/BackupFolder/site1/site1-db-diff.bak
Fri Jun 28 01:56:11 EEST 2024 : blobfuse2[792] : [/opt/blobfuse/fuse_container] LOG_TRACE [attr_cache.go (467)]: AttrCache::GetAttr : Client/BackupFolder/site1/site1-db-diff.bak
Fri Jun 28 01:56:11 EEST 2024 : blobfuse2[792] : [/opt/blobfuse/fuse_container] LOG_DEBUG [attr_cache.go (486)]: AttrCache::GetAttr : Client/BackupFolder/site1/site1-db-diff.bak served from cache
Fri Jun 28 01:56:12 EEST 2024 : blobfuse2[792] : [/opt/blobfuse/fuse_container] LOG_TRACE [libfuse_handler.go (1142)]: Libfuse::libfuse_utimens : Client/BackupFolder/site1/site1-db-diff.bak
Fri Jun 28 01:56:12 EEST 2024 : blobfuse2[792] : [/opt/blobfuse/fuse_container] LOG_TRACE [attr_cache.go (467)]: AttrCache::GetAttr : Client/BackupFolder/site1/site1-db-diff.bak
Fri Jun 28 01:56:12 EEST 2024 : blobfuse2[792] : [/opt/blobfuse/fuse_container] LOG_DEBUG [attr_cache.go (486)]: AttrCache::GetAttr : Client/BackupFolder/site1/site1-db-diff.bak served from cache
Fri Jun 28 01:56:12 EEST 2024 : blobfuse2[792] : [/opt/blobfuse/fuse_container] LOG_TRACE [libfuse_handler.go (1142)]: Libfuse::libfuse_utimens : Client/BackupFolder/site1/site1-db-diff.bak
Fri Jun 28 01:56:12 EEST 2024 : blobfuse2[792] : [/opt/blobfuse/fuse_container] LOG_TRACE [attr_cache.go (467)]: AttrCache::GetAttr : Client/BackupFolder/site1/site1-db-diff.bak
Fri Jun 28 01:56:12 EEST 2024 : blobfuse2[792] : [/opt/blobfuse/fuse_container] LOG_DEBUG [attr_cache.go (486)]: AttrCache::GetAttr : Client/BackupFolder/site1/site1-db-diff.bak served from cache
Fri Jun 28 01:56:12 EEST 2024 : blobfuse2[792] : [/opt/blobfuse/fuse_container] LOG_TRACE [libfuse_handler.go (1142)]: Libfuse::libfuse_utimens : Client/BackupFolder/site1/site1-db-diff.bak
Fri Jun 28 01:56:12 EEST 2024 : blobfuse2[792] : [/opt/blobfuse/fuse_container] LOG_TRACE [attr_cache.go (467)]: AttrCache::GetAttr : Client/BackupFolder/site1/site1-db-diff.bak
Fri Jun 28 01:56:12 EEST 2024 : blobfuse2[792] : [/opt/blobfuse/fuse_container] LOG_DEBUG [attr_cache.go (486)]: AttrCache::GetAttr : Client/BackupFolder/site1/site1-db-diff.bak served from cache
Fri Jun 28 01:56:12 EEST 2024 : blobfuse2[792] : [/opt/blobfuse/fuse_container] LOG_TRACE [libfuse_handler.go (1142)]: Libfuse::libfuse_utimens : Client/BackupFolder/site1/site1-db-diff.bak
Fri Jun 28 01:56:12 EEST 2024 : blobfuse2[792] : [/opt/blobfuse/fuse_container] LOG_TRACE [attr_cache.go (467)]: AttrCache::GetAttr : Client/BackupFolder/site1/site1-db-diff.bak
Fri Jun 28 01:56:12 EEST 2024 : blobfuse2[792] : [/opt/blobfuse/fuse_container] LOG_DEBUG [attr_cache.go (486)]: AttrCache::GetAttr : Client/BackupFolder/site1/site1-db-diff.bak served from cache
Fri Jun 28 01:56:12 EEST 2024 : blobfuse2[792] : [/opt/blobfuse/fuse_container] LOG_DEBUG [block_cache.go (1095)]: BlockCache::waitAndFreeUploadedBlocks : Block cleanup for block 731=>Client/BackupFolder/site1/site1-db-diff.bak (index 69, offset 9261023232)
Fri Jun 28 01:56:12 EEST 2024 : blobfuse2[792] : [/opt/blobfuse/fuse_container] LOG_DEBUG [block_cache.go (1049)]: BlockCache::lineupUpload : Upload block 731=>Client/BackupFolder/site1/site1-db-diff.bak (index 86, offset 11542724608, data 134217728)
Fri Jun 28 01:56:12 EEST 2024 : blobfuse2[792] : [/opt/blobfuse/fuse_container] LOG_TRACE [block_blob.go (1422)]: BlockBlob::StageBlock : name Client/BackupFolder/site1/site1-db-diff.bak
Fri Jun 28 01:56:12 EEST 2024 : blobfuse2[792] : [/opt/blobfuse/fuse_container] LOG_DEBUG [utils.go (151)]: SDK(Retry) : =====> Try=1
Fri Jun 28 01:56:12 EEST 2024 : blobfuse2[792] : [/opt/blobfuse/fuse_container] LOG_DEBUG [utils.go (151)]: SDK(Request) : ==> OUTGOING REQUEST (Try=1)
PUT https://mystore.blob.core.windows.net/mycontainer/Client%2FBackupFolder%2Fsite1%2Fsite1-db-diff.bak?blockid=iQxbQx0HQPtPoJxTDstH%2BQ%3D%3D&comp=block
Accept: application/xml
Authorization: REDACTED
Content-Length: 134217728
Content-Type: application/octet-stream
User-Agent: Azure-Storage-Fuse/2.3.0 (Debian GNU/Linux 11 (bullseye)) azsdk-go-azblob/v1.3.2 (go1.22.1; linux)
X-Ms-Date: Thu, 27 Jun 2024 22:56:12 GMT
x-ms-version: 2023-11-03
Fri Jun 28 01:56:12 EEST 2024 : blobfuse2[792] : [/opt/blobfuse/fuse_container] LOG_TRACE [libfuse_handler.go (1142)]: Libfuse::libfuse_utimens : Client/BackupFolder/site1/site1-db-diff.bak
Fri Jun 28 01:56:12 EEST 2024 : blobfuse2[792] : [/opt/blobfuse/fuse_container] LOG_TRACE [attr_cache.go (467)]: AttrCache::GetAttr : Client/BackupFolder/site1/site1-db-diff.bak
Fri Jun 28 01:56:12 EEST 2024 : blobfuse2[792] : [/opt/blobfuse/fuse_container] LOG_DEBUG [attr_cache.go (486)]: AttrCache::GetAttr : Client/BackupFolder/site1/site1-db-diff.bak served from cache
Fri Jun 28 01:56:12 EEST 2024 : blobfuse2[792] : [/opt/blobfuse/fuse_container] LOG_TRACE [libfuse_handler.go (1142)]: Libfuse::libfuse_utimens : Client/BackupFolder/site1/site1-db-diff.bak
Fri Jun 28 01:56:12 EEST 2024 : blobfuse2[792] : [/opt/blobfuse/fuse_container] LOG_TRACE [attr_cache.go (467)]: AttrCache::GetAttr : Client/BackupFolder/site1/site1-db-diff.bak
Fri Jun 28 01:56:12 EEST 2024 : blobfuse2[792] : [/opt/blobfuse/fuse_container] LOG_DEBUG [attr_cache.go (486)]: AttrCache::GetAttr : Client/BackupFolder/site1/site1-db-diff.bak served from cache
Fri Jun 28 01:56:12 EEST 2024 : blobfuse2[792] : [/opt/blobfuse/fuse_container] LOG_TRACE [libfuse_handler.go (1142)]: Libfuse::libfuse_utimens : Client/BackupFolder/site1/site1-db-diff.bak
Fri Jun 28 01:56:12 EEST 2024 : blobfuse2[792] : [/opt/blobfuse/fuse_container] LOG_TRACE [attr_cache.go (467)]: AttrCache::GetAttr : Client/BackupFolder/site1/site1-db-diff.bak
Fri Jun 28 01:56:12 EEST 2024 : blobfuse2[792] : [/opt/blobfuse/fuse_container] LOG_DEBUG [attr_cache.go (486)]: AttrCache::GetAttr : Client/BackupFolder/site1/site1-db-diff.bak served from cache
Fri Jun 28 01:56:12 EEST 2024 : blobfuse2[792] : [/opt/blobfuse/fuse_container] LOG_TRACE [libfuse_handler.go (1142)]: Libfuse::libfuse_utimens : Client/BackupFolder/site1/site1-db-diff.bak
Fri Jun 28 01:56:12 EEST 2024 : blobfuse2[792] : [/opt/blobfuse/fuse_container] LOG_TRACE [attr_cache.go (467)]: AttrCache::GetAttr : Client/BackupFolder/site1/site1-db-diff.bak
Fri Jun 28 01:56:12 EEST 2024 : blobfuse2[792] : [/opt/blobfuse/fuse_container] LOG_DEBUG [attr_cache.go (486)]: AttrCache::GetAttr : Client/BackupFolder/site1/site1-db-diff.bak served from cache
Fri Jun 28 01:56:12 EEST 2024 : blobfuse2[792] : [/opt/blobfuse/fuse_container] LOG_TRACE [libfuse_handler.go (1142)]: Libfuse::libfuse_utimens : Client/BackupFolder/site1/site1-db-diff.bak
Fri Jun 28 01:56:12 EEST 2024 : blobfuse2[792] : [/opt/blobfuse/fuse_container] LOG_TRACE [attr_cache.go (467)]: AttrCache::GetAttr : Client/BackupFolder/site1/site1-db-diff.bak
Fri Jun 28 01:56:12 EEST 2024 : blobfuse2[792] : [/opt/blobfuse/fuse_container] LOG_DEBUG [attr_cache.go (486)]: AttrCache::GetAttr : Client/BackupFolder/site1/site1-db-diff.bak served from cache
Fri Jun 28 01:56:12 EEST 2024 : blobfuse2[792] : [/opt/blobfuse/fuse_container] LOG_TRACE [libfuse_handler.go (1142)]: Libfuse::libfuse_utimens : Client/BackupFolder/site1/site1-db-diff.bak
Fri Jun 28 01:56:12 EEST 2024 : blobfuse2[792] : [/opt/blobfuse/fuse_container] LOG_TRACE [attr_cache.go (467)]: AttrCache::GetAttr : Client/BackupFolder/site1/site1-db-diff.bak
Fri Jun 28 01:56:12 EEST 2024 : blobfuse2[792] : [/opt/blobfuse/fuse_container] LOG_DEBUG [attr_cache.go (486)]: AttrCache::GetAttr : Client/BackupFolder/site1/site1-db-diff.bak served from cache
Fri Jun 28 01:56:12 EEST 2024 : blobfuse2[792] : [/opt/blobfuse/fuse_container] LOG_TRACE [libfuse_handler.go (1142)]: Libfuse::libfuse_utimens : Client/BackupFolder/site1/site1-db-diff.bak
Fri Jun 28 01:56:12 EEST 2024 : blobfuse2[792] : [/opt/blobfuse/fuse_container] LOG_TRACE [attr_cache.go (467)]: AttrCache::GetAttr : Client/BackupFolder/site1/site1-db-diff.bak
Fri Jun 28 01:56:12 EEST 2024 : blobfuse2[792] : [/opt/blobfuse/fuse_container] LOG_DEBUG [attr_cache.go (486)]: AttrCache::GetAttr : Client/BackupFolder/site1/site1-db-diff.bak served from cache
Fri Jun 28 01:56:12 EEST 2024 : blobfuse2[792] : [/opt/blobfuse/fuse_container] LOG_DEBUG [block_cache.go (1095)]: BlockCache::waitAndFreeUploadedBlocks : Block cleanup for block 731=>Client/BackupFolder/site1/site1-db-diff.bak (index 70, offset 9395240960)
Fri Jun 28 01:56:12 EEST 2024 : blobfuse2[792] : [/opt/blobfuse/fuse_container] LOG_DEBUG [block_cache.go (1049)]: BlockCache::lineupUpload : Upload block 731=>Client/BackupFolder/site1/site1-db-diff.bak (index 87, offset 11676942336, data 134217728)
Fri Jun 28 01:56:12 EEST 2024 : blobfuse2[792] : [/opt/blobfuse/fuse_container] LOG_TRACE [block_blob.go (1422)]: BlockBlob::StageBlock : name Client/BackupFolder/site1/site1-db-diff.bak
Fri Jun 28 01:56:12 EEST 2024 : blobfuse2[792] : [/opt/blobfuse/fuse_container] LOG_DEBUG [utils.go (151)]: SDK(Retry) : =====> Try=1
Fri Jun 28 01:56:12 EEST 2024 : blobfuse2[792] : [/opt/blobfuse/fuse_container] LOG_DEBUG [utils.go (151)]: SDK(Request) : ==> OUTGOING REQUEST (Try=1)
PUT https://mystore.blob.core.windows.net/mycontainer/Client%2FBackupFolder%2Fsite1%2Fsite1-db-diff.bak?blockid=T%2Ff10O7YSj1hwuzlWbIZFQ%3D%3D&comp=block
Accept: application/xml
Authorization: REDACTED
Content-Length: 134217728
Content-Type: application/octet-stream
User-Agent: Azure-Storage-Fuse/2.3.0 (Debian GNU/Linux 11 (bullseye)) azsdk-go-azblob/v1.3.2 (go1.22.1; linux)
X-Ms-Date: Thu, 27 Jun 2024 22:56:12 GMT
x-ms-version: 2023-11-03
Fri Jun 28 01:56:12 EEST 2024 : blobfuse2[792] : [/opt/blobfuse/fuse_container] LOG_TRACE [libfuse_handler.go (1142)]: Libfuse::libfuse_utimens : Client/BackupFolder/site1/site1-db-diff.bak
Fri Jun 28 01:56:12 EEST 2024 : blobfuse2[792] : [/opt/blobfuse/fuse_container] LOG_TRACE [attr_cache.go (467)]: AttrCache::GetAttr : Client/BackupFolder/site1/site1-db-diff.bak
Fri Jun 28 01:56:12 EEST 2024 : blobfuse2[792] : [/opt/blobfuse/fuse_container] LOG_DEBUG [attr_cache.go (486)]: AttrCache::GetAttr : Client/BackupFolder/site1/site1-db-diff.bak served from cache
Fri Jun 28 01:56:12 EEST 2024 : blobfuse2[792] : [/opt/blobfuse/fuse_container] LOG_DEBUG [utils.go (151)]: SDK(Response) : ==> REQUEST/RESPONSE (Try=1/1.989390259s, OpTime=1.989417219s) -- RESPONSE RECEIVED
PUT https://mystore.blob.core.windows.net/mycontainer/Client%2FBackupFolder%2Fsite1%2Fsite1-db-diff.bak?blockid=M4kx5pwnSG1%2BhhGDzHZ7JQ%3D%3D&comp=block
Accept: application/xml
Authorization: REDACTED
Content-Length: 134217728
Content-Type: application/octet-stream
User-Agent: Azure-Storage-Fuse/2.3.0 (Debian GNU/Linux 11 (bullseye)) azsdk-go-azblob/v1.3.2 (go1.22.1; linux)
X-Ms-Date: Thu, 27 Jun 2024 22:56:10 GMT
x-ms-version: 2023-11-03
--------------------------------------------------------------------------------
RESPONSE Status: 201 Created
Content-Length: 0
Date: Thu, 27 Jun 2024 22:56:11 GMT
Server: Windows-Azure-Blob/1.0 Microsoft-HTTPAPI/2.0
X-Ms-Content-Crc64: iS+krsn+pYw=
X-Ms-Request-Id: 13793667-001e-0040-24e5-c8dc25000000
X-Ms-Request-Server-Encrypted: REDACTED
X-Ms-Version: 2023-11-03
By network share crash, do you mean the blobfuse mount path becomes unstable after some time? When you say 5 backups are running in parallel, are they running against the same path, or do you have 5 different mounts? Also, if the crash is consistent, you can use the "--foreground" CLI option; when it crashes it will dump some logs to the console, kindly share those. In foreground mode the mount will block the console, so you have to run the backup and other tasks from a different console.
Yes, I mean that for some reason the mount point disappears, and at the same moment the graph shows the RAM being freed.
By "run in parallel" I mean that all backups are written to a single mount point.
I added the --foreground option, and this is what we have after the crash:
Hi, just wondering if these logs helped you in any way? I'm worried whether this can be fixed quickly at all. If there's anything else I can do, I'm ready for shamanic dances and prayers :) Thank you!
Hi @olegshv. Can you also please share the debug logs for the above crash? We are trying to replicate this error. Is this error happening frequently for your use case?
@souravgupta-msft hi! Yes, of course. Here are the logs: blobfuse2_updated.log
I would say that with this configuration, every launch will crash. Depending on the configuration parameters, the crash can occur at different times. I'll record some more logs for you.
upd
It fell again, with the following configurations:
Which version of blobfuse was used?
blobfuse2 version 2.3.0
Which OS distribution and version are you using?
Debian GNU/Linux 11 (bullseye)
If relevant, please share your mount command.
blobfuse2 mount /opt/blobfuse/fuse_container/ --config-file=/opt/blobfuse/fuse_connection.yaml
My fuse_connection.yaml file
What was the issue encountered?
A brief description of what we do with blobfuse2:
We are currently testing the transition to stream or block_cache. After a few days of testing, we abandoned stream altogether, as we never managed to complete a successful backup with it. block_cache gives us hope in our hearts :)
For example, here is a 3.6TB database backup running in 5 parallel threads. It's ok.
![image](https://github.com/Azure/azure-storage-fuse/assets/9905160/cf9808da-2378-4320-9329-186744657ddf)
But here's an example of a 15-thread job, with several small databases and one large one; the load is not high, yet it failed.
There is no load, but we still get an error.
Unfortunately, after reading a lot of information and documentation, I did not find an answer on scaling and configuration for heavy workloads. We ran tests even with large VMs (32 CPU, 256 GB RAM), but it didn't work. I would like some recommendations on the most reliable way to handle this case. Which block_cache settings are best for such tasks?
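For reference, the kind of block_cache section under discussion looks roughly like this. This is an illustrative sketch based on the key names in the blobfuse2 sample configs, not my actual fuse_connection.yaml, and the values are placeholders rather than a recommendation:

```yaml
# Hypothetical block_cache section for fuse_connection.yaml.
# Key names follow the blobfuse2 sample configs; values are illustrative.
components:
  - libfuse
  - block_cache
  - attr_cache
  - azstorage

block_cache:
  block-size-mb: 128     # larger blocks raise the max file size (50,000-block limit per blob)
  mem-size-mb: 8192      # RAM budget for in-flight blocks
  parallelism: 64        # concurrent block upload/download workers
  # Omit path/disk-size-mb entirely to disable disk-based caching
  # when the workload is upload-only.
```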
Have you found a mitigation/solution?
No
Please share logs if available.
In log_debug:
RESPONSE 404: 404 The specified blob does not exist. ERROR CODE: BlobNotFound
Response contained no body
<?xml version="1.0" encoding="utf-8"?>The block list may not contain more than 50,000 blocks.
RequestId:3a61884d-201e-0005-6461-b863c1000000
Time:2024-06-06T22:31:44.8071145Z
BlockListTooLong
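The BlockListTooLong error above follows directly from the Azure Block Blob limit of 50,000 blocks per blob: the largest file that can be staged is block-size times 50,000. A quick sanity check of that arithmetic (a standalone calculation, not tied to any blobfuse2 internals):

```python
import math

# Azure Block Blobs allow at most 50,000 committed blocks per blob,
# so the largest stageable file is block_size * 50,000.
MAX_BLOCKS = 50_000

def max_file_tib(block_size_mib: int) -> float:
    """Largest uploadable file, in TiB, for a given block size in MiB."""
    return block_size_mib * MAX_BLOCKS / (1024 * 1024)  # MiB -> TiB

def min_block_mib(file_size_tib: float) -> int:
    """Smallest whole-MiB block size that fits a file of the given size."""
    return math.ceil(file_size_tib * 1024 * 1024 / MAX_BLOCKS)

print(max_file_tib(64))    # 64 MiB blocks cap out near 3.05 TiB
print(max_file_tib(128))   # 128 MiB blocks roughly double that, ~6.1 TiB
print(min_block_mib(4.0))  # a 4 TiB backup needs at least 84 MiB blocks
```

This matches the thread: a 3.6 TB backup exceeds the ~3.05 TiB ceiling of 64 MiB blocks, while 128 MiB blocks comfortably cover files beyond 4 TiB.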