Azure / azure-storage-cpplite

Lite version of C++ Client Library for Microsoft Azure Storage
MIT License

upload_block_blob_from_buffer throws "Resource temporarily unavailable" exception after ~32000 blob uploads #120

Open fqun2014 opened 2 years ago

fqun2014 commented 2 years ago

I debugged it a little bit; the exception was thrown at: context->task_futures.emplace_back(std::async(std::launch::async, thread_upload_func));

Once this condition happened, I could not recover from it. I created a new blob_client and tried to use it, but I still get the exception.

This is running on Ubuntu:
LSB Version:    core-9.20160110ubuntu0.2-amd64:core-9.20160110ubuntu0.2-noarch:security-9.20160110ubuntu0.2-amd64:security-9.20160110ubuntu0.2-noarch
Distributor ID: Ubuntu
Description:    Ubuntu 16.04.2 LTS
Release:        16.04
Codename:       xenial
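
"Resource temporarily unavailable" is the glibc text for EAGAIN; when std::async with std::launch::async cannot start a new thread, the failure surfaces as a std::system_error. A minimal standalone sketch (not cpplite code) of where that exception would land:

#include <future>
#include <iostream>
#include <system_error>

int main()
{
    try
    {
        auto fut = std::async(std::launch::async, [] { return 42; });
        std::cout << fut.get() << std::endl;
    }
    catch (const std::system_error &e)
    {
        // std::async reports failure to start a thread as std::system_error;
        // on Linux the message is typically "Resource temporarily unavailable".
        std::cerr << "std::async failed: " << e.what() << std::endl;
    }
    return 0;
}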

Jinming-Hu commented 2 years ago

It's hard to say what the root cause is with the limited information available.

But my guess is that you created too many threads. Please try to limit the number of concurrent upload tasks.
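
For context, concurrency in cpplite is capped when the client is constructed; a sketch along the lines of the README samples (placeholder credentials; header path and parameter naming should be checked against your checkout):

#include <memory>

#include "blob/blob_client.h"

// Placeholder account name/key; the second blob_client constructor argument
// is the concurrency cap the comments below refer to as 'maxConcurrency'.
auto credential = std::make_shared<azure::storage_lite::shared_key_credential>(
    "ACCOUNT_NAME", "ACCOUNT_KEY");
auto account = std::make_shared<azure::storage_lite::storage_account>(
    "ACCOUNT_NAME", credential, /* use_https */ true);
auto client = std::make_shared<azure::storage_lite::blob_client>(
    account, /* max_concurrency */ 1);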

fqun2014 commented 2 years ago

Actually the 'maxConcurrency' was set to 1 when creating the blob_client. The blob object size is ~1MB, so I didn't use many threads. The objects are transferred one at a time. What information do you suggest I collect that may help debug this? Thanks.

Jinming-Hu commented 2 years ago

I suspect there's some kind of leak. Maybe you can collect the memory usage, thread count, and fd count of your application over time.
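
One way to capture those numbers is a small helper that reads /proc and is called, say, once per upload. A sketch (hypothetical helper, not part of cpplite):

#include <dirent.h>

#include <cstddef>
#include <fstream>
#include <iostream>
#include <string>

// Print the process's virtual memory size, thread count, and open-fd count
// by reading /proc, so the three metrics can be logged over time.
static void log_process_stats()
{
    std::ifstream status("/proc/self/status");
    for (std::string line; std::getline(status, line);)
    {
        if (line.compare(0, 7, "VmSize:") == 0 || line.compare(0, 8, "Threads:") == 0)
            std::cout << line << std::endl;
    }

    std::size_t fd_count = 0;
    if (DIR *dir = opendir("/proc/self/fd"))
    {
        while (readdir(dir) != nullptr)
            ++fd_count;  // includes "." and ".." plus the directory fd itself
        closedir(dir);
    }
    std::cout << "open fds: " << fd_count << std::endl;
}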

fqun2014 commented 2 years ago

I know the test only uses a few threads, maybe around 8 total. I see one of them keeps changing its task ID; that is probably the one created by std::async, which gets created/deleted for each blob transfer. I will collect the memory usage and the fd count.

I only use one API: AZURE_STORAGE_API std::future<storage_outcome<void>> upload_block_from_buffer(const std::string &container, const std::string &blob, const std::string &blockid, const char* buffer, uint64_t bufferlen);

I don't believe the application opens any files itself. I'm not sure whether the API opens/closes files, but the data source is in memory.

fqun2014 commented 2 years ago

Hi, I ran strace on the test that uploads blobs to Azure. It looks like when the test crashed, it could not allocate memory, and the memory it tried to allocate was for creating a thread; I believe this happens during the call to std::async. When I monitor /proc/<pid>/maps, I see memory regions keep getting added to the process but never released.

Here is the strace output segment just before the program crashed:

mmap(NULL, 8392704, PROT_READ|PROT_WRITE|PROT_EXEC, MAP_PRIVATE|MAP_ANONYMOUS|MAP_STACK, -1, 0) = 0x7f1d338cf000
mprotect(0x7f1d338cf000, 4096, PROT_NONE) = 0
clone(child_stack=0x7f1d340ce4f0, flags=CLONE_VM|CLONE_FS|CLONE_FILES|CLONE_SIGHAND|CLONE_THREAD|CLONE_SYSVSEM|CLONE_SETTLS|CLONE_PARENT_SETTID|CLONE_CHILD_CLEARTID, parent_tidptr=0x7f1d340cf9d0, tls=0x7f1d340cf700, child_tidptr=0x7f1d340cf9d0) = 9747
futex(0x44b0d30, FUTEX_WAIT, 2147483648, NULL) = 0
stat("/dev/random", {st_mode=S_IFCHR|0666, st_rdev=makedev(1, 8), ...}) = 0
open("/dev/urandom", O_RDONLY|O_CLOEXEC) = 5
fcntl(5, F_GETFD)                       = 0x1 (flags FD_CLOEXEC)
fcntl(5, F_SETFD, FD_CLOEXEC)           = 0
getuid()                                = 1050003721
getppid()                               = 29649
read(5, "\273\300\360\223\355\f\3763\316\330\354\307\334\225C\243", 16) = 16
close(5)                                = 0
gettid()                                = 7926
mmap(NULL, 8392704, PROT_READ|PROT_WRITE|PROT_EXEC, MAP_PRIVATE|MAP_ANONYMOUS|MAP_STACK, -1, 0) = 0x7f1d330ce000
mprotect(0x7f1d330ce000, 4096, PROT_NONE) = 0
clone(child_stack=0x7f1d338cd4f0, flags=CLONE_VM|CLONE_FS|CLONE_FILES|CLONE_SIGHAND|CLONE_THREAD|CLONE_SYSVSEM|CLONE_SETTLS|CLONE_PARENT_SETTID|CLONE_CHILD_CLEARTID, parent_tidptr=0x7f1d338ce9d0, tls=0x7f1d338ce700, child_tidptr=0x7f1d338ce9d0) = 9748
futex(0x44b1310, FUTEX_WAIT, 2147483648, NULL) = 0
stat("/dev/random", {st_mode=S_IFCHR|0666, st_rdev=makedev(1, 8), ...}) = 0
open("/dev/urandom", O_RDONLY|O_CLOEXEC) = 5
fcntl(5, F_GETFD)                       = 0x1 (flags FD_CLOEXEC)
fcntl(5, F_SETFD, FD_CLOEXEC)           = 0
getuid()                                = 1050003721
getppid()                               = 29649
read(5, "Z\377Xh e\366\324\341\337\245\274\372z*\322", 16) = 16
close(5)                                = 0
gettid()                                = 7926
mmap(NULL, 8392704, PROT_READ|PROT_WRITE|PROT_EXEC, MAP_PRIVATE|MAP_ANONYMOUS|MAP_STACK, -1, 0) = 0x7f1d328cd000
mprotect(0x7f1d328cd000, 4096, PROT_NONE) = -1 ENOMEM (Cannot allocate memory)
munmap(0x7f1d328cd000, 8392704)         = 0
rt_sigprocmask(SIG_SETMASK, ~[RTMIN RT_1], [], 8) = 0
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f5cf1bd8000
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f5cf1bd7000
mincore(0x7ffff7523d17, 1, 0x7ffff7523d17) = -1 EINVAL (Invalid argument)
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
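
Each 8 MiB MAP_STACK mmap plus its PROT_NONE guard page above is the signature of a new thread stack, and every stack that is never freed leaves two entries in /proc/<pid>/maps. Since the kernel's default vm.max_map_count is 65530, roughly 32k leaked stacks would exhaust the per-process mapping limit, which lines up with the ~32000-upload failure point. A sketch (illustrative, not cpplite code) for tracking the mapping count over time:

#include <cstddef>
#include <fstream>
#include <iostream>
#include <string>

// Count the process's memory mappings by reading /proc/self/maps. If this
// number climbs toward vm.max_map_count (65530 by default) as uploads
// proceed, the ENOMEM above comes from the per-process mapping limit rather
// than from running out of RAM.
static std::size_t count_mappings()
{
    std::ifstream maps("/proc/self/maps");
    std::size_t n = 0;
    for (std::string line; std::getline(maps, line);)
        ++n;
    return n;
}

int main()
{
    std::cout << "mapping count: " << count_mappings() << std::endl;
    return 0;
}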

Here is the code that uses the API:

void StorageClient::uploadObjectToCloud(const std::string &objectName, const char* buffer, uint64_t bufferLen) {
    std::vector<std::pair<std::string, std::string>> metadata;
    SLOG(INFO) << " container name: " << container->getName()
               << " blob name:  " << objectName
               << " buffer: " << (uint64_t)buffer
               << " bufferLen: " << bufferLen;
    auto ret = client->upload_block_blob_from_buffer(container->getName(), objectName, buffer, metadata, bufferLen, 1).get();
    if (!ret.success()) {
        SLOG(ERROR) << "Failed to upload blob, Error: " << ret.error().code_name << ", " << ret.error().code;
    } else {
      SLOG(INFO) << "blob: upload " << objectName << " to cloud";  
    }
}

Shouldn't the thread created by std::async get destroyed when this function returns? I would expect to see some munmap calls in strace.

fqun2014 commented 2 years ago

Hi, just as an experiment, I removed the internal concurrent-task mechanism from blob_client::upload_block_blob_from_buffer(), so it doesn't have to create threads internally. The problem goes away.

std::future<storage_outcome<void>> blob_client::upload_block_blob_from_buffer_new(const std::string &container, const std::string &blob, const char* buffer, const std::vector<std::pair<std::string, std::string>> &metadata, uint64_t bufferlen)
{
if (bufferlen > constants::max_num_blocks * constants::max_block_size)
{
    storage_error error;
    error.code = std::to_string(blob_too_big);
    std::promise<storage_outcome<void>> task_promise;
    task_promise.set_value(storage_outcome<void>(error));
    return task_promise.get_future();
}

std::string uuid = get_uuid();
std::string block_id = std::to_string(0);
block_id = uuid + std::string(48 - uuid.length() - block_id.length(), '-') + block_id;
block_id = to_base64(reinterpret_cast<const unsigned char*>(block_id.data()), block_id.length());
storage_error failed_reason;
std::atomic<bool> failed{ false };
std::promise<storage_outcome<void>> task_promise;

auto result = upload_block_from_buffer(container, blob, block_id, buffer, bufferlen).get();
if (!result.success())
{
    failed.exchange(true);
    failed_reason = result.error();
}

if (!failed) {
    std::vector<put_block_list_request_base::block_item> block_list;
    block_list.emplace_back(put_block_list_request_base::block_item{ std::move(block_id), put_block_list_request_base::block_type::uncommitted });
    auto result = put_block_list(container, blob, block_list, metadata).get();
    if (!result.success())
    {
        failed.store(true);
        failed_reason = result.error();
    }
}
task_promise.set_value(failed ? storage_outcome<void>(failed_reason) : storage_outcome<void>());
return task_promise.get_future();

}
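
For comparison, a generic caller-side pattern that keeps std::async but bounds the number of in-flight tasks (illustrative sketch, not cpplite code; upload_one is a placeholder):

#include <cstddef>
#include <future>
#include <vector>

// Join the outstanding futures once a cap is reached, so each worker thread
// (and its 8 MiB stack) is reclaimed before more are launched.
static void upload_one(std::size_t /*index*/)
{
    // perform one upload here (placeholder)
}

static void upload_all(std::size_t blob_count, std::size_t max_in_flight = 4)
{
    std::vector<std::future<void>> in_flight;
    for (std::size_t i = 0; i < blob_count; ++i)
    {
        in_flight.emplace_back(std::async(std::launch::async, upload_one, i));
        if (in_flight.size() >= max_in_flight)
        {
            for (auto &f : in_flight)
                f.get();  // waits for completion and lets the thread be reclaimed
            in_flight.clear();
        }
    }
    for (auto &f : in_flight)
        f.get();
}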

Jinming-Hu commented 2 years ago

Shouldn't the thread created by std::async get destroyed when this function returns? I would expect to see some munmap calls in strace.

munmap not being called doesn't necessarily mean the memory is not deallocated.

mmap and munmap are the functions a user-space program uses to allocate and deallocate memory from the kernel. But the user-space program itself also does some memory management, e.g. the allocator may keep freed memory around for reuse.

I suggest you use tools like valgrind to monitor your application and see where the memory leak happens.

fqun2014 commented 2 years ago

As my last comment shows, I didn't change anything outside the function and kept the signature the same. Changing the function internally (stop using std::async, do the data transfer in one shot) stops the memory leak.


fqun2014 commented 2 years ago

Also, the problem will go away with the following change:

for (int i = 0; i < parallelism; ++i) {

Jinming-Hu commented 2 years ago

Hi @fqun2014, I can confirm there's a leak issue in this function.

However, at the moment I don't even know how to fix this leak issue while keeping the concurrent functionality working.

Anyway, this cpplite SDK has been deprecated in favor of our Track2 SDK, which is already GA. Is migrating to the Track2 SDK an option for you?
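
For reference, uploading an in-memory buffer with the Track2 SDK looks roughly like the sketch below (API names taken from azure-sdk-for-cpp; treat as an approximation and check the current Track2 docs):

#include <azure/storage/blobs.hpp>

#include <cstddef>
#include <cstdint>
#include <string>

// Rough Track2 equivalent of the cpplite buffer upload. The connection
// string and names are placeholders.
void upload_with_track2(const std::string &connection_string,
                        const std::string &container,
                        const std::string &blob,
                        const uint8_t *buffer,
                        std::size_t buffer_len)
{
    auto client = Azure::Storage::Blobs::BlockBlobClient::CreateFromConnectionString(
        connection_string, container, blob);
    client.UploadFrom(buffer, buffer_len);  // chunking/parallelism handled inside the SDK
}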

fqun2014 commented 2 years ago

Hi @Jinming-Hu, thanks a lot for confirming the issue. I will look into the Track2 SDK. Would you consider converting the function (and also the download function) to a non-concurrent version so people can still use the library? Maybe it is fast enough for some users.

Jinming-Hu commented 2 years ago

Would you consider converting the function (and also the download function) to a non-concurrent version so people can still use the library? Maybe it is fast enough for some users.

Yeah, maybe. But other customers may still want concurrent transfer to fully exploit their bandwidth capacity.

fqun2014 commented 2 years ago

Hi @Jinming-Hu, I'm evaluating the Track2 SDK. From what I can tell, the ConcurrentTransfer code in the Track2 SDK is very similar to the azure-storage-cpplite version. Won't the same problem show up there as well?

Jinming-Hu commented 2 years ago

Hi @Jinming-Hu, I'm evaluating the Track2 SDK. From what I can tell, the ConcurrentTransfer code in the Track2 SDK is very similar to the azure-storage-cpplite version. Won't the same problem show up there as well?

I wrote the concurrent transfer code in both the Track1 cpplite and the Track2 SDK. I don't think the Track2 SDK has the same problem. If you observe a memory leak or any other kind of resource leak, please open an issue in the Track2 SDK repo.

fqun2014 commented 2 years ago

Hi @Jinming-Hu, I tried the same test with the Track2 SDK. I could not reproduce the problem.