fqun2014 opened this issue 2 years ago
It's hard to say what the root cause is with the limited information available.
But my guess is that you created too many threads. Please try to limit the number of concurrent upload tasks.
Actually, 'maxConcurrency' was set to 1 when creating the blob_client. Each blob is ~1 MB, so I didn't use many threads; objects are transferred one at a time. What information do you suggest I collect that may help debug this? Thanks.
I suspect there's some kind of leak. Maybe you can collect the memory usage, thread count, and fd count of your application over time.
I know the test only uses a few threads, maybe around 8 in total. I see one of them kept changing task ID; that is probably the one created by std::async, which gets created and destroyed for each blob transfer. I will collect the memory usage and the fd count.
I only use one API:
AZURE_STORAGE_API std::future<storage_outcome<void>> upload_block_blob_from_buffer(const std::string &container, const std::string &blob, const char* buffer, const std::vector<std::pair<std::string, std::string>> &metadata, uint64_t bufferlen, size_t parallelism);
I don't believe the application opens any files itself. I'm not sure whether the API opens/closes files, but the data source is in memory.
Hi, I ran strace on the test that uploads blobs to Azure. It looks like when the test crashed, it could not allocate memory; the allocation that failed appears to be for creating a thread, I believe during the call to std::async. When I monitor /proc/<pid>/maps, I see memory regions keep getting added to the process but never released.
Here is the strace output just before the program crashed:
mmap(NULL, 8392704, PROT_READ|PROT_WRITE|PROT_EXEC, MAP_PRIVATE|MAP_ANONYMOUS|MAP_STACK, -1, 0) = 0x7f1d338cf000
mprotect(0x7f1d338cf000, 4096, PROT_NONE) = 0
clone(child_stack=0x7f1d340ce4f0, flags=CLONE_VM|CLONE_FS|CLONE_FILES|CLONE_SIGHAND|CLONE_THREAD|CLONE_SYSVSEM|CLONE_SETTLS|CLONE_PARENT_SETTID|CLONE_CHILD_CLEARTID, parent_tidptr=0x7f1d340cf9d0, tls=0x7f1d340cf700, child_tidptr=0x7f1d340cf9d0) = 9747
futex(0x44b0d30, FUTEX_WAIT, 2147483648, NULL) = 0
stat("/dev/random", {st_mode=S_IFCHR|0666, st_rdev=makedev(1, 8), ...}) = 0
open("/dev/urandom", O_RDONLY|O_CLOEXEC) = 5
fcntl(5, F_GETFD) = 0x1 (flags FD_CLOEXEC)
fcntl(5, F_SETFD, FD_CLOEXEC) = 0
getuid() = 1050003721
getppid() = 29649
read(5, "\273\300\360\223\355\f\3763\316\330\354\307\334\225C\243", 16) = 16
close(5) = 0
gettid() = 7926
mmap(NULL, 8392704, PROT_READ|PROT_WRITE|PROT_EXEC, MAP_PRIVATE|MAP_ANONYMOUS|MAP_STACK, -1, 0) = 0x7f1d330ce000
mprotect(0x7f1d330ce000, 4096, PROT_NONE) = 0
clone(child_stack=0x7f1d338cd4f0, flags=CLONE_VM|CLONE_FS|CLONE_FILES|CLONE_SIGHAND|CLONE_THREAD|CLONE_SYSVSEM|CLONE_SETTLS|CLONE_PARENT_SETTID|CLONE_CHILD_CLEARTID, parent_tidptr=0x7f1d338ce9d0, tls=0x7f1d338ce700, child_tidptr=0x7f1d338ce9d0) = 9748
futex(0x44b1310, FUTEX_WAIT, 2147483648, NULL) = 0
stat("/dev/random", {st_mode=S_IFCHR|0666, st_rdev=makedev(1, 8), ...}) = 0
open("/dev/urandom", O_RDONLY|O_CLOEXEC) = 5
fcntl(5, F_GETFD) = 0x1 (flags FD_CLOEXEC)
fcntl(5, F_SETFD, FD_CLOEXEC) = 0
getuid() = 1050003721
getppid() = 29649
read(5, "Z\377Xh e\366\324\341\337\245\274\372z*\322", 16) = 16
close(5) = 0
gettid() = 7926
mmap(NULL, 8392704, PROT_READ|PROT_WRITE|PROT_EXEC, MAP_PRIVATE|MAP_ANONYMOUS|MAP_STACK, -1, 0) = 0x7f1d328cd000
mprotect(0x7f1d328cd000, 4096, PROT_NONE) = -1 ENOMEM (Cannot allocate memory)
munmap(0x7f1d328cd000, 8392704) = 0
rt_sigprocmask(SIG_SETMASK, ~[RTMIN RT_1], [], 8) = 0
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f5cf1bd8000
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f5cf1bd7000
mincore(0x7ffff7523d17, 1, 0x7ffff7523d17) = -1 EINVAL (Invalid argument)
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
Here is the code that uses the API:
void StorageClient::uploadObjectToCloud(const std::string &objectName, const char* buffer, uint64_t bufferLen) {
    std::vector<std::pair<std::string, std::string>> metadata;
    SLOG(INFO) << " container name: " << container->getName()
               << " blob name: " << objectName
               << " buffer: " << (uint64_t)buffer
               << " bufferLen: " << bufferLen;
    auto ret = client->upload_block_blob_from_buffer(container->getName(), objectName, buffer, metadata, bufferLen, 1).get();
    if (!ret.success()) {
        SLOG(ERROR) << "Failed to upload blob, Error: " << ret.error().code_name << ", " << ret.error().code;
    } else {
        SLOG(INFO) << "blob: upload " << objectName << " to cloud";
    }
}
Shouldn't the thread created by std::async be destroyed when this function returns? I would expect to see some munmap calls in strace.
Hi, just as an experiment, I removed the internal concurrent-task mechanism from blob_client::upload_block_blob_from_buffer(), so it doesn't have to create threads internally. The problem goes away.
std::future<storage_outcome<void>> blob_client::upload_block_blob_from_buffer(const std::string &container, const std::string &blob, const char* buffer, const std::vector<std::pair<std::string, std::string>> &metadata, uint64_t bufferlen, size_t parallelism)
{
    std::string uuid = get_uuid();
    std::string block_id = std::to_string(0);
    block_id = uuid + std::string(48 - uuid.length() - block_id.length(), '-') + block_id;
    block_id = to_base64(reinterpret_cast<const unsigned char*>(block_id.data()), block_id.length());
    storage_error failed_reason;
    std::atomic<bool> failed{ false };
    std::promise<storage_outcome<void>> task_promise;
    auto result = upload_block_from_buffer(container, blob, block_id, buffer, bufferlen).get();
    if (!result.success())
    {
        failed.store(true);
        failed_reason = result.error();
    }
    if (!failed) {
        std::vector<put_block_list_request_base::block_item> block_list;
        block_list.emplace_back(put_block_list_request_base::block_item{ std::move(block_id), put_block_list_request_base::block_type::uncommitted });
        auto commit_result = put_block_list(container, blob, block_list, metadata).get();
        if (!commit_result.success())
        {
            failed.store(true);
            failed_reason = commit_result.error();
        }
    }
    task_promise.set_value(failed ? storage_outcome<void>(failed_reason) : storage_outcome<void>());
    return task_promise.get_future();
}
Shouldn't the thread created by std::async be destroyed when this function returns? I would expect to see some munmap calls in strace.
munmap not being called doesn't necessarily mean the memory is not deallocated. mmap and munmap are the functions a user-space program uses to allocate and return memory from/to the kernel, but the user-space program (for example, the C library) also does its own memory management on top of that.
I suggest you use a tool like valgrind to monitor your application and see where the memory leak happens.
If you look at my last comment: I didn't change anything outside the function and kept the signature the same. Changing the function internally (stop using std::async, do the data transfer in one shot) makes the memory leak stop.
Also, the problem goes away with the following change:
for (int i = 0; i < parallelism; ++i) {
Hi @fqun2014 , I can confirm there's a leak issue in this function.
However, at the moment, I don't even know how to fix this leak issue and keep the concurrent functionality working.
Anyway, this cpplite SDK has been deprecated in favor of our Track2 SDK, which is already GA. Is migrating to the Track2 SDK an option for you?
Hi @Jinming-Hu, thanks a lot for confirming the issue. I will look into the Track2 SDK. Would you consider converting the function (and the download function) to a non-concurrent version so people can still use the library? Maybe it is fast enough for some users.
Would you consider converting the function (and the download function) to a non-concurrent version so people can still use the library? Maybe it is fast enough for some users.
Yeah, maybe. But other customers may still want concurrent transfer to exploit the bandwidth capacity.
Hi @Jinming-Hu, I'm evaluating the Track2 SDK. From what I can tell, the ConcurrentTransfer code in the Track2 SDK is very similar to the azure-storage-cpplite version. Won't the same problem show up there as well?
Hi @Jinming-Hu, I'm evaluating the Track2 SDK. From what I can tell, the ConcurrentTransfer code in the Track2 SDK is very similar to the azure-storage-cpplite version. Won't the same problem show up there as well?
I wrote concurrent transfer code in both Track1 cpplite and Track2 SDK. I don't think Track2 SDK has the same problem. If you observe memory leak or any other kind of resource leak, please open an issue in Track2 SDK repo.
Hi @Jinming-Hu, I tried the same test with the Track2 SDK and could not recreate the problem.
I debugged it a little bit; the exception was thrown at:
context->task_futures.emplace_back(std::async(std::launch::async, thread_upload_func));
Once this condition happened, I could not recover from it. I created a new blob_client and tried to use it, but I still got the exception.
This is running on Ubuntu:
LSB Version: core-9.20160110ubuntu0.2-amd64:core-9.20160110ubuntu0.2-noarch:security-9.20160110ubuntu0.2-amd64:security-9.20160110ubuntu0.2-noarch
Distributor ID: Ubuntu
Description: Ubuntu 16.04.2 LTS
Release: 16.04
Codename: xenial