Open malloryfreeberg opened 5 years ago
I'm guessing this is because the CLI is now doing client-side checksumming? @maniarathi do you stream the file while checksumming, or attempt to read it into memory? Hmm, looks like it is streamed in chunks get_s3_multipart_chunk_size big. I wonder how much the memory balloons. Someone should attempt to reproduce this. Unfortunately it can't be me as I am on a low-bandwidth link.
So I did actually test the memory footprint of this a while back, and memory usage was 64MB, which is what is expected given that it streams the file in chunks of that size.
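For anyone following along, here is a minimal sketch of what chunked checksumming looks like and why memory stays near one chunk. This is not the CLI's actual code: the function name, the use of SHA-256, and the hard-coded 64 MB constant are assumptions for illustration only.

```python
import hashlib
import io

# Assumed chunk size, matching the 64MB figure quoted above (illustrative only).
CHUNK_SIZE = 64 * 1024 * 1024


def streamed_sha256(fileobj, chunk_size=CHUNK_SIZE):
    """Hash a binary file-like object chunk by chunk.

    Only one chunk is held in memory at a time, so peak memory is
    roughly chunk_size regardless of how large the file is.
    """
    digest = hashlib.sha256()
    while True:
        chunk = fileobj.read(chunk_size)
        if not chunk:  # empty bytes means end of stream
            break
        digest.update(chunk)
    return digest.hexdigest()


# Usage: hash an in-memory stream with a tiny chunk size.
payload = b"fastq" * 1000
streamed = streamed_sha256(io.BytesIO(payload), chunk_size=7)
```

The streamed result is the same digest you would get from hashing the whole payload at once, so chunking changes memory use but not the checksum.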
@malloryfreeberg how much memory was consumed? Alas, your Activity Monitor screenshots don't show that.
As for CPU, I expect that simultaneous checksumming of several files will be quite CPU intensive. Does it limit parallelization? It looks like it does, based on the number of cores you have: DEFAULT_THREAD_COUNT = multiprocessing.cpu_count() * 2. On my machine cpu_count() returns 8, so that means it is trying to checksum all 16 files simultaneously. That's a bad thing.
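To make the arithmetic concrete: with a pool sized at cpu_count() * 2 and cpu_count() returning 8, all 16 files get a worker at once. The sketch below is illustrative only and is not the CLI's code. run_with_cap and fake_checksum_job are hypothetical names, and the sleep is a stand-in for real hashing. It measures how many jobs actually run at the same time for a given pool size.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


def fake_checksum_job():
    """Hypothetical stand-in for checksumming one file."""
    time.sleep(0.01)
    return "done"


def run_with_cap(tasks, max_workers):
    """Run callables in a thread pool and report peak concurrency."""
    lock = threading.Lock()
    state = {"active": 0, "peak": 0}

    def wrapped(task):
        # Count this job as active and record the high-water mark.
        with lock:
            state["active"] += 1
            state["peak"] = max(state["peak"], state["active"])
        try:
            return task()
        finally:
            with lock:
                state["active"] -= 1

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = list(pool.map(wrapped, tasks))
    return results, state["peak"]


# Usage: 16 "files" through a pool capped at 4 workers.
results, peak = run_with_cap([fake_checksum_job] * 16, max_workers=4)
```

With max_workers=16 nothing stops all 16 jobs from being in flight together; with a cap of 4, at most 4 are in flight at any moment. Real hashing is CPU-bound, and hashlib releases the GIL while hashing large buffers, so in the real case each active thread can occupy a core.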
There are several ways to fix this: 1) parallelize less aggressively - reduce thread count 2) provide a command line option to limit parallelism further 3) calculate checksums in-line while uploading, which will limit the parallelism based on your available bandwidth.
I realize # 3 doesn't work well with the current architecture, as client-side and server-side checksums are compared before upload starts. I wish there was a more efficient way to decide whether to upload or not. We should probably do # 1 and # 2.
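A rough sketch of what #1 and #2 could look like together, under stated assumptions: the --max-workers flag, the upload-sketch program name, and both helper functions are hypothetical and do not exist in the hca CLI today. The default drops to one worker per core, and the user can lower it further.

```python
import argparse
import multiprocessing


def default_thread_count():
    # Fix 1 (assumed): one worker per core instead of cpu_count() * 2.
    return max(1, multiprocessing.cpu_count())


def build_parser():
    # Hypothetical parser; the real CLI's options may differ.
    parser = argparse.ArgumentParser(prog="upload-sketch")
    parser.add_argument("files", nargs="+", help="files to upload")
    parser.add_argument(
        "--max-workers",
        type=int,
        default=None,
        help="upper bound on parallel checksum/upload jobs (hypothetical flag)",
    )
    return parser


def resolve_workers(args):
    # Fix 2 (assumed): honour the user's limit, never exceed the number
    # of files, and always keep at least one worker.
    if args.max_workers is not None:
        requested = args.max_workers
    else:
        requested = default_thread_count()
    return max(1, min(requested, len(args.files)))


# Usage: three files, user asks for at most two parallel jobs.
args = build_parser().parse_args(["a.fastq", "b.fastq", "c.fastq", "--max-workers", "2"])
workers = resolve_workers(args)
```

The resolved value would then be passed as the pool size wherever DEFAULT_THREAD_COUNT is used now.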
@sampierson @maniarathi I unfortunately did not grab memory usage during this time. I can reproduce, but I'll have to download the files to my local machine again :( Stay tuned!
@malloryfreeberg Don't bother. I think we know what the culprit is. I think the problem is CPU not memory.
I was using hca upload files * to upload about 80GB of fastq files (16 files) from my local machine to an upload area. During the transfer, I experienced significant slowdown of everything else running on my machine. I don't remember experiencing this slowdown before, although I haven't had to transfer files from a local source in a while. It looks like my machine was maxed out on CPU usage (screenshots below). Wondering if this is normal or expected behavior? It doesn't seem ideal...
During transfer:
After transfer: