commercialhaskell / stackage-infrastructure

Deployment configuration for Stackage and tools. Central place for Stackage admins.
MIT License

uploading haddocks now takes several hours #4

Open juhp opened 3 months ago

juhp commented 3 months ago

With the switch from S3 to CF (Cloudflare R2) storage for Stackage haddock documentation, uploading with the awscli to Cloudflare via their S3 compatibility layer is taking far too long.

Nightly currently takes about 3 hours and LTS around 4 hours. In the past, with S3, those uploads took just minutes. The long delay basically prevents curators from continuing their work while the upload is happening.

juhp commented 3 months ago

LTS-22.22:

Tue May 14 05:38:42 PM UTC 2024
:
Uploading docs to S3
Creating hoogle/orig.tar
Shelling out to AWS CLI to upload docs
Uploading snapshot definition to Github
:
real    248m5.723s
user    0m0.781s
sys     0m0.218s
Tue May 14 09:46:48 PM UTC 2024

juhp commented 3 months ago

Nightly:

Wed May 15 01:18:19 AM UTC 2024
:
real    188m39.820s
user    0m0.680s
sys     0m0.158s
Wed May 15 04:26:59 AM UTC 2024

chreekat commented 3 months ago

Thanks for opening a ticket! From earlier discussion I had been under the impression that this wasn't a blocker. I'll ping Cloudflare to see if they've made any progress. This should not be happening.

chreekat commented 3 months ago

Btw what's the actual evidence that the docs upload is the slow part of the process? I was unable to reproduce slow uploads, myself.

(That's not to say I don't believe it's slow. I'm just not sure how to reproduce it.)

chreekat commented 3 months ago

It looks like the actual command run by curator upload-docs is aws s3 cp --only-show-errors --recursive --acl public-read --cache-control maxage=31536000.

I didn't try --recursive -- maybe that has an effect.
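
For what it's worth, a rough sketch of what that invocation might look like written out in full (the local directory, bucket/prefix, and how the R2 endpoint gets supplied are my assumptions, not something checked against curator itself):

# Hypothetical expansion of the curator upload command; all names are placeholders.
aws s3 cp LOCAL_DOCS_DIR/ s3://BUCKET/PREFIX/ \
    --only-show-errors --recursive --acl public-read \
    --cache-control maxage=31536000 \
    --endpoint-url https://ACCOUNT_ID.r2.cloudflarestorage.com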

alaendle commented 3 months ago

EDIT: I wasn't fast enough to see the end, and the log files have already been overwritten. But that probably doesn't change the basic observation: the upload seems to be limited by CF performance. I guess @chreekat is right - awscli seems to serialize file operations and doesn't use parallel requests (with this non-streaming approach even normal latency would limit throughput)? I will try to save the log during the next LTS.

Currently nightly is uploading while creating the debug logs using the new curator version - but it seems CF R2 is slow - upload speed is around 1.3 MiB/s (and we upload several GBs). Will update this comment once the upload finishes...

chreekat commented 3 months ago

Interesting. I wonder if the problem is a fixed cost per file. I tried uploading one big file and it was fast. But curator is uploading a large number of small files. If there's a significant startup cost for each file, that could explain the issue.
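
One way to test that hypothesis would be to time the same data uploaded file-by-file versus as a single archive - a minimal sketch, assuming a hypothetical local docs directory, bucket, and R2 account endpoint:

# File-by-file, roughly what curator does today (placeholders throughout).
time aws s3 cp --recursive SOME_DOCS_DIR/ s3://BUCKET/test-many-files/ \
    --endpoint-url https://ACCOUNT_ID.r2.cloudflarestorage.com

# The same data as one large object.
tar cf docs.tar SOME_DOCS_DIR/
time aws s3 cp docs.tar s3://BUCKET/test-single-file/docs.tar \
    --endpoint-url https://ACCOUNT_ID.r2.cloudflarestorage.com

If the first run is dramatically slower, per-file overhead rather than raw bandwidth would be the bottleneck.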

mihaimaruseac commented 3 months ago

That makes sense. From previous interactions with GCS, I observed a similar behavior. It's faster to archive all files into a single large upload than uploading individually.

alaendle commented 3 months ago

Last log entry from the Nightly upload - uploaded 9.8 GB at 1.0 MiB/s => ~2.75 h; one way to improve upload time might be parallelization (there are tools that support parallel uploads, e.g. https://rclone.org/ and others). Maybe something you want to try @chreekat? 😉

juhp commented 3 months ago

(Presumably a single file upload is not an option, right?)

alaendle commented 3 months ago

My guess was wrong - awscli also uses multiple threads (https://docs.aws.amazon.com/cli/latest/topic/s3-config.html#max-concurrent-requests). So maybe we could tweak that setting to use e.g. 50 parallel connections?
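
For reference, a sketch of how that setting could be raised (50 is just the number floated above, not a tested value; the awscli default is 10):

# Bump awscli's S3 transfer concurrency.
aws configure set default.s3.max_concurrent_requests 50

# Or equivalently in ~/.aws/config:
[default]
s3 =
  max_concurrent_requests = 50
  max_queue_size = 10000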

chreekat commented 3 months ago

@alaendle I just heard from CF and they have essentially recommended the same action. They recommended rclone as a better alternative since apparently it "gives more control over increasing concurrency/number of transfers".
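
In case it's useful, a sketch of what an rclone-based upload could look like (the remote name, bucket, and prefix are placeholders; rclone's --transfers defaults to 4, so raising it is where the extra concurrency would come from):

# Copy the docs tree to R2 with many parallel transfers (all names are placeholders).
rclone copy SOME_DOCS_DIR/ r2:BUCKET/PREFIX --transfers 64 --checkers 32 --fast-list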

chreekat commented 3 months ago

I do wonder if there was some global aws config in the old system that implicitly increased the concurrency? I have a hard time believing that a sequential upload of all the files would be fast in any circumstance. Maybe I missed it when I was implementing the handover.

chreekat commented 2 months ago

Sorry this issue has dragged on. I have less time now since my HF contract has been reduced to 20% time. But it's still on my list...

juhp commented 2 months ago

I am adding rclone to the nightly image now - though I dunno how to use it
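
In case it helps, a sketch of an rclone remote definition for R2, following rclone's documented Cloudflare S3 provider settings (all credentials and names are placeholders):

# ~/.config/rclone/rclone.conf
[r2]
type = s3
provider = Cloudflare
access_key_id = ACCESS_KEY
secret_access_key = SECRET_KEY
endpoint = https://ACCOUNT_ID.r2.cloudflarestorage.com

# Then something like:
rclone copy SOME_DOCS_DIR/ r2:BUCKET/PREFIX --transfers 64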