SwiftPackageIndex / SwiftPackageIndex-Server

The Swift Package Index is the place to find Swift packages!
https://swiftpackageindex.com
Apache License 2.0

Offload DocC uploads to upload service #2181

Closed finestructure closed 1 year ago

finestructure commented 1 year ago

We've seen via #2179 that uploading large doc sets is problematic. The 1GB/80k-file doc set takes 6 minutes to transfer via scp between MacStadium machines.

Even if there was a network issue on the day, we'll be hard-pressed to upload doc sets of that size within the 10-minute timeout we impose on builds (which also has to cover the build and doc generation).
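For a rough sense of how tight that budget is, here's the back-of-the-envelope arithmetic using the figures quoted above (all numbers are from this thread, not new measurements):

```python
doc_set_mb = 1024                # ~1GB doc set from #2179
transfer_seconds = 6 * 60        # ~6 minutes over scp between MacStadium machines
build_timeout_seconds = 10 * 60  # overall per-build timeout

# Effective transfer rate and what's left over for build + doc generation
throughput_mb_s = doc_set_mb / transfer_seconds
remaining_seconds = build_timeout_seconds - transfer_seconds

print(f"{throughput_mb_s:.1f} MB/s")                      # ~2.8 MB/s
print(f"{remaining_seconds}s left for build + doc gen")   # 240s
```

So the transfer alone eats 60% of the timeout, leaving only four minutes for everything else.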

We should probably consider sending the zipped doc set to an intermediary which then handles the upload asynchronously, with a more generous timeout and more retries than we afford the current upload.

I'm not sure what the file size limits are for Lambda, but I could see that being a decent solution: send a zip to a Lambda and let it deal with populating S3.
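The intermediary idea could be sketched roughly like this: a hypothetical handler receives a zipped doc set, extracts it, and hands each file to an injected upload function (the function name, key scheme, and signature are illustrative, not the project's actual API; in a real deployment `upload` would wrap an S3 put call):

```python
import io
import zipfile
from typing import Callable

def unpack_docs_to_s3(
    zip_bytes: bytes,
    key_prefix: str,
    upload: Callable[[str, bytes], None],
) -> int:
    """Extract a zipped doc set and hand each file to `upload(key, data)`.

    Injecting `upload` keeps the unpacking logic testable; the real thing
    would wrap an S3 client call with its own timeout and retry policy.
    """
    uploaded = 0
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        for entry in archive.infolist():
            if entry.is_dir():
                continue
            key = f"{key_prefix.rstrip('/')}/{entry.filename}"
            upload(key, archive.read(entry))
            uploaded += 1
    return uploaded
```

The point of the indirection is that the builder only ships one zip over the network; the fan-out into 80k individual objects happens on the service's side of the slow link.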

finestructure commented 1 year ago

Looking through AWS Lambda examples with Swift I'm not particularly excited by the prospect... that's a lot of legwork and would probably be particularly awkward to get going from an M1 Mac.

Maybe https://swiftpackageindex.com/swift-cloud/Compute is the better option?

finestructure commented 1 year ago

I had an app running in under 10 mins in Swift Cloud:

[Screenshot: CleanShot 2022-12-10 at 11 12 58@2x]

The problem is that it's in beta, so it's unclear what pricing and availability will be like.

What we could do, though, is send only certain problematic doc sets to a service powered by this. Right now we're failing to upload them anyway, so worst case we're no worse off than before.

We could pick doc sets that exceed 500MB, zip them up, send them there, and try our luck. Until we encountered #2179 we didn't even attempt to upload doc sets >500MB, so either way this is an improvement.

finestructure commented 1 year ago

Ok, so this isn't going to be straightforward on Swift Cloud either, because SotoS3FileTransfer (or rather the dependencies it pulls in) doesn't compile with SwiftWasm:

12:17:02.578 | [build] swift build:
/build_6debxa7ol4n2/.build/checkouts/async-http-client/Sources/CAsyncHTTPClient/CAsyncHTTPClient.c:35:38: warning: implicit declaration of function 'strptime_l' is invalid in C99 [-Wimplicit-function-declaration]
    const char * firstNonProcessed = strptime_l(string, format, result, (locale_t)locale);
                                     ^
/build_6debxa7ol4n2/.build/checkouts/async-http-client/Sources/CAsyncHTTPClient/CAsyncHTTPClient.c:35:18: warning: incompatible integer to pointer conversion initializing 'const char *' with an expression of type 'int' [-Wint-conversion]
    const char * firstNonProcessed = strptime_l(string, format, result, (locale_t)locale);
                 ^                   ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
2 warnings generated.

12:17:02.835 | [build] swift build: [1/461] Compiling _NIODataStructures Heap.swift
/build_6debxa7ol4n2/.build/checkouts/swift-nio/Sources/_NIODataStructures/Heap.swift:199:28: error: cannot find 'log2' in scope
                return Int(log2(Double(index + 1)))
                           ^~~~
[1/461] Compiling _AtomicsShims.c

Rolling our own S3 syncing isn't a great prospect either. I think in that case I'd rather deal with a more complicated deployment mechanism into AWS Lambda.

daveverwer commented 1 year ago

I'm still in London so can't look in any detail, but I met Andrew yesterday and he gave a great talk on exactly this subject. He's in our discord, too, so we have a friendly face to ask some of these questions of.

finestructure commented 1 year ago

I've also tested syncing via s3sync, which performs about as well as, or even slightly better than, scp:

Fresh sync:

mb5:.docs $ time ./s3sync \
        --fs-disable-xattr \
        --s3-retry 5 \
        --s3-retry-sleep 5 \
        --tr us-east-2 \
        --tk "$AWS_KEY" \
        --ts "$AWS_SECRET" \
        fs:///Users/builder/Downloads/swift-syntax-checkout/.docs/apple \
        s3://spi-docs-test/

INFO[0000] Starting sync
INFO[0342] Pipeline step finished                        ErrorObj=0 InputObj=0 InputObjSpeed=0 OutputObj=81472 OutputObjSpeed=237.86278897337468 stepName=ListSource stepNum=0
INFO[0342] Pipeline step finished                        ErrorObj=0 InputObj=81472 InputObjSpeed=237.86278897337468 OutputObj=81472 OutputObjSpeed=237.86278897337468 stepName=LoadObjData stepNum=1
INFO[0342] Pipeline step finished                        ErrorObj=0 InputObj=81472 InputObjSpeed=237.86278897337468 OutputObj=81472 OutputObjSpeed=237.86278897337468 stepName=UploadObj stepNum=2
INFO[0342] Pipeline step finished                        ErrorObj=0 InputObj=81472 InputObjSpeed=237.86278897337468 OutputObj=0 OutputObjSpeed=0 stepName=Terminator stepNum=3
INFO[0342] Duration: 5m42.517058708s                     durationSec=342.517056625
INFO[0342] Sync Done                                     status=0
./s3sync --fs-disable-xattr --s3-retry 5 --s3-retry-sleep 5 --tr us-east-2     56.24s user 55.50s system 32% cpu 5:42.56 total

Re-sync:

mb5:.docs $ time ./s3sync \
        --fs-disable-xattr \
        --s3-retry 5 \
        --s3-retry-sleep 5 \
        --tr us-east-2 \
        --tk "$AWS_KEY" \
        --ts "$AWS_SECRET" \
        fs:///Users/builder/Downloads/swift-syntax-checkout/.docs/apple \
        s3://spi-docs-test/

INFO[0000] Starting sync
INFO[0345] Pipeline step finished                        ErrorObj=0 InputObj=0 InputObjSpeed=0 OutputObj=81472 OutputObjSpeed=235.5826086413408 stepName=ListSource stepNum=0
INFO[0345] Pipeline step finished                        ErrorObj=0 InputObj=81472 InputObjSpeed=235.5826086413408 OutputObj=81472 OutputObjSpeed=235.5826086413408 stepName=LoadObjData stepNum=1
INFO[0345] Pipeline step finished                        ErrorObj=0 InputObj=81472 InputObjSpeed=235.5826086413408 OutputObj=81472 OutputObjSpeed=235.5826086413408 stepName=UploadObj stepNum=2
INFO[0345] Pipeline step finished                        ErrorObj=0 InputObj=81472 InputObjSpeed=235.5826086413408 OutputObj=0 OutputObjSpeed=0 stepName=Terminator stepNum=3
INFO[0345] Duration: 5m45.832251791s                     durationSec=345.832249833
INFO[0345] Sync Done                                     status=0
./s3sync --fs-disable-xattr --s3-retry 5 --s3-retry-sleep 5 --tr us-east-2     55.48s user 51.38s system 30% cpu 5:45.87 total

The retry settings are critical, as this tool also hits S3 errors, which are actual HTTP 500s. I even saw 500s when deleting the folder in S3 itself, which is very odd but at least indicates that it's not specific to our machines or setup.

It's notable that the sync speed is exactly the same whether the content is already in the bucket or not. Since the files are so small, that makes sense: the time taken depends on the number of requests, i.e. the number of files, not their size. A check is as expensive as an upload.
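The numbers in the log above bear that out: at roughly one request per file, the duration is set by the file count, not the byte count.

```python
files = 81_472        # objects listed and uploaded in the log above
duration_s = 342.5    # fresh sync; the re-sync took a near-identical 345.8s

# Effective request rate: matches the logged OutputObjSpeed of ~237.86
requests_per_second = files / duration_s
print(f"{requests_per_second:.0f} files/s")  # 238
```

At ~238 requests/s, halving the average file size would barely change the total time, but halving the file count would.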

finestructure commented 1 year ago

We could perhaps achieve similar sync times with our current setup if we didn't do exponential back-off and allowed retries per file. Right now we retry the whole sync on error; that's probably what's making our sync so slow.
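A per-file retry policy, as opposed to restarting the whole sync, could look something like this minimal sketch, mirroring s3sync's `--s3-retry`/`--s3-retry-sleep` flags (the function and parameter names are illustrative, and the upload callable is injected):

```python
import time
from typing import Callable

def upload_with_retry(
    key: str,
    upload: Callable[[str], None],
    attempts: int = 5,
    sleep_s: float = 5.0,
) -> None:
    """Retry a single file upload a fixed number of times with a flat sleep.

    Crucially, the retry scope is one file: a transient S3 500 costs a few
    seconds, not a restart of the entire multi-minute sync.
    """
    for attempt in range(1, attempts + 1):
        try:
            upload(key)
            return
        except Exception:
            if attempt == attempts:
                raise
            time.sleep(sleep_s)
```

The flat sleep is deliberate: with tens of thousands of files, exponential back-off on each transient 500 adds up quickly, while a short fixed pause is usually enough for S3's transient errors to clear.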

finestructure commented 1 year ago

Just spotted another package (via 404s in our logs) that's failing doc upload and therefore not showing a documentation link:

[Screenshot: CleanShot 2022-12-19 at 11 00 29@2x]

I've manually triggered a rebuild but I suspect this might be a rather frequent occurrence that we don't have good visibility into.

This doc upload service is probably important not just for large doc set uploads: this package's docs are only 162MB and it's still failing.

daveverwer commented 1 year ago

It might even be worth deleting builds for the ~200 packages with documentation when we have a fix in place.

finestructure commented 1 year ago

Doc Uploader

Link to implementation notes: https://noteplan.co/n/21830411-C2AB-4EFF-932C-1A04DBBCB16C

finestructure commented 1 year ago

Quick update. The lambda and builder changes are live (but inactive).

However, testing revealed performance problems even when copying from the Lambda to S3. With a bit of effort it can be done in 14m30s, which is just under Lambda's hard 15m runtime limit. That's too close to ship it like that.

Also, we'd be paying for 15m of compute per version when all we're doing is copying (which takes ~13m30s), at a rate of $15 per mille.

We know the doc set can be copied to S3 in under 6 minutes, so before shipping this I'm looking into ways to make it faster.

The problem is likely that the Lambda handler is single-threaded and we're not uploading in parallel.
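The single-threaded suspicion suggests the classic fix: fan the per-file uploads out over a worker pool. A minimal sketch, with an injected upload callable (worker count and function names are illustrative; the actual Lambda fix may look different):

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable

def upload_all(
    keys: Iterable[str],
    upload: Callable[[str], None],
    workers: int = 16,
) -> int:
    """Upload many small files concurrently.

    When per-request latency dominates (as it does for tiny doc files),
    N workers cut wall-clock time roughly by a factor of N, until
    bandwidth or S3 request limits become the bottleneck.
    """
    keys = list(keys)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Consuming the map forces iteration so any upload exception surfaces here
        list(pool.map(upload, keys))
    return len(keys)
```

Since the work is I/O-bound, threads are enough even in an otherwise single-threaded runtime; no extra CPU is needed to overlap the request round-trips.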

finestructure commented 1 year ago

Thanks to @adam-fowler's help we're down to ~9.5m for the copy, with the total run time (arm64) being just over 10m.

finestructure commented 1 year ago

This is now live. The doc set limit is currently still 500MB for all packages except swift-syntax, for which it is 1GB.

That means that currently only swift-syntax will opt into the new doc uploader: we only route doc sets >500MB to it, but we reject all doc sets >500MB except swift-syntax's.
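Spelling the current rules out as a sketch (the limits are the ones stated above; the function name and the package-name check are illustrative, not the server's actual implementation):

```python
LIMIT_DEFAULT_MB = 500        # doc sets above this are rejected...
LIMIT_SWIFT_SYNTAX_MB = 1024  # ...except for swift-syntax, which gets 1GB
UPLOADER_THRESHOLD_MB = 500   # accepted doc sets above this use the new uploader

def doc_set_handling(package: str, size_mb: int) -> str:
    """Decide how a doc set is handled under the current limits."""
    limit = LIMIT_SWIFT_SYNTAX_MB if package == "swift-syntax" else LIMIT_DEFAULT_MB
    if size_mb > limit:
        return "rejected"
    if size_mb > UPLOADER_THRESHOLD_MB:
        return "doc-uploader"  # currently only reachable for swift-syntax
    return "direct-upload"
```

Lowering `LIMIT_DEFAULT_MB`'s companion `UPLOADER_THRESHOLD_MB` (or raising per-package limits) is then all it takes to onboard more packages, which matches the rollout plan below.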

We'll lower the threshold over the coming days to onboard more packages into the new process.