Describe the bug
Background: we use Athens with Redis (as our SingleFlightMode) to manage concurrent write access to our storage (S3).
We have encountered an issue in our Athens setup where requests for certain modules and versions (e.g. github.com/aws/aws-sdk-go/@v/v1.12.42.info) has resulted in a 500 status code and a "redislock: not obtained" error in our logs.
Error Message
When users see 500 Internal Server Error. Our Athens log:
redislock: not obtained
When users see 404 Not Found. Our Athens log (expected):
github.com/aws/aws-sdk-go@v1.12.42: verifying go.mod: github.com/aws/aws-sdk-go@v1.12.42/go.mod: reading https://sum.golang.org/lookup/github.com/aws/aws-sdk-go@v1.12.42: 404 Not Found
server response:
not found: create zip: models/apis/sagemaker/2017-07-24/api-2.json: case-insensitive file name collision: "models/apis/SageMaker/2017-07-24/api-2.json" and "models/apis/sagemaker/2017-07-24/api-2.json"
models/apis/sagemaker/2017-07-24/docs-2.json: case-insensitive file name collision: "models/apis/SageMaker/2017-07-24/docs-2.json" and "models/apis/sagemaker/2017-07-24/docs-2.json"
To Reproduce
Steps to reproduce the behavior:
Clone this repository
Start a Redis locally: docker run -d --name some-redis -p 6379:6379 redis
To reproduce, rapidly send few requests to time curl http://localhost:3000/github.com/aws/aws-sdk-go/@v/v1.12.42.info. This took me ~30s on average. Here, you should see redislock: not obtained getting logged
Expected behavior
The expected behavior here is for Athens to return a 404 status code as this module does not exist. However, Athens is returning a 500 status code (with a Redis lock error logged on the server-side).
/github.com/aws/aws-sdk-go/@v/v1.12.42 is part of our transitive dependencies so there isn't a way to avoid it
/github.com/aws/aws-sdk-go/@v/v1.12.42 is just an example, we've seen similar cases with other Go modules like: /cloud.google.com/go/storage/@v/v1.35.0.info, /github.com/go-redis/redis/@v/v6.15.3+incompatible.info, /gopkg.in/go-playground/validator.v9/@v/v9.30.1.info (... and more). Generally they seemed to be 'problematic' modules to run go mod download -json on
Further investigation
The root cause appears to be related to Athens obtaining the Redis lock before fetching from the upstream source (code) at module.downloadModule. In this specific case, fetching from the upstream took a significant amount of time, and obtaining the Redis lock prematurely has lead to lock contention and timeouts.
Upon further investigation, it was discovered that the bottleneck occurred within the goGetFetcher.Fetch function, which calls module.downloadModule (reference)
The module.downloadModule function is responsible for executing the go mod download -json github.com/aws/aws-sdk-go@v1.12.42. Here, the go mod download -json command took on average about 30 seconds to complete
Proposal
I propose that instead of obtaining the Redis lock before fetching from the upstream source, the lock should be acquired only after successfully fetching the module from upstream and before stashing it to the storage backend
This approach ensures that the lock is held for a minimal duration (reduces the chances of lock contention and timeouts)
Describe the bug Background: we use Athens with Redis (as our
SingleFlightMode
) to manage concurrent write access to our storage (S3).We have encountered an issue in our Athens setup where requests for certain modules and versions (e.g.
github.com/aws/aws-sdk-go/@v/v1.12.42.info
) has resulted in a 500 status code and a"redislock: not obtained"
error in our logs.Error Message
When users see 500 Internal Server Error. Our Athens log:
When users see 404 Not Found. Our Athens log (expected):
To Reproduce Steps to reproduce the behavior:
docker run -d --name some-redis -p 6379:6379 redis
SingleFlightType = "redis"
make run
time curl http://localhost:3000/github.com/aws/aws-sdk-go/@v/v1.12.42.info
. This took me ~30s on average. Here, you should seeredislock: not obtained
getting loggedExpected behavior The expected behavior here is for Athens to return a 404 status code as this module does not exist. However, Athens is returning a 500 status code (with a Redis lock error logged on the server-side).
Environment
go1.22.2 darwin/arm64
v0.14.1
Additional context
/github.com/aws/aws-sdk-go/@v/v1.12.42
is part of our transitive dependencies so there isn't a way to avoid it/github.com/aws/aws-sdk-go/@v/v1.12.42
is just an example, we've seen similar cases with other Go modules like:/cloud.google.com/go/storage/@v/v1.35.0.info
,/github.com/go-redis/redis/@v/v6.15.3+incompatible.info
,/gopkg.in/go-playground/validator.v9/@v/v9.30.1.info
(... and more). Generally they seemed to be 'problematic' modules to rungo mod download -json
onFurther investigation
The root cause appears to be related to Athens obtaining the Redis lock before fetching from the upstream source (code) at
module.downloadModule
. In this specific case, fetching from the upstream took a significant amount of time, and obtaining the Redis lock prematurely has lead to lock contention and timeouts.Upon further investigation, it was discovered that the bottleneck occurred within the
goGetFetcher.Fetch
function, which callsmodule.downloadModule
(reference)The
module.downloadModule
function is responsible for executing thego mod download -json github.com/aws/aws-sdk-go@v1.12.42
. Here, thego mod download -json
command took on average about 30 seconds to completeProposal