crossplane-contrib / provider-upjet-aws

AWS Provider for Crossplane.
https://marketplace.upbound.io/providers/upbound/provider-family-aws/
Apache License 2.0

[Bug]: fatal concurrent map writes in SQS Queue #1487

Open jake-ciolek opened 1 month ago

jake-ciolek commented 1 month ago

Is there an existing issue for this?

Affected Resource(s)

sqs.aws.upbound.io - Queue

Resource MRs required to reproduce the bug

No response

Steps to Reproduce

Create queues with the provider, possibly define tags.

What happened?

We are running version v0.47.1 of provider-aws-sqs. The SQS provider crashes with a fatal concurrent map write. The maps causing this appear to be the map-typed fields of the QueueObservation struct (the generated type in apis/sqs/v1beta1).
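
For illustration, here is a minimal standalone sketch of the failure mode the stack trace below points at: two goroutines decoding JSON into the same map-typed field. It uses encoding/json and a stand-in `Observation` type rather than the provider's json-iterator and generated structs, but triggers the same `fatal error: concurrent map writes` runtime abort.

```go
package main

import (
	"encoding/json"
	"sync"
)

// Observation stands in for a generated observation struct with a
// map-typed field, e.g. QueueObservation.Tags.
type Observation struct {
	Tags map[string]string `json:"tags"`
}

func main() {
	obs := &Observation{Tags: map[string]string{}}
	payload := []byte(`{"tags":{"Name":"our_sqs_consuming_service","cluster":"devenv"}}`)

	var wg sync.WaitGroup
	// Both goroutines decode into the same struct and therefore write
	// into the same underlying map; the Go runtime detects this and
	// aborts with "fatal error: concurrent map writes".
	for i := 0; i < 2; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for j := 0; j < 100000; j++ {
				_ = json.Unmarshal(payload, obs)
			}
		}()
	}
	wg.Wait()
}
```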

The queues have a tags section which might be the culprit:

    tags:
      Name: our_sqs_consuming_service
      cluster: devenv
      crossplane-kind: queue.sqs.aws.upbound.io
      crossplane-name: the-sqs-queue-gd4m2-fvfnm
      crossplane-providerconfig: upbound
      managedby: crossplane

It seems that these map-typed fields should be made safe for concurrent access. Perhaps Upjet should use sync.Map, or add a locking mechanism around map-typed fields. A similar issue was reported previously for ECS Tasks, but it has unfortunately received no attention so far. A rough sketch of the locking option follows.
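
The sketch below is hypothetical: the `external` type and `SetObservation` signature are stand-ins for illustration, not the actual Upjet or provider API. The idea is to decode into a freshly allocated value and only publish it while holding a mutex, so no goroutine writes into a map that another goroutine is concurrently using (readers would need to take the same lock, or work on a copy).

```go
package main

import (
	"encoding/json"
	"sync"
)

// Observation stands in for a generated observation struct with
// map-typed fields such as Tags.
type Observation struct {
	Tags map[string]string `json:"tags"`
}

// external is a hypothetical stand-in for the controller's external
// client; its name and shape are illustrative only.
type external struct {
	mu  sync.Mutex
	obs Observation
}

// SetObservation decodes into a fresh value and swaps it in under the
// mutex, so concurrent updates never write into a shared map. Readers
// of obs must hold the same lock (or receive a copy) to stay race-free.
func (e *external) SetObservation(raw []byte) error {
	var fresh Observation
	if err := json.Unmarshal(raw, &fresh); err != nil {
		return err
	}
	e.mu.Lock()
	e.obs = fresh
	e.mu.Unlock()
	return nil
}

func main() {
	e := &external{}
	_ = e.SetObservation([]byte(`{"tags":{"Name":"our_sqs_consuming_service"}}`))
}
```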

Relevant Error Output Snippet

fatal error: concurrent map writes

goroutine 930300 [running]:
reflect.mapassign0(0xd8f2d40, 0x400b7546f0?, 0x4008847b38?, 0x1eb64?)
    runtime/map.go:1370 +0x24
reflect.mapassign(0x40190d8480?, 0x4062ec?, 0x4008847ba8?, 0x67754?)
    reflect/value.go:3828 +0xac
github.com/modern-go/reflect2.(*UnsafeMapType).UnsafeSetIndex(...)
    github.com/modern-go/reflect2@v1.0.2/unsafe_map.go:76
github.com/json-iterator/go.(*mapDecoder).Decode(0x400be99310, 0x400c805f08, 0x40190d8480)
    github.com/json-iterator/go@v1.1.12/reflect_map.go:191 +0x338
github.com/json-iterator/go.(*placeholderDecoder).Decode(0x4008847c38?, 0x40a9e8?, 0x4008847c38?)
    github.com/json-iterator/go@v1.1.12/reflect.go:324 +0x28
github.com/json-iterator/go.(*structFieldDecoder).Decode(0x400378d7c0, 0x40070dcba0?, 0x40190d8480)
    github.com/json-iterator/go@v1.1.12/reflect_struct_decoder.go:1054 +0x54
github.com/json-iterator/go.(*generalStructDecoder).decodeOneField(0x400378d920, 0x4008847d30?, 0x40190d8480)
    github.com/json-iterator/go@v1.1.12/reflect_struct_decoder.go:552 +0x27c
github.com/json-iterator/go.(*generalStructDecoder).Decode(0x400378d920, 0xc97c300?, 0x40190d8480)
    github.com/json-iterator/go@v1.1.12/reflect_struct_decoder.go:508 +0x98
github.com/json-iterator/go.(*Iterator).ReadVal(0x40190d8480, {0xde1a320, 0x400c805e78})
    github.com/json-iterator/go@v1.1.12/reflect.go:79 +0x120
github.com/json-iterator/go.(*frozenConfig).Unmarshal(0x40001db540, {0x400a003400?, 0x401523fa70?, 0xfbefc40?}, {0xde1a320, 0x400c805e78})
    github.com/json-iterator/go@v1.1.12/config.go:348 +0x78
github.com/upbound/provider-aws/apis/sqs/v1beta1.(*Queue).SetObservation(0x400c805c00, 0x400c12b650?)
    github.com/upbound/provider-aws/apis/sqs/v1beta1/zz_generated_terraformed.go:47 +0x98
github.com/crossplane/upjet/pkg/controller.(*noForkExternal).Update(0x400f33fe10, {0x11bf6628, 0x4003a229a0}, {0x11c76a60?, 0x400c805c00})
    github.com/crossplane/upjet@v1.1.0-rc.0.0.20231227120826-4cb45f9104ac/pkg/controller/external_nofork.go:673 +0x200
github.com/crossplane/upjet/pkg/controller.(*noForkAsyncExternal).Update.func1()
    github.com/crossplane/upjet@v1.1.0-rc.0.0.20231227120826-4cb45f9104ac/pkg/controller/external_async_nofork.go:161 +0x120
created by github.com/crossplane/upjet/pkg/controller.(*noForkAsyncExternal).Update in goroutine 3641
    github.com/crossplane/upjet@v1.1.0-rc.0.0.20231227120826-4cb45f9104ac/pkg/controller/external_async_nofork.go:157 +0x14c

Crossplane Version

v1.15.3

Provider Version

v0.47.1

Kubernetes Version

No response

Kubernetes Distribution

No response

Additional Info

No response

jake-ciolek commented 1 month ago

That particular environment has over 400 Queues created with the provider, and this causes the provider to restart quite often, roughly once every 5 to 20 minutes.