Closed cockroach-teamcity closed 1 year ago
cc @cockroachdb/replication
This comes from IngestExternalFilesWithStats:
I assume internally it's trying to hard-link the SST into the LSM, and is finding that an SST of that seqno already exists.
This shouldn't have anything to do with the SSTs we passed in, as SSTs are usually assigned a counter?
Note that this is darwin, so it's not high on our priority list, though we'd want to make sure there isn't a general bug in IngestExternalFilesWithStats.
Handing this over to the storage team in case they want to look into it more before closing out.
There were two events for different nodes (n2 and n3) of the same cluster within 1 second of one another.
I double-checked the code; the logic here is pretty simple. We increment a counter while holding a mutex to obtain unique file numbers. During Open, we ratchet the next file number up beyond the largest file number in the directory, so even a stray sstable that's not part of the LSM cannot lead to this error.
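To make the allocation scheme described above concrete, here is a minimal sketch. It is not the engine's actual code: the type fileNumAllocator, the functions openAllocator and alloc, and the manifestNext parameter are all hypothetical names chosen for illustration.

```go
package main

import (
	"fmt"
	"sync"
)

// fileNumAllocator is a hypothetical stand-in for the engine's
// file-number state; it is not the real implementation.
type fileNumAllocator struct {
	mu   sync.Mutex
	next uint64
}

// openAllocator mimics the Open-time ratchet: start from the next file
// number recorded in metadata (manifestNext, an assumed input) and move
// it past the largest file number found in the directory listing.
func openAllocator(manifestNext uint64, onDisk []uint64) *fileNumAllocator {
	next := manifestNext
	for _, n := range onDisk {
		if n >= next {
			next = n + 1
		}
	}
	return &fileNumAllocator{next: next}
}

// alloc returns a unique file number by incrementing the counter
// while holding the mutex.
func (a *fileNumAllocator) alloc() uint64 {
	a.mu.Lock()
	defer a.mu.Unlock()
	n := a.next
	a.next++
	return n
}

func main() {
	// File 9 is a stray sstable that is not part of the LSM; the
	// ratchet still skips past it.
	a := openAllocator(5, []uint64{3, 4, 9})
	fmt.Println(a.alloc(), a.alloc()) // prints "10 11"
}
```

Under these assumptions, a stray file that is present at Open time can never collide with a newly allocated number; only a file that appears after Open can.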
The only pathway to this error that I can see is adding an sstable to the directory while the engine is already open. The user could've manually copied a higher-numbered sstable into the directory, or two cockroach processes could be conflicting, sharing each other's tables. The latter seems unlikely because we use a file lock to prevent it, which the user would need to remove, although it might explain the double failure across two nodes.
Going to close this out under the assumption that this was an operator doing something silly (like forcing two processes to share a store directory). We can re-examine if there's ever an additional report.
This issue was autofiled by Sentry. It represents a crash or reported error on a live cluster with telemetry enabled.
Sentry link: https://cockroach-labs.sentry.io/issues/3946779534/?referrer=webhooks_plugin
Panic message:
Stacktrace (expand for inline code snippets):
https://github.com/cockroachdb/cockroach/blob/0c6903954dc9cd6c38c78ce5192cfd9a8183c110/pkg/kv/kvserver/pkg/kv/kvserver/store_raft.go#L465-L467 in pkg/kv/kvserver.(*Store).processRaftSnapshotRequest.func1
https://github.com/cockroachdb/cockroach/blob/0c6903954dc9cd6c38c78ce5192cfd9a8183c110/pkg/kv/kvserver/pkg/kv/kvserver/store_raft.go#L343-L345 in pkg/kv/kvserver.(*Store).withReplicaForRequest
https://github.com/cockroachdb/cockroach/blob/0c6903954dc9cd6c38c78ce5192cfd9a8183c110/pkg/kv/kvserver/pkg/kv/kvserver/store_raft.go#L402-L404 in pkg/kv/kvserver.(*Store).processRaftSnapshotRequest
https://github.com/cockroachdb/cockroach/blob/0c6903954dc9cd6c38c78ce5192cfd9a8183c110/pkg/kv/kvserver/pkg/kv/kvserver/store_snapshot.go#L1080-L1082 in pkg/kv/kvserver.(*Store).receiveSnapshot
https://github.com/cockroachdb/cockroach/blob/0c6903954dc9cd6c38c78ce5192cfd9a8183c110/pkg/kv/kvserver/pkg/kv/kvserver/store_raft.go#L212-L214 in pkg/kv/kvserver.(*Store).HandleSnapshot.func1
https://github.com/cockroachdb/cockroach/blob/0c6903954dc9cd6c38c78ce5192cfd9a8183c110/pkg/util/stop/stopper.go#L340-L342 in pkg/util/stop.(*Stopper).RunTaskWithErr
https://github.com/cockroachdb/cockroach/blob/0c6903954dc9cd6c38c78ce5192cfd9a8183c110/pkg/kv/kvserver/pkg/kv/kvserver/store_raft.go#L209-L211 in pkg/kv/kvserver.(*Store).HandleSnapshot
https://github.com/cockroachdb/cockroach/blob/0c6903954dc9cd6c38c78ce5192cfd9a8183c110/pkg/kv/kvserver/pkg/kv/kvserver/raft_transport.go#L378-L380 in pkg/kv/kvserver.(*RaftTransport).RaftSnapshot
https://github.com/cockroachdb/cockroach/blob/0c6903954dc9cd6c38c78ce5192cfd9a8183c110/pkg/kv/kvserver/storage_services.pb.go#L269-L271 in pkg/kv/kvserver._MultiRaft_RaftSnapshot_Handler
https://github.com/cockroachdb/cockroach/blob/0c6903954dc9cd6c38c78ce5192cfd9a8183c110/pkg/util/tracing/grpcinterceptor/grpc_interceptor.go#L162-L164 in pkg/util/tracing/grpcinterceptor.StreamServerInterceptor.func1
google.golang.org/grpc/external/org_golang_google_grpc/server.go#L1407-L1409 in google.golang.org/grpc.chainStreamInterceptors.func1.1
https://github.com/cockroachdb/cockroach/blob/0c6903954dc9cd6c38c78ce5192cfd9a8183c110/pkg/rpc/pkg/rpc/context.go#L271-L273 in pkg/rpc.NewServerEx.func4
google.golang.org/grpc/external/org_golang_google_grpc/server.go#L1410-L1412 in google.golang.org/grpc.chainStreamInterceptors.func1.1
https://github.com/cockroachdb/cockroach/blob/0c6903954dc9cd6c38c78ce5192cfd9a8183c110/pkg/rpc/pkg/rpc/context.go#L240-L242 in pkg/rpc.NewServerEx.func2.1
https://github.com/cockroachdb/cockroach/blob/0c6903954dc9cd6c38c78ce5192cfd9a8183c110/pkg/util/stop/stopper.go#L340-L342 in pkg/util/stop.(*Stopper).RunTaskWithErr
https://github.com/cockroachdb/cockroach/blob/0c6903954dc9cd6c38c78ce5192cfd9a8183c110/pkg/rpc/pkg/rpc/context.go#L239-L241 in pkg/rpc.NewServerEx.func2
google.golang.org/grpc/external/org_golang_google_grpc/server.go#L1410-L1412 in google.golang.org/grpc.chainStreamInterceptors.func1.1
google.golang.org/grpc/external/org_golang_google_grpc/server.go#L1412-L1414 in google.golang.org/grpc.chainStreamInterceptors.func1
google.golang.org/grpc/external/org_golang_google_grpc/server.go#L1548-L1550 in google.golang.org/grpc.(*Server).processStreamingRPC
google.golang.org/grpc/external/org_golang_google_grpc/server.go#L1623-L1625 in google.golang.org/grpc.(*Server).handleStream
google.golang.org/grpc/external/org_golang_google_grpc/server.go#L921-L923 in google.golang.org/grpc.(*Server).serveStreams.func1.2
src/runtime/asm_amd64.s#L1593-L1595 in runtime.goexit
v22.2.5
Jira issue: CRDB-24646