ktenzer closed this issue 2 years ago.
Hi! Thanks for pointing this out. Maybe it's another regression introduced by recent features/fixes. Is there any error log in the persistence jobs or in the operator?
I reproduced this on the latest build with the example cluster using postgres and elasticsearch. The frontend has the following logs:
{"level":"error","ts":"2022-09-28T17:12:00.580Z","msg":"unavailable error","service":"frontend","error":"unable to get mapping from Elasticsearch: elastic: Error 400 (Bad Request)","logging-call-at":"adminHandler.go:1792","stacktrace":"go.temporal.io/server/common/log.(*zapLogger).Error\n\t/home/builder/temporal/common/log/zap_logger.go:142\ngo.temporal.io/server/service/frontend.(*AdminHandler).error\n\t/home/builder/temporal/service/frontend/adminHandler.go:1792\ngo.temporal.io/server/service/frontend.(*AdminHandler).GetSearchAttributes\n\t/home/builder/temporal/service/frontend/adminHandler.go:377\ngo.temporal.io/server/api/adminservice/v1._AdminService_GetSearchAttributes_Handler.func1\n\t/home/builder/temporal/api/adminservice/v1/service.pb.go:856\ngo.temporal.io/server/common/rpc/interceptor.(*SDKVersionInterceptor).Intercept\n\t/home/builder/temporal/common/rpc/interceptor/sdk_version.go:64\ngoogle.golang.org/grpc.chainUnaryInterceptors.func1.1\n\t/go/pkg/mod/google.golang.org/grpc@v1.47.0/server.go:1117\ngo.temporal.io/server/common/authorization.(*interceptor).Interceptor\n\t/home/builder/temporal/common/authorization/interceptor.go:152\ngoogle.golang.org/grpc.chainUnaryInterceptors.func1.1\n\t/go/pkg/mod/google.golang.org/grpc@v1.47.0/server.go:1120\ngo.temporal.io/server/common/rpc/interceptor.(*NamespaceCountLimitInterceptor).Intercept\n\t/home/builder/temporal/common/rpc/interceptor/namespace_count_limit.go:99\ngoogle.golang.org/grpc.chainUnaryInterceptors.func1.1\n\t/go/pkg/mod/google.golang.org/grpc@v1.47.0/server.go:1120\ngo.temporal.io/server/common/rpc/interceptor.(*NamespaceRateLimitInterceptor).Intercept\n\t/home/builder/temporal/common/rpc/interceptor/namespace_rate_limit.go:89\ngoogle.golang.org/grpc.chainUnaryInterceptors.func1.1\n\t/go/pkg/mod/google.golang.org/grpc@v1.47.0/server.go:1120\ngo.temporal.io/server/common/rpc/interceptor.(*RateLimitInterceptor).Intercept\n\t/home/builder/temporal/common/rpc/interceptor/rate_limit.go:84\ngoogle.golang.org/grpc.chainUnaryInterceptors.func1.1\n\t/go/pkg/mod/google.golang.org/grpc@v1.47.0/server.go:1120\ngo.temporal.io/server/common/rpc/interceptor.(*NamespaceValidatorInterceptor).Intercept\n\t/home/builder/temporal/common/rpc/interceptor/namespace_validator.go:112\ngoogle.golang.org/grpc.chainUnaryInterceptors.func1.1\n\t/go/pkg/mod/google.golang.org/grpc@v1.47.0/server.go:1120\ngo.temporal.io/server/common/rpc/interceptor.(*TelemetryInterceptor).Intercept\n\t/home/builder/temporal/common/rpc/interceptor/telemetry.go:135\ngoogle.golang.org/grpc.chainUnaryInterceptors.func1.1\n\t/go/pkg/mod/google.golang.org/grpc@v1.47.0/server.go:1120\ngo.temporal.io/server/common/metrics.NewServerMetricsContextInjectorInterceptor.func1\n\t/home/builder/temporal/common/metrics/grpc.go:66\ngoogle.golang.org/grpc.chainUnaryInterceptors.func1.1\n\t/go/pkg/mod/google.golang.org/grpc@v1.47.0/server.go:1120\ngo.temporal.io/server/common/rpc.ServiceErrorInterceptor\n\t/home/builder/temporal/common/rpc/grpc.go:132\ngoogle.golang.org/grpc.chainUnaryInterceptors.func1.1\n\t/go/pkg/mod/google.golang.org/grpc@v1.47.0/server.go:1120\ngo.temporal.io/server/common/rpc/interceptor.(*NamespaceLogInterceptor).Intercept\n\t/home/builder/temporal/common/rpc/interceptor/namespace_logger.go:84\ngoogle.golang.org/grpc.chainUnaryInterceptors.func1.1\n\t/go/pkg/mod/google.golang.org/grpc@v1.47.0/server.go:1120\ngoogle.golang.org/grpc.chainUnaryInterceptors.func1\n\t/go/pkg/mod/google.golang.org/grpc@v1.47.0/server.go:1122\ngo.temporal.io/server/api/adminservice/v1.
_AdminService_GetSearchAttributes_Handler\n\t/home/builder/temporal/api/adminservice/v1/service.pb.go:858\ngoogle.golang.org/grpc.(*Server).processUnaryRPC\n\t/go/pkg/mod/google.golang.org/grpc@v1.47.0/server.go:1283\ngoogle.golang.org/grpc.(*Server).handleStream\n\t/go/pkg/mod/google.golang.org/grpc@v1.47.0/server.go:1620\ngoogle.golang.org/grpc.(*Server).serveStreams.func1.2\n\t/go/pkg/mod/google.golang.org/grpc@v1.47.0/server.go:922"}
{"level":"error","ts":"2022-09-28T17:12:00.580Z","msg":"unavailable error","operation":"GetSearchAttributes","error":"unable to get mapping from Elasticsearch: elastic: Error 400 (Bad Request)","logging-call-at":"telemetry.go:280","stacktrace":"go.temporal.io/server/common/log.(*zapLogger).Error\n\t/home/builder/temporal/common/log/zap_logger.go:142\ngo.temporal.io/server/common/rpc/interceptor.(*TelemetryInterceptor).handleError\n\t/home/builder/temporal/common/rpc/interceptor/telemetry.go:280\ngo.temporal.io/server/common/rpc/interceptor.(*TelemetryInterceptor).Intercept\n\t/home/builder/temporal/common/rpc/interceptor/telemetry.go:144\ngoogle.golang.org/grpc.chainUnaryInterceptors.func1.1\n\t/go/pkg/mod/google.golang.org/grpc@v1.47.0/server.go:1120\ngo.temporal.io/server/common/metrics.NewServerMetricsContextInjectorInterceptor.func1\n\t/home/builder/temporal/common/metrics/grpc.go:66\ngoogle.golang.org/grpc.chainUnaryInterceptors.func1.1\n\t/go/pkg/mod/google.golang.org/grpc@v1.47.0/server.go:1120\ngo.temporal.io/server/common/rpc.ServiceErrorInterceptor\n\t/home/builder/temporal/common/rpc/grpc.go:132\ngoogle.golang.org/grpc.chainUnaryInterceptors.func1.1\n\t/go/pkg/mod/google.golang.org/grpc@v1.47.0/server.go:1120\ngo.temporal.io/server/common/rpc/interceptor.(*NamespaceLogInterceptor).Intercept\n\t/home/builder/temporal/common/rpc/interceptor/namespace_logger.go:84\ngoogle.golang.org/grpc.chainUnaryInterceptors.func1.1\n\t/go/pkg/mod/google.golang.org/grpc@v1.47.0/server.go:1120\ngoogle.golang.org/grpc.chainUnaryInterceptors.func1\n\t/go/pkg/mod/google.golang.org/grpc@v1.47.0/server.go:1122\ngo.temporal.io/server/api/adminservice/v1._AdminService_GetSearchAttributes_Handler\n\t/home/builder/temporal/api/adminservice/v1/service.pb.go:858\ngoogle.golang.org/grpc.(*Server).processUnaryRPC\n\t/go/pkg/mod/google.golang.org/grpc@v1.47.0/server.go:1283\ngoogle.golang.org/grpc.(*Server).handleStream\n\t/go/pkg/mod/google.golang.org/grpc@v1.47.0/server.go:1620\ngoogle.golang.org/grpc.(*Server).serveStreams.func1.2\n\t/go/pkg/mod/google.golang.org/grpc@v1.47.0/server.go:922"}
So it's here: https://github.com/temporalio/temporal/blob/master/service/frontend/adminHandler.go#L390
But fetching the mapping with curl works fine:
curl opensearch-cluster-master.demo:9200/temporal_visibility_v1_dev/_mapping
{"temporal_visibility_v1_dev":{"mappings":{"dynamic":"false","properties":{"BatcherNamespace":{"type":"keyword"},"BatcherUser":{"type":"keyword"},"BinaryChecksums":{"type":"keyword"},"CloseTime":{"type":"date_nanos"},"ExecutionDuration":{"type":"long"},"ExecutionStatus":{"type":"keyword"},"ExecutionTime":{"type":"date_nanos"},"HistoryLength":{"type":"long"},"NamespaceId":{"type":"keyword"},"RunId":{"type":"keyword"},"StartTime":{"type":"date_nanos"},"StateTransitionCount":{"type":"long"},"TaskQueue":{"type":"keyword"},"TemporalChangeVersion":{"type":"keyword"},"TemporalSchedulePaused":{"type":"boolean"},"TemporalScheduledById":{"type":"keyword"},"TemporalScheduledStartTime":{"type":"date_nanos"},"WorkflowId":{"type":"keyword"},"WorkflowType":{"type":"keyword"}}}}}
That's so strange ...
I have found a difference between our config and what gets deployed via helm. If you curl the index

curl http://opensearch-cluster-master.temporal:9200/temporal_visibility_v1_dev

you will see a slight difference: the settings of the working version have a different structure. I believe this is the problem, and it correlates with the 400 error, which essentially signals malformed data (a marshalling issue).
Ours:
"settings":{"index":{"search":{"idle":{"after":"365d"}},"number_of_shards":"1","auto_expand_replicas":"0-2","provided_name":"temporal_visibility_v1_dev","creation_date":"1664411533873","sort":{"field":["CloseTime","StartTime","RunId"],"missing":["_first","_first","_first"],"order":["desc","desc","desc"]},"number_of_replicas":"0","uuid":"tvfDzJ_WRL616jp0JGCFew","version":{"created":"136217827"}}}}}
Working (from helm):
"settings":{"index":{"routing":{"allocation":{"include":{"_tier_preference":"data_content"}}},"search":{"idle":{"after":"365d"}},"number_of_shards":"1","auto_expand_replicas":"0-2","provided_name":"temporal_visibility_v1_dev","creation_date":"1664412928745","sort":{"field":["CloseTime","StartTime","RunId"],"missing":["_first","_first","_first"],"order":["desc","desc","desc"]},"number_of_replicas":"1","uuid":"1CC1UirtR02kpzofc5_4XA","version":{"created":"7160299"}}}}}
I looked at what their job pod does to set up the index, and this is it, which also differs from what we do:
curl -X PUT --fail --user : http://elasticsearch-master-headless:9200/_template/temporal_visibility_v1_template -H "Content-Type: application/json" --data-binary "@schema/elasticsearch/visibility/index_template_v7.json" 2>&1 && \
curl -X PUT --fail --user : http://elasticsearch-master-headless:9200/temporal_visibility_v1_dev 2>&1
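To confirm the template step actually took effect before the index is created, the legacy template API can be queried directly (same host as the helm job; just a sketch):

# Fetch the installed template; a 404 here means the PUT above never applied
curl -i http://elasticsearch-master-headless:9200/_template/temporal_visibility_v1_template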
Ok I figured it out, that was a PITA :D
The issue is OpenSearch: for whatever reason the index it generates differs enough that Temporal can't get the mappings. Using the image below works without changes to the operator.
docker.elastic.co/elasticsearch/elasticsearch:7.16.2
I added a commit to the pull request above, which swaps your sample that doesn't work with OpenSearch for one that does, using Elasticsearch and the image above.
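For anyone hitting the same thing: the root endpoint reports which engine you are actually talking to, so the backing store can be confirmed before pointing Temporal at it (hostname is just an example):

# Elasticsearch 7.16.2 replies with "number": "7.16.2"; OpenSearch additionally
# reports "distribution": "opensearch" in the same version block
curl -s http://elasticsearch-master-headless:9200/ | jq .version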
Fine! Thanks for the fix! I think that we can safely close this issue.
After deploying Elasticsearch and connecting to the admintools pod I ran the following command:
$ tctl adm cluster gsa
This should return the default Elasticsearch search attributes, but instead it throws a Bad Request (400).
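For reference, gsa is the shorthand; the unabbreviated form of the same call is:

# Same request, long form: it hits the frontend's AdminService GetSearchAttributes RPC,
# which is where the Elasticsearch mapping lookup in the stack traces above fails
tctl admin cluster get-search-attributes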