Troubleshooting and debugging have long been a pain point for our system (e.g., #2311, #3190, #288, #725).
Currently there is no unified, effective mechanism for logging and error reporting across the components of GraphScope (Flex). The CNCF project OpenTelemetry may provide a viable solution for us:
It supports several signals, including logs, metrics, and traces. Traces seem particularly well suited to our situation: a trace carries context through the many components of a complex system along with a request (e.g., a Cypher/Gremlin query).
It has rich instrumentation support, covering the languages GraphScope uses.
It requires little instrumentation effort; some SDKs even support automatic instrumentation.
The signals are rich enough for monitoring system status and can be integrated with visualization systems.
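To make the tracing idea concrete, here is a minimal standard-library sketch of how a trace context could travel with a query through several components. It does not use the OpenTelemetry SDK; a real integration would rely on the SDK and its exporters instead of this hand-rolled code. The component names (`frontend_handle`, `compiler_compile`, `engine_execute`) and the in-memory `SPANS` list are hypothetical stand-ins, not actual GraphScope APIs. The header layout follows the W3C Trace Context `traceparent` format (`version-traceid-parentid-flags`).

```python
import secrets
from dataclasses import dataclass
from typing import Optional

# Hypothetical stand-in for a span exporter/collector.
SPANS: list = []


@dataclass(frozen=True)
class SpanContext:
    trace_id: str  # 32 hex chars, shared by every span of one request
    span_id: str   # 16 hex chars, unique per span

    def to_traceparent(self) -> str:
        # W3C Trace Context layout: version-traceid-parentid-flags
        return f"00-{self.trace_id}-{self.span_id}-01"

    @staticmethod
    def from_traceparent(header: str) -> "SpanContext":
        _version, trace_id, span_id, _flags = header.split("-")
        return SpanContext(trace_id, span_id)


def record_span(name: str, parent: Optional[SpanContext] = None) -> SpanContext:
    """Create a span; reuse the parent's trace ID so all spans join one trace."""
    trace_id = parent.trace_id if parent else secrets.token_hex(16)
    ctx = SpanContext(trace_id, secrets.token_hex(8))
    SPANS.append({
        "name": name,
        "trace_id": ctx.trace_id,
        "span_id": ctx.span_id,
        "parent_id": parent.span_id if parent else None,
    })
    return ctx


def engine_execute(plan: str, traceparent: str) -> str:
    # Hypothetical execution engine: continues the trace it was handed.
    record_span("engine.execute", SpanContext.from_traceparent(traceparent))
    return f"result of {plan}"


def compiler_compile(query: str, traceparent: str) -> str:
    # Hypothetical compiler: child span of the frontend, parent of the engine.
    ctx = record_span("compiler.compile", SpanContext.from_traceparent(traceparent))
    return engine_execute(f"plan({query})", ctx.to_traceparent())


def frontend_handle(query: str) -> str:
    # Hypothetical entry point: starts a new trace for each incoming query.
    root = record_span("frontend.handle_query")
    return compiler_compile(query, root.to_traceparent())


if __name__ == "__main__":
    frontend_handle("MATCH (n) RETURN count(n)")
    for span in SPANS:
        print(span["name"], span["trace_id"], span["parent_id"])
```

Because every span shares the trace ID of the originating query, an error raised in the engine can be linked back to the exact request that caused it, which is the property that would help with issues like those listed above.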