Arize-ai / phoenix

AI Observability & Evaluation
https://docs.arize.com/phoenix

[BUG] inconsistently typed metadata prevents client methods from working #4420

Open antoni0z opened 3 weeks ago

antoni0z commented 3 weeks ago

Bug Report

Describe the bug

Retrieving traces or spans from a Phoenix server with the Python client fails with a 500 Internal Server Error. This happens when calling client.get_trace_dataset() or client.get_spans_dataframe() from a notebook.

To Reproduce

Steps to reproduce the behavior:

  1. Set up a Phoenix server (version 4.31.0) with Docker Compose, using the following configuration:

    services:
      phoenix:
        image: arizephoenix/phoenix:version-4.31.0
        ports:
          - "6006:6006"  # UI and HTTP API (also accepts OTLP over HTTP)
          - "4317:4317"  # OTLP gRPC collector
        environment:
          - PHOENIX_WORKING_DIR=/mnt/data
        volumes:
          - phoenix_data:/mnt/data
        restart: always
        pull_policy: always

    # Named volumes must be declared at the top level
    volumes:
      phoenix_data:
  2. Start the Docker container using docker-compose up.

  3. Install the Phoenix client with the following specification:

    arize-phoenix==0.4.31; python_version >= "3.11" and python_version < "3.12"
  4. In a Python notebook, run either of the following calls:

    import phoenix as px

    client = px.Client(endpoint="http://localhost:6006")

    # Each of these calls fails with a 500 Internal Server Error
    client.get_trace_dataset()
    client.get_spans_dataframe()
  5. Observe the 500 Internal Server Error.

Expected behavior

Both client.get_trace_dataset() and client.get_spans_dataframe() should return the trace and span data from the Phoenix server without an internal server error.



Additional context

  1. Server logs show the following error:

    INFO:     172.20.0.1:49718 - "POST /v1/spans?project_name=Cnt%20IA&project-name=Cnt%20IA HTTP/1.1" 500 Internal Server Error
    ERROR:    Exception in ASGI application
    pyarrow.lib.ArrowInvalid: ("Could not convert '21' with type str: tried to convert to int64", 'Conversion failed for column attributes.metadata with type object')

    PyArrow tried to convert the string '21' to int64 while serializing the attributes.metadata column, so the failure is a type conversion of span metadata on the server.

  2. The error is raised in /phoenix/env/phoenix/server/api/routers/v1/spans.py, line 120.

  3. The client also emits a warning: "The Phoenix server has an unknown version and may have ..."

  4. Tracing setup in the instrumented application (written here as a standalone function that takes the project name):

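    import os

    from openinference.instrumentation.langchain import LangChainInstrumentor
    from openinference.semconv.resource import ResourceAttributes
    from opentelemetry import trace as trace_api
    # Assumes the OTLP/HTTP exporter; use ...otlp.proto.grpc.trace_exporter for port 4317
    from opentelemetry.exporter.otlp.proto.http.trace_exporter import OTLPSpanExporter
    from opentelemetry.sdk import trace as trace_sdk
    from opentelemetry.sdk.resources import Resource
    from opentelemetry.sdk.trace.export import SimpleSpanProcessor


    def instrument(project_name: str) -> None:
        try:
            # Tag every span with the Phoenix project it belongs to
            resource = Resource(attributes={ResourceAttributes.PROJECT_NAME: project_name})
            tracer_provider = trace_sdk.TracerProvider(resource=resource)
            phoenix_collector_endpoint = os.getenv("PHOENIX_COLLECTOR_ENDPOINT")
            if not phoenix_collector_endpoint:
                raise ValueError("PHOENIX_COLLECTOR_ENDPOINT environment variable is not set.")
            # Export each span to Phoenix as soon as it ends
            span_exporter = OTLPSpanExporter(endpoint=phoenix_collector_endpoint)
            tracer_provider.add_span_processor(SimpleSpanProcessor(span_exporter))
            trace_api.set_tracer_provider(tracer_provider)
            LangChainInstrumentor().instrument()
            print("Instrument successful")
        except Exception as e:
            print(f"An error has occurred while instrumenting: {e}")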

    This sets up OpenTelemetry tracing for the application: spans produced by the LangChain instrumentation are tagged with the project name and exported to the Phoenix collector.

  5. The server logs also show a pandas FutureWarning about Series.__getitem__:

    /phoenix/env/phoenix/trace/dsl/query.py:746: FutureWarning: Series.__getitem__ treating keys as positions is deprecated. In a future version, integer keys will always be treated as labels (consistent with DataFrame behavior). To access a value by position, use `ser.iloc[pos]`

    This warning indicates that the server code (query.py, line 746) relies on pandas' deprecated positional fallback for integer keys; once pandas treats those keys as labels, that lookup could return the wrong value or fail.
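A small pandas sketch of the distinction the warning draws (the Series below is hypothetical, not the one in query.py). With an integer index, ser[0] is already a label lookup, so .iloc and .loc are the explicit ways to ask for a position or a label:

```python
import pandas as pd

# Hypothetical Series whose integer index does not match row positions,
# e.g. after sorting or filtering a DataFrame.
ser = pd.Series(["a", "b", "c"], index=[2, 1, 0])

first_row = ser.iloc[0]  # positional: the first row, whatever its label
label_zero = ser.loc[0]  # label-based: the row whose index label is 0

print(first_row, label_zero)  # a c
```

The two lookups disagree here, which is why pandas is removing the implicit positional fallback for int keys on other index types.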

axiomofjoy commented 3 weeks ago

Thanks @antoni0z. This issue is caused by metadata values whose types are inconsistent across spans (for example, the same key holding an int in one span and a str in another), which PyArrow cannot convert.

antoni0z commented 3 weeks ago

As a quick workaround for anyone hitting the same error, I avoided it by narrowing the query:

    from phoenix.trace.dsl import SpanQuery

    # .where() applies the filter; .select() keeps only the listed columns
    filtered_query = SpanQuery().where("span_kind == 'LLM'").select(input="input.value", output="output.value", metadata="metadata")
    spans = client.query_spans(filtered_query)

Selecting only specific columns keeps the problematic ones out of the conversion to PyArrow, so the error doesn't occur.
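Another way around the error is to keep each metadata key's type consistent before spans are recorded. A minimal sketch, assuming you control the metadata dict before it is attached to your runs (for example, the metadata passed in a LangChain run config); find_mixed_type_keys and normalize_metadata are hypothetical helpers, not Phoenix APIs, and both only look at top-level keys:

```python
import json
from collections import defaultdict
from typing import Any, Dict, Iterable, Optional, Set


def find_mixed_type_keys(rows: Iterable[Optional[Dict[str, Any]]]) -> Dict[str, Set[str]]:
    """Map each top-level key to its value type names, keeping only keys with more than one type."""
    seen: Dict[str, Set[str]] = defaultdict(set)
    for row in rows:
        if not row:
            continue  # spans without metadata
        for key, value in row.items():
            if value is not None:  # nulls are ignored here
                seen[key].add(type(value).__name__)
    return {key: types for key, types in seen.items() if len(types) > 1}


def normalize_metadata(metadata: Dict[str, Any]) -> Dict[str, str]:
    """Coerce every top-level value to str so each key has a single type across spans."""
    return {
        # json.dumps renders 21 as "21", True as "true", lists and dicts as JSON text
        key: value if isinstance(value, str) else json.dumps(value, default=str)
        for key, value in metadata.items()
    }


rows = [{"user_id": 21, "lang": "es"}, {"user_id": "21", "lang": "en"}, None]
print(find_mixed_type_keys(rows))  # reports user_id as both int and str
print([normalize_metadata(r) for r in rows if r])
```

Coercing to strings trades numeric filtering on metadata for a stable column type; casting a key to int everywhere would work as well if all of its values are numeric.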