Nabeegh-Ahmed opened 4 weeks ago
@Nabeegh-Ahmed totally understand your needs. The thing I worry about here is that storing traces outside of Phoenix is not our intended design, so I'm a bit wary of code that breaks that assumption. Is it possible for you to just use `%Y-%m-%d %H:%M:%S.%f` when ingesting into Phoenix for now?
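For illustration, serializing a timestamp with that format string might look like this (a quick sketch with a made-up timestamp, not from the thread):

```python
from datetime import datetime, timezone

# Hypothetical timestamp to be ingested into Phoenix.
ts = datetime(2024, 6, 21, 18, 23, 34, 801000, tzinfo=timezone.utc)

# Serialize with the suggested format before ingestion.
formatted = ts.strftime("%Y-%m-%d %H:%M:%S.%f")
print(formatted)  # 2024-06-21 18:23:34.801000
```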
@mikeldking I have not really made changes to how openinference works with dates. I am using `_decode_unix_nano` and just storing the result obtained from it. I can make do with `%Y-%m-%d %H:%M:%S.%f` for now. But just thinking about it: since I didn't update the time handling logic, this error should not have occurred.
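For context, decoding OTLP unix-nano timestamps boils down to something like the sketch below (an assumption about what `_decode_unix_nano` does; the actual implementation lives in Phoenix's `otel.py`):

```python
from datetime import datetime, timezone

def decode_unix_nano(time_unix_nano: int) -> datetime:
    # Convert OTLP's nanoseconds-since-epoch into a timezone-aware datetime.
    return datetime.fromtimestamp(time_unix_nano / 1e9, tz=timezone.utc)

# 2024-06-21 18:23:34.801 UTC expressed as nanoseconds since the epoch.
print(decode_unix_nano(1_718_994_214_801_000_000))
```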
@Nabeegh-Ahmed Can you send a snippet for reproducing step 1?

> Fetch spans from database and convert them to a list
@axiomofjoy I have a MongoDB database where I am storing traces. I am doing it exactly like Phoenix does it (`otel.py`). Here's how I do step 1:
```python
traces = (
    get_mongo_db()
    .get_collection("spans")
    .find(
        {
            "start_time": {"$gte": start_time},
            "end_time": {"$lte": end_time},
        }
        | ({"project_name": project_name} if project_name is not None else {})
    )
)
traces = list(traces)
```
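One way to apply the workaround suggested above to documents fetched like this is to serialize the timestamp fields to strings before constructing the `TraceDataset`. A minimal sketch (the helper name is made up; it assumes the documents carry `start_time`/`end_time` as `datetime` objects):

```python
from datetime import datetime

def serialize_times(span: dict) -> dict:
    # Render datetime fields as "%Y-%m-%d %H:%M:%S.%f" strings so that
    # pandas can parse them during timestamp normalization.
    for key in ("start_time", "end_time"):
        value = span.get(key)
        if isinstance(value, datetime):
            span[key] = value.strftime("%Y-%m-%d %H:%M:%S.%f")
    return span

example = {"start_time": datetime(2024, 6, 21, 18, 23, 34, 801000)}
print(serialize_times(example))  # {'start_time': '2024-06-21 18:23:34.801000'}
```

Applying it over the fetched list would then be `traces = [serialize_times(s) for s in traces]`.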
Hey @Nabeegh-Ahmed, I am not able to reproduce this issue on latest `main`. Can you upgrade to the latest version of Phoenix with `pip install -U "arize-phoenix==4.5.0"` and try out the following script? (The script assumes you have trace data in Phoenix.)
```python
from phoenix.session.client import Client
from phoenix.trace.trace_dataset import TraceDataset

client = Client()
spans_df = client.query_spans()

# the script works for me regardless of whether the next two lines are commented out
spans_df.start_time = spans_df.start_time.map(
    lambda dt: dt.strftime("%Y-%m-%d %H:%M:%S")
)
spans_df.end_time = spans_df.end_time.map(lambda dt: dt.strftime("%Y-%m-%d %H:%M:%S"))

print(f"{spans_df.start_time.map(type).unique()=}")
print(f"{spans_df.end_time.map(type).unique()=}")
print(f"{spans_df.start_time=}")
print(f"{spans_df.end_time=}")

trace_ds = TraceDataset(spans_df)
print(f"{trace_ds=}")
```
The script also appears to work for me when using ISO 8601 formatted timestamps.
```python
from phoenix.session.client import Client
from phoenix.trace.trace_dataset import TraceDataset

client = Client()
spans_df = client.query_spans()

spans_df.start_time = spans_df.start_time.map(lambda dt: dt.isoformat())
spans_df.end_time = spans_df.end_time.map(lambda dt: dt.isoformat())

print(f"{spans_df.start_time.map(type).unique()=}")
print(f"{spans_df.end_time.map(type).unique()=}")
print(f"{spans_df.start_time=}")
print(f"{spans_df.end_time=}")

trace_ds = TraceDataset(spans_df)
print(f"{trace_ds=}")
```
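As a quick sanity check (not from the thread), both string formats used in the two scripts are parseable by pandas, which is consistent with both scripts succeeding:

```python
import pandas as pd

# The plain format used in the first script.
plain = pd.to_datetime(pd.Series(["2024-06-21 18:23:34"]))
# The ISO 8601 format used in the second script (timezone-aware).
iso = pd.to_datetime(pd.Series(["2024-06-21T18:23:34.801+00:00"]), utc=True)

print(plain.dtype)  # datetime64[ns]
print(iso.dtype)    # datetime64[ns, UTC]
```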
**Describe the bug**
When constructing a `TraceDataset`, `start_time` normalization fails with an error. The time is stored as-is using OTEL with no changes. I have tracked the error trace: it is produced in `normalize_timestamps` inside `datetime_utils.py:52`. The function uses the `to_datetime` utility in pandas, and the error essentially arises when the passed datetime is an object rather than a string or timestamp.

**To Reproduce**
Steps to reproduce the behavior:
1. Fetch spans from database and convert them to a list
2. Construct a `TraceDataset` as follows:

Inside of `process_span` it looks like:

**Expected behavior**
The dataset should be constructed without any errors.
**Screenshots**
![image](https://github.com/Arize-ai/phoenix/assets/29403249/38b8ac37-30d7-47ab-bc0c-11b7c4080ddc)

Even though the screenshot above shows the date as `2024-06-21 18:23:34`, the date passed in is `2024-06-21T18:23:34.801+00:00`, indicating a failure in date conversions.

**Environment (please complete the following information):**
**Additional context**
I was able to find an easy fix for this issue and am opening a PR for it. Keeping this issue open for discussion and tracking.