Open P1et1e opened 2 months ago
Thanks @P1et1e. Can you send a snippet to reproduce the issue?
from llama_index.core import VectorStoreIndex, SimpleDirectoryReader
from llama_index.embeddings.azure_openai import AzureOpenAIEmbedding
from phoenix.trace import using_project
from openinference.instrumentation.llama_index import LlamaIndexInstrumentor
from openinference.instrumentation.langchain import LangChainInstrumentor
from opentelemetry.exporter.otlp.proto.http.trace_exporter import OTLPSpanExporter
from opentelemetry.sdk import trace as trace_sdk
from opentelemetry.sdk.trace.export import SimpleSpanProcessor
tracer_provider = trace_sdk.TracerProvider()
tracer_provider.add_span_processor(
    SimpleSpanProcessor(OTLPSpanExporter("http://127.0.0.1:6006/v1/traces"))
)
LlamaIndexInstrumentor().instrument(
    skip_dep_check=True,
    tracer_provider=tracer_provider,
)
ada_002_embeddings = AzureOpenAIEmbedding()
required_exts = [".pdf"]
documents = SimpleDirectoryReader(
    input_dir="/dir/path/",
    required_exts=required_exts,
    recursive=True,
).load_data()
with using_project("indexing"):
    index = VectorStoreIndex.from_documents(
        documents, embed_model=ada_002_embeddings
    )
@axiomofjoy I am also running an arize/phoenix Docker container.
When I have a PDF document with around 1400 pages (slides of a presentation), I get one Document/chunk per page by default. In the Phoenix UI I can see one trace of kind embedding with the name BaseEmbedding.get_text_embedding_batch, but under this trace only the embeddings for the last batch of 10 are shown. When I click on Attributes, I can see a JSON object where, under the key input, the field value contains an object with the key texts that holds a list of all text chunks. I expected to see all embeddings there, and maybe even a span for each batch of embeddings.
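To make the scale of the gap concrete, here is a minimal sketch of the batching arithmetic, assuming exactly 1400 page-level chunks and the batch size of 10 reported above. The helper `split_into_batches` and the placeholder chunk texts are hypothetical and only illustrate the numbers; they are not LlamaIndex or Phoenix code.

```python
from typing import List


def split_into_batches(texts: List[str], batch_size: int) -> List[List[str]]:
    """Split texts into consecutive batches of at most batch_size items."""
    return [texts[i:i + batch_size] for i in range(0, len(texts), batch_size)]


# Hypothetical stand-in for the ~1400 page-level chunks described above.
chunks = [f"page {n}" for n in range(1, 1401)]
batches = split_into_batches(chunks, batch_size=10)

print(len(batches))    # number of embedding requests that would be made
print(batches[-1][0])  # first chunk of the final batch
```

Under these assumptions there are 140 embedding requests, and a trace that only shows the last batch covers chunks 1391 to 1400, so the outputs of the other 139 requests are not visible.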
Thank you for the details, @P1et1e!
@axiomofjoy Is there any update, or anything I can assist with?
Hey @P1et1e, this issue is not scheduled and I'm not sure when we will get to it as the team is currently heads down implementing authentication. Contributions are welcome!
Describe the bug: The UI only shows the request results of the last call to get_text_embedding_batch, so only the embeddings of the last 10 chunks are traced.
To Reproduce: Use the LlamaIndex SimpleDirectoryReader to read in a PDF file and build an index with VectorStoreIndex.from_documents() using AzureOpenAIEmbedding.
Expected behavior: I expected multiple traces under the BaseEmbedding span, one for each call to get_text_embedding_batch, not only one for the last call. I want proper observability by tracing all calls to the embedding model API, and I also want to be able to use all embeddings for inference.
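The expected behavior can be sketched in plain Python as one record per batch call. This is only an illustration of the desired outcome, not how the OpenInference instrumentor or Phoenix works internally: `RecordingEmbedder`, `BatchRecord`, and `fake_embed_batch` are hypothetical names, and the fake embedder stands in for the real embedding API.

```python
from dataclasses import dataclass, field
from typing import Callable, List

Vector = List[float]


@dataclass
class BatchRecord:
    """What one per-batch span would ideally capture."""
    batch_index: int
    texts: List[str]
    embeddings: List[Vector]


@dataclass
class RecordingEmbedder:
    """Hypothetical wrapper that keeps one record for every batch call."""
    embed_batch: Callable[[List[str]], List[Vector]]
    batch_size: int = 10
    records: List[BatchRecord] = field(default_factory=list)

    def embed_all(self, texts: List[str]) -> List[Vector]:
        all_embeddings: List[Vector] = []
        starts = range(0, len(texts), self.batch_size)
        for index, start in enumerate(starts):
            batch = texts[start:start + self.batch_size]
            vectors = self.embed_batch(batch)
            # Append instead of overwrite, so earlier batches are not lost.
            self.records.append(BatchRecord(index, batch, vectors))
            all_embeddings.extend(vectors)
        return all_embeddings


def fake_embed_batch(texts: List[str]) -> List[Vector]:
    """Stand-in for the embedding API: one 1-dimensional vector per text."""
    return [[float(len(text))] for text in texts]


embedder = RecordingEmbedder(embed_batch=fake_embed_batch)
vectors = embedder.embed_all([f"page {n}" for n in range(1, 26)])
batch_sizes = [len(record.texts) for record in embedder.records]
```

With 25 inputs and a batch size of 10, the wrapper keeps three records (10, 10, and 5 texts), and every embedding remains reachable through `embedder.records` rather than only those from the final call.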