datastax / astrapy

AstraPy is a Pythonic interface for DataStax Astra DB and the Data API
https://github.com/datastax/astrapy
Apache License 2.0
17 stars 18 forks

ModuleNotFoundError: No module named 'bson.objectid' #284

Closed DiogoR23 closed 5 days ago

DiogoR23 commented 3 weeks ago

Hi

I am trying to import AstraDBVectorStore from the langchain_astradb library, but I get the error in the title.

This is the code I am trying to run:

from dotenv import load_dotenv
import os

from langchain_openai import ChatOpenAI, OpenAIEmbeddings
from langchain_astradb import AstraDBVectorStore
from langchain.agents import create_tool_calling_agent
from langchain.agents import AgentExecutor
from langchain.tools.retriever import create_retriever_tool
from langchain import hub
from github import fetch_github_issues
from note import note_tool
from langchain_core.messages import HumanMessage

def connect_to_vstore():
    embeddings = OpenAIEmbeddings()
    ASTRA_DB_API_ENDPOINT = os.getenv("ASTRA_DB_API_ENDPOINT")
    ASTRA_DB_APPLICATION_TOKEN = os.getenv("ASTRA_DB_APPLICATION_TOKEN")
    # Use the keyspace from the environment if it is set and non-empty, otherwise None.
    ASTRA_DB_KEYSPACE = os.getenv("ASTRA_DB_KEYSPACE") or None

    vstore = AstraDBVectorStore(
        embedding=embeddings,
        collection_name="github",
        api_endpoint=ASTRA_DB_API_ENDPOINT,
        token=ASTRA_DB_APPLICATION_TOKEN,
        namespace=ASTRA_DB_KEYSPACE,
    )
    return vstore

Output of poetry show:

aiohttp                   3.9.5           Async http client/server framework (asyncio)
aiosignal                 1.3.1           aiosignal: a list of registered asynchronous callbacks
annotated-types           0.7.0           Reusable constraint types to use with typing.Annotated
anyio                     4.4.0           High level compatibility layer for multiple asynchronous event loop imple...
astrapy                   1.2.1           AstraPy is a Pythonic SDK for DataStax Astra and its Data API
async-timeout             4.0.3           Timeout context manager for asyncio programs
attrs                     23.2.0          Classes Without Boilerplate
bson                      0.5.10          BSON codec for Python
cassandra-driver          3.29.1          DataStax Driver for Apache Cassandra
cassio                    0.1.8           A framework-agnostic Python library to seamlessly integrate Apache Cassan...
certifi                   2024.6.2        Python package for providing Mozilla's CA Bundle.
charset-normalizer        3.3.2           The Real First Universal Charset Detector. Open, modern and actively main...
click                     8.1.7           Composable command line interface toolkit
dataclasses-json          0.6.7           Easily serialize dataclasses to and from JSON.
deprecation               2.1.0           A library to handle automated deprecations
distro                    1.9.0           Distro - an OS platform information API
exceptiongroup            1.2.1           Backport of PEP 654 (exception groups)
frozenlist                1.4.1           A list-like structure which implements collections.abc.MutableSequence
geomet                    0.2.1.post1     GeoJSON <-> WKT/WKB conversion utilities
greenlet                  3.0.3           Lightweight in-process concurrent programming
h11                       0.14.0          A pure-Python, bring-your-own-I/O implementation of HTTP/1.1
h2                        4.1.0           HTTP/2 State-Machine based protocol implementation
hpack                     4.0.0           Pure-Python HPACK header compression
httpcore                  1.0.5           A minimal low-level HTTP client.
httpx                     0.27.0          The next generation HTTP client.
hyperframe                6.0.1           HTTP/2 framing layer for Python
idna                      3.7             Internationalized Domain Names in Applications (IDNA)
joblib                    1.4.2           Lightweight pipelining with Python functions
jsonpatch                 1.33            Apply JSON-Patches (RFC 6902)
jsonpointer               2.4             Identify specific nodes in a JSON document (RFC 6901)
jsonschema                4.22.0          An implementation of JSON Schema validation for Python
jsonschema-specifications 2023.12.1       The JSON Schema meta-schemas and vocabularies, exposed as a Registry
langchain                 0.2.5           Building applications with LLMs through composability
langchain-astradb         0.3.3           An integration package connecting Astra DB and LangChain
langchain-community       0.2.5           Community contributed LangChain integrations.
langchain-core            0.2.8           Building applications with LLMs through composability
langchain-experimental    0.0.61          Building applications with LLMs through composability
langchain-openai          0.1.8           An integration package connecting OpenAI and LangChain
langchain-text-splitters  0.2.1           LangChain text splitting utilities
langchainhub              0.1.20          The LangChain Hub API client
langsmith                 0.1.79          Client library to connect to the LangSmith LLM Tracing and Evaluation Pla...
marshmallow               3.21.3          A lightweight library for converting complex datatypes to and from native...
multidict                 6.0.5           multidict implementation
mypy-extensions           1.0.0           Type system extensions for programs checked with the mypy type checker.
nltk                      3.8.1           Natural Language Toolkit
numpy                     1.26.4          Fundamental package for array computing in Python
openai                    1.34.0          The official Python library for the openai API
orjson                    3.10.5          Fast, correct Python JSON library supporting dataclasses, datetimes, and ...
packaging                 24.1            Core utilities for Python packages
pydantic                  2.7.4           Data validation using Python type hints
pydantic-core             2.18.4          Core functionality for Pydantic validation and serialization
pyproject-toml            0.0.11          Project intend to implement PEP 517, 518, 621, 631 and so on.
python-dateutil           2.9.0.post0     Extensions to the standard Python datetime module
python-dotenv             1.0.1           Read key-value pairs from a .env file and set them as environment variables
pyyaml                    6.0.1           YAML parser and emitter for Python
referencing               0.35.1          JSON Referencing + Python
regex                     2024.5.15       Alternative regular expression module, to replace re.
requests                  2.32.3          Python HTTP for Humans.
rpds-py                   0.18.1          Python bindings to Rust's persistent data structures (rpds)
six                       1.16.0          Python 2 and 3 compatibility utilities
sniffio                   1.3.1           Sniff out which async library your code is running under
sqlalchemy                2.0.30          Database Abstraction Library
tenacity                  8.4.1           Retry code until it succeeds
tiktoken                  0.7.0           tiktoken is a fast BPE tokeniser for use with OpenAI's models
toml                      0.10.2          Python Library for Tom's Obvious, Minimal Language
tqdm                      4.66.4          Fast, Extensible Progress Meter
types-requests            2.32.0.20240602 Typing stubs for requests
typing-extensions         4.12.2          Backported and Experimental Type Hints for Python 3.8+
typing-inspect            0.9.0           Runtime inspection utilities for typing module.
urllib3                   2.2.2           HTTP library with thread-safe connection pooling, file post, and more.
uuid6                     2024.1.12       New time-based UUID formats which are suited for use as a database key
yarl                      1.9.4           Yet another URL library

hemidactylus commented 2 weeks ago

Hello, sorry for the late reply. The error you mention typically occurs on a Python runtime that has no bson package installed at all (that is how I just reproduced it).

However, your poetry show output lists bson==0.5.10, the same version with which I could run the failing import just now: poetry run python -c "from bson.objectid import ObjectId" works fine for me.

Is it possible that the LangChain script you are running is started with a different command, one that uses a different environment (e.g. one without the bson package)?
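One way to check this is to print which interpreter is actually running the script and whether it can locate bson.objectid. The snippet below is only a minimal sketch using the standard library; the helper names describe_module and environment_report, and the file name check_env.py mentioned afterwards, are made up for illustration and are not part of astrapy or LangChain.

```python
import importlib.util
import sys
from typing import List, Sequence


def describe_module(name: str) -> str:
    """Report whether `name` can be located by the current interpreter."""
    try:
        # find_spec imports parent packages, so a missing parent
        # (e.g. no `bson` at all) raises instead of returning None.
        spec = importlib.util.find_spec(name)
    except (ImportError, ValueError):
        spec = None
    if spec is None:
        return f"{name}: NOT importable"
    return f"{name}: found at {spec.origin}"


def environment_report(modules: Sequence[str] = ("bson", "bson.objectid")) -> List[str]:
    """Collect the interpreter path, its prefix, and the status of each module."""
    lines = [f"interpreter: {sys.executable}", f"prefix: {sys.prefix}"]
    lines.extend(describe_module(m) for m in modules)
    return lines


if __name__ == "__main__":
    print("\n".join(environment_report()))
```

Saving this as, say, check_env.py and running it both with poetry run python check_env.py and with the exact command used to start the LangChain script should show whether the two use the same interpreter. If the interpreter paths differ, or bson.objectid is reported as not importable in one of them, the script is most likely running outside the Poetry environment.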