aws / aws-sdk-pandas

pandas on AWS - Easy integration with Athena, Glue, Redshift, Timestream, Neptune, OpenSearch, QuickSight, Chime, CloudWatchLogs, DynamoDB, EMR, SecretManager, PostgreSQL, MySQL, SQLServer and S3 (Parquet, CSV, JSON and EXCEL).
https://aws-sdk-pandas.readthedocs.io
Apache License 2.0
3.93k stars 700 forks

No module named 'jsonpath_ng' when importing awswrangler #2938

Closed RLashofRegas closed 2 months ago

RLashofRegas commented 2 months ago

Describe the bug

I am on an M1 Mac. When I install any 3.x version of awswrangler and then run import awswrangler as wr, it fails with the error:

File <REDACTED>/.venv/lib/python3.11/site-packages/awswrangler/__init__.py:10
      1 """Initial Module.
      2 
      3 Source repository: https://github.com/aws/aws-sdk-pandas
      4 Documentation: https://aws-sdk-pandas.readthedocs.io/
      5 
      6 """
      8 import logging as _logging
---> 10 from awswrangler import (
     11     athena,
     12     catalog,
     13     chime,
     14     cleanrooms,
     15     cloudwatch,
     16     data_api,
     17     data_quality,
     18     dynamodb,
     19     emr,
     20     emr_serverless,
     21     exceptions,
     22     mysql,
     23     neptune,
     24     opensearch,
     25     oracle,
     26     postgresql,
     27     quicksight,
     28     redshift,
     29     s3,
     30     secretsmanager,
     31     sqlserver,
     32     sts,
     33     timestream,
     34     typing,
     35 )
     36 from awswrangler.__metadata__ import __description__, __license__, __title__, __version__
     37 from awswrangler._config import config

File <REDACTED>/.venv/lib/python3.11/site-packages/awswrangler/opensearch/__init__.py:5
      3 from awswrangler.opensearch._read import search, search_by_sql
      4 from awswrangler.opensearch._utils import connect, create_collection
----> 5 from awswrangler.opensearch._write import create_index, delete_index, index_csv, index_df, index_documents, index_json
      7 __all__ = [
      8     "connect",
      9     "create_collection",
   (...)
     17     "search_by_sql",
     18 ]

File <REDACTED>/.venv/lib/python3.11/site-packages/awswrangler/opensearch/_write.py:23
     21 opensearchpy = _utils.import_optional_dependency("opensearchpy")
     22 if opensearchpy:
---> 23     from jsonpath_ng import parse
     24     from jsonpath_ng.exceptions import JsonPathParserError
     26 _logger: logging.Logger = logging.getLogger(__name__)

ModuleNotFoundError: No module named 'jsonpath_ng'

How to Reproduce

If I install 2.20.1 it works. Here's the output of pip install awswrangler==2.20.1:

Collecting awswrangler==2.20.1
  Downloading awswrangler-2.20.1-py3-none-any.whl.metadata (20 kB)
Collecting backoff<3.0.0,>=1.11.1 (from awswrangler==2.20.1)
  Downloading backoff-2.2.1-py3-none-any.whl.metadata (14 kB)
Requirement already satisfied: boto3<2.0.0,>=1.24.11 in <REDACTED>/.venv/lib/python3.11/site-packages (from awswrangler==2.20.1) (1.34.25)
Requirement already satisfied: botocore<2.0.0,>=1.27.11 in <REDACTED>/.venv/lib/python3.11/site-packages (from awswrangler==2.20.1) (1.34.25)
Collecting gremlinpython<4.0.0,>=3.5.2 (from awswrangler==2.20.1)
  Downloading gremlinpython-3.7.2-py2.py3-none-any.whl.metadata (6.4 kB)
Collecting jsonpath-ng<2.0.0,>=1.5.3 (from awswrangler==2.20.1)
  Downloading jsonpath_ng-1.6.1-py3-none-any.whl.metadata (18 kB)
Requirement already satisfied: numpy<2.0.0,>=1.23.5 in <REDACTED>/.venv/lib/python3.11/site-packages (from awswrangler==2.20.1) (1.26.4)
Collecting openpyxl<3.1.0,>=3.0.0 (from awswrangler==2.20.1)
  Downloading openpyxl-3.0.10-py2.py3-none-any.whl.metadata (2.4 kB)
Requirement already satisfied: opensearch-py<3,>=1 in <REDACTED>/.venv/lib/python3.11/site-packages (from awswrangler==2.20.1) (2.6.0)
Collecting pandas!=1.5.0,<2.0.0,<=1.5.1,>=1.2.0 (from awswrangler==2.20.1)
  Downloading pandas-1.5.1-cp311-cp311-macosx_11_0_arm64.whl.metadata (11 kB)
Collecting pg8000<2.0.0,>=1.20.0 (from awswrangler==2.20.1)
  Downloading pg8000-1.31.2-py3-none-any.whl.metadata (74 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 74.7/74.7 kB 3.5 MB/s eta 0:00:00
Collecting progressbar2<5.0.0,>=4.0.0 (from awswrangler==2.20.1)
  Downloading progressbar2-4.4.2-py3-none-any.whl.metadata (17 kB)
Collecting pyarrow<10.1.0,>=2.0.0 (from awswrangler==2.20.1)
  Downloading pyarrow-10.0.1-cp311-cp311-macosx_11_0_arm64.whl.metadata (3.0 kB)
Collecting pymysql<2.0.0,>=1.0.0 (from awswrangler==2.20.1)
  Downloading PyMySQL-1.1.1-py3-none-any.whl.metadata (4.4 kB)
Collecting redshift-connector<2.1.0,>=2.0.889 (from awswrangler==2.20.1)
  Downloading redshift_connector-2.0.918-py3-none-any.whl.metadata (66 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 66.2/66.2 kB 6.7 MB/s eta 0:00:00
Collecting requests-aws4auth<2.0.0,>=1.1.1 (from awswrangler==2.20.1)
  Downloading requests_aws4auth-1.3.1-py3-none-any.whl.metadata (18 kB)
Requirement already satisfied: jmespath<2.0.0,>=0.7.1 in <REDACTED>/.venv/lib/python3.11/site-packages (from boto3<2.0.0,>=1.24.11->awswrangler==2.20.1) (1.0.1)
Requirement already satisfied: s3transfer<0.11.0,>=0.10.0 in <REDACTED>/.venv/lib/python3.11/site-packages (from boto3<2.0.0,>=1.24.11->awswrangler==2.20.1) (0.10.2)
Requirement already satisfied: python-dateutil<3.0.0,>=2.1 in <REDACTED>/.venv/lib/python3.11/site-packages (from botocore<2.0.0,>=1.27.11->awswrangler==2.20.1) (2.9.0.post0)
Requirement already satisfied: urllib3<2.1,>=1.25.4 in <REDACTED>/.venv/lib/python3.11/site-packages (from botocore<2.0.0,>=1.27.11->awswrangler==2.20.1) (1.26.19)
Requirement already satisfied: nest-asyncio in <REDACTED>/.venv/lib/python3.11/site-packages (from gremlinpython<4.0.0,>=3.5.2->awswrangler==2.20.1) (1.6.0)
Requirement already satisfied: aiohttp<4.0.0,>=3.8.0 in <REDACTED>/.venv/lib/python3.11/site-packages (from gremlinpython<4.0.0,>=3.5.2->awswrangler==2.20.1) (3.9.5)
Collecting aenum<4.0.0,>=1.4.5 (from gremlinpython<4.0.0,>=3.5.2->awswrangler==2.20.1)
  Downloading aenum-3.1.15-py3-none-any.whl.metadata (3.7 kB)
Requirement already satisfied: six<2.0.0,>=1.10.0 in <REDACTED>/.venv/lib/python3.11/site-packages (from gremlinpython<4.0.0,>=3.5.2->awswrangler==2.20.1) (1.16.0)
Requirement already satisfied: isodate<1.0.0,>=0.6.0 in <REDACTED>/.venv/lib/python3.11/site-packages (from gremlinpython<4.0.0,>=3.5.2->awswrangler==2.20.1) (0.6.1)
Requirement already satisfied: ply in <REDACTED>/.venv/lib/python3.11/site-packages (from jsonpath-ng<2.0.0,>=1.5.3->awswrangler==2.20.1) (3.11)
Collecting et-xmlfile (from openpyxl<3.1.0,>=3.0.0->awswrangler==2.20.1)
  Downloading et_xmlfile-1.1.0-py3-none-any.whl.metadata (1.8 kB)
Requirement already satisfied: requests<3.0.0,>=2.4.0 in <REDACTED>/.venv/lib/python3.11/site-packages (from opensearch-py<3,>=1->awswrangler==2.20.1) (2.32.3)
Requirement already satisfied: certifi>=2022.12.07 in <REDACTED>/.venv/lib/python3.11/site-packages (from opensearch-py<3,>=1->awswrangler==2.20.1) (2024.7.4)
Requirement already satisfied: Events in <REDACTED>/.venv/lib/python3.11/site-packages (from opensearch-py<3,>=1->awswrangler==2.20.1) (0.5)
Requirement already satisfied: pytz>=2020.1 in <REDACTED>/.venv/lib/python3.11/site-packages (from pandas!=1.5.0,<2.0.0,<=1.5.1,>=1.2.0->awswrangler==2.20.1) (2024.1)
Collecting scramp>=1.4.5 (from pg8000<2.0.0,>=1.20.0->awswrangler==2.20.1)
  Downloading scramp-1.4.5-py3-none-any.whl.metadata (19 kB)
Collecting python-utils>=3.8.1 (from progressbar2<5.0.0,>=4.0.0->awswrangler==2.20.1)
  Downloading python_utils-3.8.2-py2.py3-none-any.whl.metadata (9.7 kB)
Requirement already satisfied: beautifulsoup4<5.0.0,>=4.7.0 in <REDACTED>/.venv/lib/python3.11/site-packages (from redshift-connector<2.1.0,>=2.0.889->awswrangler==2.20.1) (4.12.3)
Requirement already satisfied: lxml>=4.6.5 in <REDACTED>/.venv/lib/python3.11/site-packages (from redshift-connector<2.1.0,>=2.0.889->awswrangler==2.20.1) (5.2.2)
Requirement already satisfied: packaging in <REDACTED>/.venv/lib/python3.11/site-packages (from redshift-connector<2.1.0,>=2.0.889->awswrangler==2.20.1) (23.2)
Requirement already satisfied: setuptools in <REDACTED>/.venv/lib/python3.11/site-packages (from redshift-connector<2.1.0,>=2.0.889->awswrangler==2.20.1) (69.2.0)
Requirement already satisfied: aiosignal>=1.1.2 in <REDACTED>/.venv/lib/python3.11/site-packages (from aiohttp<4.0.0,>=3.8.0->gremlinpython<4.0.0,>=3.5.2->awswrangler==2.20.1) (1.3.1)
Requirement already satisfied: attrs>=17.3.0 in <REDACTED>/.venv/lib/python3.11/site-packages (from aiohttp<4.0.0,>=3.8.0->gremlinpython<4.0.0,>=3.5.2->awswrangler==2.20.1) (23.2.0)
Requirement already satisfied: frozenlist>=1.1.1 in <REDACTED>/.venv/lib/python3.11/site-packages (from aiohttp<4.0.0,>=3.8.0->gremlinpython<4.0.0,>=3.5.2->awswrangler==2.20.1) (1.4.1)
Requirement already satisfied: multidict<7.0,>=4.5 in <REDACTED>/.venv/lib/python3.11/site-packages (from aiohttp<4.0.0,>=3.8.0->gremlinpython<4.0.0,>=3.5.2->awswrangler==2.20.1) (6.0.5)
Requirement already satisfied: yarl<2.0,>=1.0 in <REDACTED>/.venv/lib/python3.11/site-packages (from aiohttp<4.0.0,>=3.8.0->gremlinpython<4.0.0,>=3.5.2->awswrangler==2.20.1) (1.9.4)
Requirement already satisfied: soupsieve>1.2 in <REDACTED>/.venv/lib/python3.11/site-packages (from beautifulsoup4<5.0.0,>=4.7.0->redshift-connector<2.1.0,>=2.0.889->awswrangler==2.20.1) (2.5)
Requirement already satisfied: typing-extensions>3.10.0.2 in <REDACTED>/.venv/lib/python3.11/site-packages (from python-utils>=3.8.1->progressbar2<5.0.0,>=4.0.0->awswrangler==2.20.1) (4.12.2)
Requirement already satisfied: charset-normalizer<4,>=2 in <REDACTED>/.venv/lib/python3.11/site-packages (from requests<3.0.0,>=2.4.0->opensearch-py<3,>=1->awswrangler==2.20.1) (3.3.2)
Requirement already satisfied: idna<4,>=2.5 in <REDACTED>/.venv/lib/python3.11/site-packages (from requests<3.0.0,>=2.4.0->opensearch-py<3,>=1->awswrangler==2.20.1) (3.7)
Collecting asn1crypto>=1.5.1 (from scramp>=1.4.5->pg8000<2.0.0,>=1.20.0->awswrangler==2.20.1)
  Downloading asn1crypto-1.5.1-py2.py3-none-any.whl.metadata (13 kB)
Downloading awswrangler-2.20.1-py3-none-any.whl (272 kB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 272.3/272.3 kB 10.9 MB/s eta 0:00:00
Downloading backoff-2.2.1-py3-none-any.whl (15 kB)
Downloading gremlinpython-3.7.2-py2.py3-none-any.whl (78 kB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 78.1/78.1 kB 10.3 MB/s eta 0:00:00
Downloading jsonpath_ng-1.6.1-py3-none-any.whl (29 kB)
Downloading openpyxl-3.0.10-py2.py3-none-any.whl (242 kB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 242.1/242.1 kB 18.4 MB/s eta 0:00:00
Downloading pandas-1.5.1-cp311-cp311-macosx_11_0_arm64.whl (10.7 MB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 10.7/10.7 MB 16.9 MB/s eta 0:00:00
Downloading pg8000-1.31.2-py3-none-any.whl (54 kB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 54.5/54.5 kB 5.3 MB/s eta 0:00:00
Downloading progressbar2-4.4.2-py3-none-any.whl (56 kB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 56.8/56.8 kB 6.7 MB/s eta 0:00:00
Downloading pyarrow-10.0.1-cp311-cp311-macosx_11_0_arm64.whl (22.9 MB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 22.9/22.9 MB 15.1 MB/s eta 0:00:00
Downloading PyMySQL-1.1.1-py3-none-any.whl (44 kB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 45.0/45.0 kB 4.9 MB/s eta 0:00:00
Downloading redshift_connector-2.0.918-py3-none-any.whl (124 kB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 124.2/124.2 kB 11.0 MB/s eta 0:00:00
Downloading requests_aws4auth-1.3.1-py3-none-any.whl (24 kB)
Downloading aenum-3.1.15-py3-none-any.whl (137 kB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 137.6/137.6 kB 9.8 MB/s eta 0:00:00
Downloading python_utils-3.8.2-py2.py3-none-any.whl (27 kB)
Downloading scramp-1.4.5-py3-none-any.whl (12 kB)
Downloading et_xmlfile-1.1.0-py3-none-any.whl (4.7 kB)
Downloading asn1crypto-1.5.1-py2.py3-none-any.whl (105 kB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 105.0/105.0 kB 14.5 MB/s eta 0:00:00
Installing collected packages: asn1crypto, aenum, scramp, python-utils, pymysql, pyarrow, jsonpath-ng, et-xmlfile, backoff, requests-aws4auth, progressbar2, pg8000, pandas, openpyxl, gremlinpython, redshift-connector, awswrangler
  Attempting uninstall: pyarrow
    Found existing installation: pyarrow 17.0.0
    Uninstalling pyarrow-17.0.0:
      Successfully uninstalled pyarrow-17.0.0
  Attempting uninstall: pandas
    Found existing installation: pandas 2.2.2
    Uninstalling pandas-2.2.2:
      Successfully uninstalled pandas-2.2.2
  Attempting uninstall: awswrangler
    Found existing installation: awswrangler 3.7.3
    Uninstalling awswrangler-3.7.3:
      Successfully uninstalled awswrangler-3.7.3
Successfully installed aenum-3.1.15 asn1crypto-1.5.1 awswrangler-2.20.1 backoff-2.2.1 et-xmlfile-1.1.0 gremlinpython-3.7.2 jsonpath-ng-1.6.1 openpyxl-3.0.10 pandas-1.5.1 pg8000-1.31.2 progressbar2-4.4.2 pyarrow-10.0.1 pymysql-1.1.1 python-utils-3.8.2 redshift-connector-2.0.918 requests-aws4auth-1.3.1 scramp-1.4.5

Importing the module after this works.

Here's the output after first running pip uninstall jsonpath-ng and then running pip install --upgrade awswrangler:

Requirement already satisfied: awswrangler in <REDACTED>/.venv/lib/python3.11/site-packages (2.20.1)
Collecting awswrangler
  Using cached awswrangler-3.9.1-py3-none-any.whl.metadata (17 kB)
Requirement already satisfied: boto3<2.0.0,>=1.20.32 in <REDACTED>/.venv/lib/python3.11/site-packages (from awswrangler) (1.34.25)
Requirement already satisfied: botocore<2.0.0,>=1.23.32 in <REDACTED>/.venv/lib/python3.11/site-packages (from awswrangler) (1.34.25)
Requirement already satisfied: numpy<2.0,>=1.18 in <REDACTED>/.venv/lib/python3.11/site-packages (from awswrangler) (1.26.4)
Requirement already satisfied: packaging<25.0,>=21.1 in <REDACTED>/.venv/lib/python3.11/site-packages (from awswrangler) (23.2)
Requirement already satisfied: pandas<3.0.0,>=1.2.0 in <REDACTED>/.venv/lib/python3.11/site-packages (from awswrangler) (1.5.1)
Requirement already satisfied: pyarrow>=8.0.0 in <REDACTED>/.venv/lib/python3.11/site-packages (from awswrangler) (10.0.1)
Requirement already satisfied: typing-extensions<5.0.0,>=4.4.0 in <REDACTED>/.venv/lib/python3.11/site-packages (from awswrangler) (4.12.2)
Requirement already satisfied: jmespath<2.0.0,>=0.7.1 in <REDACTED>/.venv/lib/python3.11/site-packages (from boto3<2.0.0,>=1.20.32->awswrangler) (1.0.1)
Requirement already satisfied: s3transfer<0.11.0,>=0.10.0 in <REDACTED>/.venv/lib/python3.11/site-packages (from boto3<2.0.0,>=1.20.32->awswrangler) (0.10.2)
Requirement already satisfied: python-dateutil<3.0.0,>=2.1 in <REDACTED>/.venv/lib/python3.11/site-packages (from botocore<2.0.0,>=1.23.32->awswrangler) (2.9.0.post0)
Requirement already satisfied: urllib3<2.1,>=1.25.4 in <REDACTED>/.venv/lib/python3.11/site-packages (from botocore<2.0.0,>=1.23.32->awswrangler) (1.26.19)
Requirement already satisfied: pytz>=2020.1 in <REDACTED>/.venv/lib/python3.11/site-packages (from pandas<3.0.0,>=1.2.0->awswrangler) (2024.1)
Requirement already satisfied: six>=1.5 in <REDACTED>/.venv/lib/python3.11/site-packages (from python-dateutil<3.0.0,>=2.1->botocore<2.0.0,>=1.23.32->awswrangler) (1.16.0)
Using cached awswrangler-3.9.1-py3-none-any.whl (381 kB)
Installing collected packages: awswrangler
  Attempting uninstall: awswrangler
    Found existing installation: awswrangler 2.20.1
    Uninstalling awswrangler-2.20.1:
      Successfully uninstalled awswrangler-2.20.1
Successfully installed awswrangler-3.9.1

After this the import fails.
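To see at a glance which of the relevant distributions are actually present in an environment like this one, a small check along the following lines may help. It is only a sketch using the standard library's importlib.metadata; the helper name installed_versions is made up for illustration, and the distribution names are the ones that appear in the pip logs above.

```python
from importlib import metadata


def installed_versions(distributions):
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for name in distributions:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # The distribution is not installed in this environment.
            versions[name] = None
    return versions


if __name__ == "__main__":
    # Distribution names taken from the pip logs in this issue.
    names = ["awswrangler", "opensearch-py", "jsonpath-ng", "requests-aws4auth"]
    for name, version in installed_versions(names).items():
        print(f"{name}: {version or 'not installed'}")
```

In the failing setup described here, one would expect opensearch-py to report a version while jsonpath-ng reports as not installed.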

Expected behavior

After running pip install awswrangler, import awswrangler should work.

Your project

No response

Screenshots

No response

OS

M1 Pro Mac, Mac OS Sonoma 14.6.1

Python version

3.11.9

AWS SDK for pandas version

3.9.1 (and all 3.x releases)

Additional context

No response

LeonLuttenberger commented 2 months ago

Hey,

When you install awswrangler, make sure to include the opensearch extra:

pip install --upgrade awswrangler[opensearch]

When you use that extra parameter, pip will also install opensearch-py, jsonpath-ng, and requests-aws4auth, which are the three additional libraries we need to support OpenSearch.
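For anyone who wants to confirm what the extra pulls in on their installed version, the requirements a distribution declares can be read from its metadata. The following is a rough sketch, not part of awswrangler: the function names are hypothetical, and it assumes the requirement strings carry an extra == "..." marker, which is how extras are normally recorded in package metadata.

```python
import re
from importlib import metadata


def requirements_for_extra(requires, extra):
    """Return the requirement strings whose marker mentions the given extra."""
    quote = "['\"]"  # the marker may use single or double quotes
    pattern = re.compile(r"extra\s*==\s*" + quote + re.escape(extra) + quote)
    return [req for req in (requires or []) if pattern.search(req)]


def declared_extra_requirements(distribution, extra):
    """Read a distribution's metadata; returns [] if it is not installed."""
    try:
        requires = metadata.requires(distribution)
    except metadata.PackageNotFoundError:
        return []
    return requirements_for_extra(requires, extra)


if __name__ == "__main__":
    # Prints whatever the installed awswrangler declares for its opensearch extra.
    for requirement in declared_extra_requirements("awswrangler", "opensearch"):
        print(requirement)
```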

From the logs you included, it looks like your original environment had opensearch-py but not jsonpath-ng. When you installed the 2.x.x version of awswrangler, pip pulled in all the OpenSearch dependencies along with it. Starting with 3.0.0, we decided to keep the list of dependencies we install by default to a bare minimum, so OpenSearch went from being a regular dependency to an optional one. More information on why we made this decision is available here.

Let me know if adding the extra parameter to the pip install works for you.

Best regards, Leon

LeonLuttenberger commented 2 months ago

I also created #2939 so that a more sensible error is thrown in the future.

RLashofRegas commented 2 months ago

Hmmm, this still seems like an issue to me. What if I am using opensearch-py for some things, but not the awswrangler opensearch extensions? The following fails:

virtualenv --python="/Library/Frameworks/Python.framework/Versions/3.11/bin/python3" .venv
source .venv/bin/activate
pip install --upgrade pip
pip install --upgrade opensearch-py
pip install --upgrade awswrangler
python
>>> import awswrangler

LeonLuttenberger commented 2 months ago

You still need the jsonpath-ng and requests-aws4auth dependencies. When you run pip install awswrangler[opensearch], all of these are installed automatically.

Alternatively, you can install them manually:

pip install --upgrade opensearch-py jsonpath-ng requests-aws4auth
pip install --upgrade awswrangler

RLashofRegas commented 2 months ago

I don't need those. I am using opensearch-py directly in other parts of my project and then using awswrangler.s3 methods, NOT the opensearch methods. import opensearchpy works fine with the above setup. So what I'm saying is that installing opensearch-py in my environment for unrelated code shouldn't break awswrangler.
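The pattern behind this complaint can be reproduced in isolation. The sketch below is not awswrangler's code: import_optional and fragile_setup are hypothetical names, and a standard-library module stands in for opensearchpy. It only illustrates how treating one optional module as the signal that a whole feature is installed leads to a hard import failure when a second module is missing.

```python
import importlib


def import_optional(name):
    """Return the module if it can be imported, otherwise None.

    Hypothetical stand-in for an optional-import helper.
    """
    try:
        return importlib.import_module(name)
    except ImportError:
        return None


def fragile_setup(primary, secondary):
    """Treat `primary` as the only signal that the feature is installed."""
    if import_optional(primary):
        # Unguarded import: raises ModuleNotFoundError when only `primary` exists.
        return importlib.import_module(secondary)
    return None


if __name__ == "__main__":
    # "json" plays the role of opensearchpy; the second name is deliberately missing.
    try:
        fragile_setup("json", "module_that_is_not_installed")
    except ModuleNotFoundError as exc:
        print(f"import failed: {exc}")
```

When neither module is present, fragile_setup simply returns None, which is why the problem only shows up in environments that happen to have the first package installed for other reasons.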

LeonLuttenberger commented 2 months ago

Gotcha, sorry about the misunderstanding.

The PR that I created (#2939) should fix this once it’s merged and released. In the meantime, the easiest way to work around this is to install jsonpath-ng.
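As an illustration only, and not a description of what #2939 actually changes, one way to avoid this class of failure is to enable an optional feature only when every module it needs can be found, and to raise an error that names the extra when the feature is explicitly requested. The helper names below are hypothetical; the module names are the import names of the three OpenSearch dependencies mentioned earlier in the thread.

```python
from importlib.util import find_spec

# Import names of the three libraries the opensearch extra installs.
OPENSEARCH_MODULES = ("opensearchpy", "jsonpath_ng", "requests_aws4auth")


def missing_modules(modules):
    """Return the top-level module names that cannot be found."""
    return [name for name in modules if find_spec(name) is None]


def feature_enabled(modules):
    """True only when every required module is available."""
    return not missing_modules(modules)


def require_feature(modules, extra):
    """Raise an ImportError that points at the extra if anything is missing."""
    missing = missing_modules(modules)
    if missing:
        raise ImportError(
            f"Missing optional modules {missing}; "
            f"install them with: pip install 'awswrangler[{extra}]'"
        )


if __name__ == "__main__":
    if feature_enabled(OPENSEARCH_MODULES):
        print("OpenSearch support available")
    else:
        print("OpenSearch support disabled, missing:", missing_modules(OPENSEARCH_MODULES))
```

With a guard like this, an environment that has opensearch-py but not jsonpath-ng would leave the OpenSearch feature disabled at import time instead of failing, and the error would surface only if the feature were actually used.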