aws / aws-sdk-pandas

pandas on AWS - Easy integration with Athena, Glue, Redshift, Timestream, Neptune, OpenSearch, QuickSight, Chime, CloudWatchLogs, DynamoDB, EMR, SecretManager, PostgreSQL, MySQL, SQLServer and S3 (Parquet, CSV, JSON and EXCEL).
https://aws-sdk-pandas.readthedocs.io
Apache License 2.0

Parse error running Neptune SPARQL query with null value #2690

Open mhavey opened 7 months ago

mhavey commented 7 months ago

Describe the bug

When I run a SPARQL SELECT query against Neptune using AWS SDK for Pandas, if one of the values is null, I get an error:

File ~/anaconda3/envs/JupyterSystemEnv/lib/python3.10/site-packages/awswrangler/neptune/_neptune.py:116, in execute_sparql..(d)
    114 if "results" in data and "bindings" in data["results"]:
    115     df = pd.DataFrame(data["results"]["bindings"], columns=data.get("head", {}).get("vars"))
--> 116     df = df.applymap(lambda d: d["value"] if "value" in d else None)
    117 else:
    118     df = pd.DataFrame(data)

TypeError: argument of type 'float' is not iterable

How to Reproduce

I wrap the SDK call in a function, run_sparql_introspect, as follows:

import awswrangler as wr
import pandas as pd
import igraph as ig
import graph_notebook as gn
from graph_notebook.configuration.generate_config import AuthModeEnum

# Get the configuration information for the notebook
config = gn.configuration.get_config.get_config()
iam = config.auth_mode == AuthModeEnum.IAM

# Retrieve Data from neptune
client = wr.neptune.connect(config.host, config.port, iam_enabled=iam)

def run_sparql_introspect(query):
    df = wr.neptune.execute_sparql(client, query)
    #display(df.head(10))
    return df

I then run a SPARQL query as follows:

run_sparql_introspect("""SELECT distinct ?predicate ?objType
WHERE {
    ?resource rdf:type ?class .
    ?resource ?predicate ?object .
    FILTER(isIRI(?object)) .
    OPTIONAL { ?object a ?objType }  .
}
LIMIT 200""")

In the result, objType is sometimes null. That is fine, since it is optional, but the SDK's execute_sparql throws an error.
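The failure can probably be reproduced without a Neptune cluster. In the SPARQL JSON results format, an unbound variable is simply left out of a binding, so pandas fills that cell with NaN (a float), and the `"value" in d` test on line 116 of the traceback then raises the TypeError. The sketch below is an illustration, not the library's fix: the `data` payload and the helper names `binding_value` and `bindings_to_frame` are hypothetical, and it assumes the response carries a `head.vars` list.

```python
import pandas as pd

# Hypothetical SPARQL JSON response; the second binding has no "objType"
# key because the OPTIONAL pattern did not match for that row.
data = {
    "head": {"vars": ["predicate", "objType"]},
    "results": {
        "bindings": [
            {
                "predicate": {"type": "uri", "value": "http://example.org/knows"},
                "objType": {"type": "uri", "value": "http://example.org/Person"},
            },
            {"predicate": {"type": "uri", "value": "http://example.org/likes"}},
        ]
    },
}


def binding_value(cell):
    """Return the "value" of a binding dict, or None for an unbound cell."""
    if isinstance(cell, dict) and "value" in cell:
        return cell["value"]
    return None


def bindings_to_frame(data):
    """Build a DataFrame of plain values, tolerating unbound variables."""
    columns = data["head"]["vars"]  # assumption: head.vars is present
    rows = [
        {var: binding_value(binding.get(var)) for var in columns}
        for binding in data["results"]["bindings"]
    ]
    return pd.DataFrame(rows, columns=columns)


df = bindings_to_frame(data)
print(df)
```

Until the library handles this case, one possible workaround is to request the raw response yourself and convert it with a helper like this one. The type check is the important part: it skips the `in` test for any cell that is not a dict.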

Expected behavior

No response

Your project

No response

Screenshots

No response

OS

Amazon Linux 2

Python version

3.10.8

AWS SDK for pandas version

3.6.0

Additional context

No response

kukushking commented 5 months ago

Thanks @mhavey for opening this. I'm looking into reproducing the issue.