aws / aws-sdk-js

AWS SDK for JavaScript in the browser and Node.js
https://aws.amazon.com/developer/language/javascript/
Apache License 2.0
7.59k stars 1.55k forks source link

Query JSON content from S3 not working with where clause #4128

Closed IIslam closed 1 week ago

IIslam commented 2 years ago

Describe the bug

I'm trying to query some json content from s3 bucket it works fine when I select * from the bucket but once I specify LIMIT id do nothing returns whole data and when specify WHERE it returns empty response.

Expected Behavior

When the query is like

SELECT * from s3object data where data.id = 1 I should get

{ "id": 1, "name": "abcd", "dataIds": [ 2, 1, 2 ], "attributes": [ "name", "picUrl", "email", "linkedinUrl", "job" ] }

or even when specify limit like this

select * from s3object limit 1 I should get one object

{ "id": 1, "name": "abcd", "dataIds": [ 2, 1, 2 ], "attributes": [ "name", "picUrl", "email", "linkedinUrl", "job" ] }

Current Behavior

returns empty file.

Reproduction Steps

aws s3api select-object-content --bucket e-commerce-uploads --key Content.json --expression "select * from s3object data where data.id = 1" --expression-type 'SQL' --input-serialization '{"JSON": {"Type":"DOCUMENT"}, "CompressionType": "NONE"}' --output-serialization '{"JSON": {}}' "json_output.json"

aws s3api select-object-content --bucket e-commerce-uploads --key Content.json --expression "select * from s3object data LIMIT 1" --expression-type 'SQL' --input-serialization '{"JSON": {"Type":"DOCUMENT"}, "CompressionType": "NONE"}' --output-serialization '{"JSON": {}}' "json_output.json"

DB.js

import { S3 } from '@aws-sdk/client-s3';

class S3DB {
    constructor(s3Region) {
        this.region = s3Region;
        this.client = new S3({ region: s3Region });
    }
    query(queryString, bucketName, fileName) {
        console.log('query string', queryString);
        return new Promise((resolve, reject) => {
            this.client.selectObjectContent({
                Bucket: bucketName,
                Key: fileName,
                ExpressionType: 'SQL',
                Expression: queryString,
                InputSerialization: {
                    JSON: {
                        Type: "DOCUMENT"
                    },
                    CompressionType: 'NONE'
                },
                OutputSerialization: {
                    JSON: {
                        RecordDelimiter: ' '
                    }
                }
            }).then(async ({ $metadata, Payload }) => {
                console.log("meta data===>", $metadata);
                console.log("Payload", Payload);
                let array = [];
                for await (const data of Payload) {
                    console.log(data);
                    if (data.Records)
                        array.push(Buffer.from(data.Records.Payload).toString('utf8'));
                }
                console.log('result array', array);
            });

        });

    }
}
export default S3DB;

handler

const bucketName = "e-commerce-uploads";
const fileName = "Content.json";
const region = 'eu-west-1';

  const db = new DB(region);
  db.query(`SELECT * FROM S3Object[*] AS data WHERE id = 1`, bucketName, fileName)
    .then(data => {
      console.log('query data', data);
    }).catch(err => {
      console.log('query error', err);
    });

Possible Solution

No response

Additional Information/Context

No response

SDK version used

"@aws-sdk/client-s3": "^3.110.0" aws-cli/2.7.3 Python/3.9.11 Windows/10 exe/AMD64 prompt/off

Environment details (OS name and version, etc.)

Windows 10

ajredniwja commented 2 years ago

@IIslam thanks for opening this issue, I see you are using V3 of the SDK, can you share what error you get when you run selectobjectcontent operation.

IIslam commented 2 years ago

Hi @ajredniwja I didn't get any errors but I get no data when I add the where clause

IIslam commented 2 years ago

Hi @ajredniwja I'm waiting for your reply.

aBurmeseDev commented 3 weeks ago

Hi @IIslam - apologies for the delayed response. I understand that the issue you're facing is related to the way the SQL expression is constructed for the select-object-content operation in Amazon S3. The SQL syntax used in this operation differs slightly from standard SQL.

If you're still encountering this problem, I'd be more than happy to provide further assistance and guidance. Please feel free to share any additional information or provide the specific SQL expression you're using, and I'll do my best to help you construct the correct syntax to achieve the desired behavior.

Best, John

github-actions[bot] commented 1 week ago

This issue has not received a response in 1 week. If you still think there is a problem, please leave a comment to avoid the issue from automatically closing.