aws / aws-sdk-js

AWS SDK for JavaScript in the browser and Node.js
https://aws.amazon.com/developer/language/javascript/
Apache License 2.0

How to search directly on S3 #2543

Closed vladejs closed 5 years ago

vladejs commented 5 years ago

I searched all over Google and couldn't find a solution to this (apparently simple) problem.

I need to return a list of keys filtered by an arbitrary piece of text. Example:

Given the text 'was', return these files:

bucketName/itwasfine.txt
bucketName/alreadyinthere.was.txt

and exclude everything else.

My goal is to create a client side table with a fuzzy search functionality on the bucket's keys, including incremental pagination.

How can I achieve that without fetching every object in the bucket?

srchase commented 5 years ago

@vladejs

There is not a direct way to do this.

listObjectsV2 (https://docs.aws.amazon.com/AWSJavaScriptSDK/latest/AWS/S3.html#listObjectsV2-property) lets you list objects by key prefix, but it cannot filter for 'was' in the middle of a key such as 'foowasbar'.

The AWS CLI gives you the following operation:

aws s3api list-objects --bucket bucketName --query "Contents[?contains(Key, 'was')]"

That makes one or more API calls to retrieve all Objects in the bucket and filters the results locally.
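For reference, roughly the same thing can be done from the SDK. This is only a minimal sketch, not an official recipe: searchKeys is a hypothetical helper name, and s3 is assumed to be any object with the SDK v2 call shape s3.listObjectsV2(params).promise() (for example new AWS.S3() from the aws-sdk package). It still walks every page of the listing and does the substring match locally; an optional prefix only narrows what S3 returns.

```javascript
// Sketch: list a bucket page by page and keep only keys containing `needle`.
// `searchKeys` is a hypothetical helper; `s3` is assumed to expose the
// SDK v2 shape s3.listObjectsV2(params).promise().
async function searchKeys(s3, bucket, needle, prefix) {
  const matches = [];
  let token; // undefined until S3 hands back a continuation token
  do {
    const params = { Bucket: bucket };
    if (prefix) params.Prefix = prefix;           // server-side narrowing (prefix only)
    if (token) params.ContinuationToken = token;  // resume where the last page ended
    const page = await s3.listObjectsV2(params).promise();
    for (const obj of page.Contents || []) {
      // The substring match happens here, on the client.
      if (obj.Key.includes(needle)) matches.push(obj.Key);
    }
    token = page.IsTruncated ? page.NextContinuationToken : undefined;
  } while (token);
  return matches;
}
```

Each listObjectsV2 call returns at most 1,000 keys, so the number of requests grows with the size of the bucket (or of the prefix you restrict it to).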

vladejs commented 5 years ago

OMG, so what about having one million items in my bucket?

Given your answer, it is literally impossible to do a performant search on S3.

I'm then forced to download 1 million items locally and do the search. Is that the approach I should take?


srchase commented 5 years ago

@vladejs

Refer to this whitepaper: AWS Storage Services Overview

Amazon S3 doesn’t offer query capabilities to retrieve specific objects. When you use Amazon S3 you need to know the exact bucket name and key for the files you want to retrieve from the service. Amazon S3 can’t be used as a database or search engine by itself. Instead, you can pair Amazon S3 with Amazon DynamoDB, Amazon CloudSearch, or Amazon Relational Database Service (Amazon RDS) to index and query metadata about Amazon S3 buckets and objects.

The recommended solution is to pair S3 with an additional service. With this approach, your index can return only the relevant keys.
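To make the idea concrete, here is a rough sketch of what "search the index, not the bucket" looks like. Everything in it is an assumption for illustration: KeyIndex is a hypothetical in-memory stand-in for whatever service holds the index (DynamoDB, CloudSearch, RDS), and it does plain substring matching rather than true fuzzy search. The point is only that the index is updated whenever objects are written or deleted, and the search returns one page of matching keys at a time.

```javascript
// Hypothetical in-memory stand-in for an external index of S3 keys.
// A real setup would keep this data in a separate service.
class KeyIndex {
  constructor() {
    this.keys = new Set();
  }

  // Call whenever an object is uploaded to the bucket.
  add(key) {
    this.keys.add(key);
  }

  // Call whenever an object is deleted from the bucket.
  remove(key) {
    this.keys.delete(key);
  }

  // Return one page of keys containing `text`, plus the offset of the
  // next page (null when there are no more matches).
  search(text, offset = 0, limit = 50) {
    const hits = [...this.keys].filter((k) => k.includes(text)).sort();
    const keys = hits.slice(offset, offset + limit);
    const next = offset + limit < hits.length ? offset + limit : null;
    return { keys, next };
  }
}
```

A client-side table would call search(text) for the first page and pass the returned next value back as the offset to load more, without ever listing the bucket itself.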

ffxsam commented 5 years ago

@vladejs S3 should be for storing data, and if you want to search and find specific objects, you should have some sort of database to keep track.

no-response[bot] commented 5 years ago

This issue has been automatically closed because there has been no response to our request for more information from the original author. With only the information that is currently in the issue, we don't have enough information to take action. Please reach out if you have or find the answers we need so that we can investigate further.

lock[bot] commented 4 years ago

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs and link to relevant comments in this thread.