When choosing the storage, we should keep in mind that the price depends on two factors:
S3 pricing: https://aws.amazon.com/s3/pricing/
DynamoDB pricing: https://aws.amazon.com/dynamodb/pricing/on-demand/
@miquelduranfrigola: What is Ersilia's budget for AWS storage? This will strongly influence whether we choose S3 or DynamoDB.
Hey @honeyankit,
Clearly, S3 is much cheaper than DynamoDB, so if we do have the money, what are the advantages of paying for DynamoDB? From the little I know about AWS, better query functionality?
Hello @honeyankit and @GemmaTuron. I think that, aside from pricing, the type of data being stored is also very important. Based on your description, the data consists of key-value pairs. DynamoDB and S3 both have great features:
Price: S3 is cheaper than DynamoDB (the budget allotment will, of course, be a major deciding factor).
Management: In the long run, managing bucket policies on S3 can become cumbersome as the application scales, whereas DynamoDB is a fully managed, serverless service.
Data: DynamoDB supports key-value and document data models, while S3 can store objects for virtually any use case.
Functionality: S3 supports parallel requests, while DynamoDB is a NoSQL database with high concurrency for read/write requests and virtually unlimited throughput.
Note: DynamoDB and S3 can be integrated too.
I would still recommend DynamoDB if the AWS storage budget can cover its pricing.
@GemmaTuron DynamoDB is suitable for storing tabular or textual data in the form of key-value pairs, where the key is the field name and the value is that field's value. For example:
{
'model' : 'x',
'accuracy': 90
}
S3, on the other hand, is better suited to storing files such as PDFs or CSVs. For example, if we want to keep the predicted output.csv file for future reference, we would store the CSV file itself in S3 and store the link to the S3 object, along with metadata about the file, in DynamoDB.
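A minimal sketch of the pattern described above, in which the CSV file lives in S3 and DynamoDB holds only a small metadata item that links to it. The bucket name, object key, and attribute names (`model`, `s3_uri`, `n_rows`) are hypothetical, and `to_attribute_value` is a hand-rolled illustration of DynamoDB's low-level attribute-value format (strings tagged `S`, numbers sent as strings tagged `N`), not a library function.

```python
# Hypothetical sketch: a DynamoDB item in low-level attribute-value format
# that stores metadata about a prediction file kept in S3.
# Bucket, object key, and attribute names are illustrative assumptions.


def to_attribute_value(value):
    """Convert a Python bool/int/float/str into DynamoDB's tagged form."""
    if isinstance(value, bool):  # check bool first: bool is a subclass of int
        return {"BOOL": value}
    if isinstance(value, (int, float)):
        return {"N": str(value)}  # numbers travel as strings tagged "N"
    if isinstance(value, str):
        return {"S": value}
    raise TypeError(f"unsupported type: {type(value).__name__}")


def file_metadata_item(model_id, bucket, key, n_rows):
    """Build an item that points to a CSV in S3 instead of embedding it."""
    record = {
        "model": model_id,
        "s3_uri": f"s3://{bucket}/{key}",  # link to the object stored in S3
        "n_rows": n_rows,
    }
    return {name: to_attribute_value(v) for name, v in record.items()}


item = file_metadata_item("x", "example-bucket", "predictions/output.csv", 1000)
print(item)
```

The same helper turns the `{'model': 'x', 'accuracy': 90}` example into `{'model': {'S': 'x'}, 'accuracy': {'N': '90'}}`. In practice, boto3's low-level client `put_item` expects items in this tagged form, while the higher-level `Table` resource accepts native Python values (numbers as `int` or `Decimal`) and does the conversion itself; boto3 also ships a `TypeSerializer` for this.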
I would also suggest that the choice of tool depend on the required functionality:
Thanks all for this interesting discussion.
We had a meeting with one of our best contributors, who is quite familiar with AWS.
Based on budget and on usage needs, we decided to go for DynamoDB. We will set this up promptly. We hope it won't be a blocker.
@honeyankit, is this blocking us at this stage? If so, please let us know and we will try to have a working solution ASAP.
Thank you for all your feedback.
@miquelduranfrigola: DynamoDB would be the preferred choice, as it will store each molecule and its prediction as a key-value pair, which makes it simple to implement the logic for checking for and retrieving a stored value in a single call.
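One detail behind a single-call lookup is the key itself. Different models produce different predictions for the same molecule, so one natural (assumed, not decided in this thread) design is a composite key with the model identifier as the partition key and the molecule as the sort key. The sketch below hashes the molecule string so the key has a fixed length regardless of input size. The attribute names `model_id` and `molecule_key` are hypothetical, and it assumes inputs are already canonicalized: two different SMILES spellings of the same molecule would still produce different keys.

```python
import hashlib


def prediction_key(model_id: str, molecule: str) -> dict:
    """Build a hypothetical composite DynamoDB key for one (model, molecule) pair.

    - model_id acts as the partition key.
    - molecule_key (SHA-256 hex digest of the input) acts as the sort key,
      giving a fixed 64-character key however long the input string is.
    Assumes `molecule` is already canonical; only surrounding whitespace
    is stripped here.
    """
    digest = hashlib.sha256(molecule.strip().encode("utf-8")).hexdigest()
    return {"model_id": model_id, "molecule_key": digest}


# Same model and molecule -> same key, so one get_item call can find it.
print(prediction_key("model-a", "CCO") == prediction_key("model-a", " CCO "))  # True
# Same molecule, different model -> different key.
print(prediction_key("model-a", "CCO") == prediction_key("model-b", "CCO"))  # False
```

Hashing trades readability for a uniform key size; storing the original input string as a regular (non-key) attribute next to the prediction keeps each item human-readable.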
I know we have already selected DynamoDB, but I am still sharing my initial thoughts on implementing this feature with both databases.
DynamoDB Implementation
S3 Implementation
Regarding whether this is blocking us at this stage: it is not blocking at the moment; I just wanted to reach a conclusion on which storage to select.
Based on @miquelduranfrigola's comment (budget and usage needs), we are going with DynamoDB.
Feature: Ersilia wants to implement a new feature in which the output (key-value pairs) produced by a model for a given molecule (input) is saved to storage. When a new model is submitted to Ersilia, it should first check whether prediction values for that input molecule already exist in storage and, if so, pull those values; otherwise, the model should compute predictions for the input and store them back in storage.
Exit criteria: select the storage option (S3 or DynamoDB) that best suits this Ersilia feature.
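The check-then-compute flow in the feature description is a cache-aside pattern. The sketch below illustrates it under stated assumptions: `get_or_compute` works with any object exposing `get_item(Key=...)` and `put_item(Item=...)`, the two methods of a boto3 DynamoDB `Table` resource, so it can be exercised against the small in-memory stand-in defined here. The key attribute names (`model_id`, `molecule`), the `prediction` attribute, and the `predict` callable are hypothetical.

```python
from decimal import Decimal


class InMemoryTable:
    """Test stand-in mimicking two methods of a boto3 DynamoDB Table resource."""

    def __init__(self, key_names):
        self._key_names = key_names  # attributes that form the primary key
        self._items = {}

    def _key(self, attrs):
        return tuple(attrs[name] for name in self._key_names)

    def get_item(self, Key):
        item = self._items.get(self._key(Key))
        # Like DynamoDB, omit "Item" from the response when nothing matches.
        return {"Item": dict(item)} if item is not None else {}

    def put_item(self, Item):
        self._items[self._key(Item)] = dict(Item)
        return {}


def get_or_compute(table, model_id, molecule, predict):
    """Return (prediction, cache_hit), computing and storing it on a miss."""
    key = {"model_id": model_id, "molecule": molecule}  # hypothetical key schema
    response = table.get_item(Key=key)  # single lookup call
    if "Item" in response:
        return response["Item"]["prediction"], True
    value = predict(molecule)  # run the model only when nothing is stored
    prediction = Decimal(str(value))  # boto3's resource layer rejects floats
    table.put_item(Item={**key, "prediction": prediction})
    return prediction, False


if __name__ == "__main__":
    calls = []

    def fake_model(smiles):
        calls.append(smiles)
        return 0.75

    table = InMemoryTable(["model_id", "molecule"])
    print(get_or_compute(table, "model-a", "CCO", fake_model))  # (Decimal('0.75'), False)
    print(get_or_compute(table, "model-a", "CCO", fake_model))  # (Decimal('0.75'), True)
    print(len(calls))  # 1
```

Against real AWS, `table` would instead be something like `boto3.resource("dynamodb").Table("predictions")`, where the table name is hypothetical and the table must already exist with `model_id` and `molecule` as its key schema. The conversion to `Decimal` matters there because boto3's resource layer raises a `TypeError` for Python `float` values.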