Closed blackrez closed 1 month ago
Curious if the Opensearch serverless version that was announced a re:invent would be more cost effective if that is the blocking point. what maintenance activities make it difficult to maintain?
Sadly, the serverless version is not cost effective. The maintenance activities makes opensearch hard to maintain : the reindex, the mapping change, upgrading cluster... I know AWS had done a great jobs to make opensearch better but for little organisation, it is hard to operate.
Hi @blackrez, I agree that OpenSearch indexes and mappings are hard to update in OpenSearch. Also, OpenSearch is the service that takes the bigger portion of data.all costs
You are more than welcome to work on implementing the functionality :) Fork the code and open a PR when the code is ready. Keep in mind that to make it mergeable, you will need to make the "underlying catalog database" configurable, something like adding a parameter in the cdk.json that defines whether we use RDS or Opensearch.
Feel free to add here your questions, comments or designs and we can discuss them
Hi @blackrez we are exploring the possibility to implement OpenSearch serverless into data.all (@kukushking). For the case of data.all, where we hardly ever update mappings or have to re-index, do you see any limitations into using OpenSearch serverless as a cost-effectiveness solution? Could you give more detail about other alternatives? Your opinion is highly appreciated :)
Hello @dlpzx,
From my point of view, OpenSearch Serverless is not at all cost-effective.
You can configure a maximum number of OCUs for your account in order to control costs. You're billed for a minimum of 4 OCUs allocated for your workloads when you create a collection.
It will cost 0,24$4OCU730h = 700$
My main alternative is the full text search from PostGreSQL. The creation of index will be simpler (https://www.postgresql.org/docs/current/textsearch-tables.html#TEXTSEARCH-TABLES-INDEX). I don't see any technical limitations to achieve this. Most text functions are present in and it works out of the box.
I found a lot of code about the index mapping and creation but not a lot about the search itself.
@InasK6, I believe your customer conducted a cost analysis on this component of data.all. Would it be possible to share the results?
hello @dlpzx @blackrez
Indeed, the main findings from my cost analysis are that
I have considered two options to reduce these cost:
Advantages
Challenges
Advantages
enable_opensearch_serverless
flag in cdk.json
file to false
+ change two lines of code to specify the instance type of your master nodes and worker nodes
Note You cannot chose M3, R3, T2 types. You will get an error saying they do not support encryption of data at rest for your Managed OpenSearch clusterChallenges
Tradeoffs need be considered regarding this choice. If you want to go for the RDS implementation, That would need more developer effort. I have already started implementing db functions and GraphQL objects to test on one of the filter options of the Catalog. So, I could help starting on that. If you consider the Managed OpenSearch option, extra operational or testing overhead can be needed to estimate the right size for your need, but it only needs two lines of code changes
We will close this issue as the moment as we are not planning to implement it. If interest grows we can reopen it and work on it.
Is your feature request related to a problem? Please describe. Use and maintain Opensearch cluster is hard and costly to maintain (the AWS serverless version cost 700.80$/month).
Describe the solution you'd like Use Full text search in postgresql
https://www.postgresql.org/docs/current/textsearch.html For example : https://amitosh.medium.com/full-text-search-fts-with-postgresql-and-sqlalchemy-edc436330a0c https://www.crunchydata.com/blog/postgres-full-text-search-a-search-engine-in-a-database
Additional context I can help to add this functionality, I'm very interested by the product but Opensearch/ES is clearly a blocking point.
P.S. Please Don't attach files. Add code snippets directly in the message body instead.