dhiaayachi opened this issue 2 months ago.
cluster_membership
Thank you for raising this question! You are correct that indexes in Cassandra can have performance limitations in large clusters. The use of inverted tables is an interesting alternative to consider.
Here's a breakdown of the points you've raised:
1. Concerns About Indexes in Cassandra
2. Proposed Solution: Inverted Tables
You propose creating two inverted tables:
cluster_membership_by_last_heartbeat: stores the last_heartbeat timestamp and its corresponding membership_partition.
cluster_membership_by_session_start: stores the session_start timestamp and its corresponding membership_partition.
This approach could improve query performance, especially for queries that use these columns as filter criteria.
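The two-step lookup behind the inverted-table proposal can be sketched as a small in-memory model. This is not Cassandra code and not Temporal's implementation; it only illustrates the read path. It assumes cluster_membership is keyed by (membership_partition, host_id). It also gives cluster_membership_by_last_heartbeat a hypothetical layout: a time bucket as partition key, because Cassandra cannot range-scan partition keys directly, and (last_heartbeat, host_id) as clustering columns. BUCKET_SECONDS, Member, MembershipStore and active_since are illustrative names only. A session_start table would follow the same pattern.

```python
from collections import defaultdict
from dataclasses import dataclass

# Hypothetical width of the time bucket used as the inverted table's partition key.
BUCKET_SECONDS = 60


@dataclass(frozen=True)
class Member:
    membership_partition: int
    host_id: str
    session_start: int   # epoch seconds (assumed unit)
    last_heartbeat: int  # epoch seconds (assumed unit)


class MembershipStore:
    """In-memory model of cluster_membership plus a last_heartbeat inverted table."""

    def __init__(self) -> None:
        # cluster_membership: membership_partition -> {host_id: Member}
        self.base = defaultdict(dict)
        # cluster_membership_by_last_heartbeat (assumed layout):
        # bucket -> {(last_heartbeat, host_id): membership_partition}
        self.by_heartbeat = defaultdict(dict)

    def upsert(self, m: Member) -> None:
        # Every write goes to both tables. Old inverted entries are not deleted
        # here; in Cassandra they might be expired with a TTL instead.
        self.base[m.membership_partition][m.host_id] = m
        bucket = m.last_heartbeat // BUCKET_SECONDS
        self.by_heartbeat[bucket][(m.last_heartbeat, m.host_id)] = m.membership_partition

    def active_since(self, cutoff: int, now: int) -> list:
        """Return members with last_heartbeat >= cutoff, sorted by host_id."""
        found = {}
        # Step 1: read only the inverted-table buckets covering [cutoff, now].
        for bucket in range(cutoff // BUCKET_SECONDS, now // BUCKET_SECONDS + 1):
            for (hb, host_id), partition in self.by_heartbeat.get(bucket, {}).items():
                if hb < cutoff:
                    continue
                # Step 2: point lookup in cluster_membership by primary key.
                m = self.base[partition].get(host_id)
                # Skip stale inverted entries left behind by older heartbeats.
                if m is not None and m.last_heartbeat == hb:
                    found[host_id] = m
        return sorted(found.values(), key=lambda m: m.host_id)


store = MembershipStore()
store.upsert(Member(0, "host-a", session_start=1000, last_heartbeat=1100))
store.upsert(Member(1, "host-b", session_start=1000, last_heartbeat=1300))
print([m.host_id for m in store.active_since(cutoff=1200, now=1300)])  # ['host-b']
```

In this model each heartbeat writes to two tables instead of one, and a read becomes a bucket scan plus point lookups. Whether that beats a secondary index would depend on cluster size and query frequency.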
3. Analysis and Potential Benefits
Queries that filter on last_heartbeat or session_start could benefit from using these inverted tables.
4. Considerations and Potential Challenges
cluster_membership queries: Queries run directly against the cluster_membership table that do not use the inverted tables may still hit the same performance bottlenecks.
5. Next Steps
To determine the best approach, it would be valuable to:
Examine how the cluster_membership table is being accessed to determine whether inverted tables would provide a significant benefit.
Conclusion:
The idea of using inverted tables for the cluster_membership table deserves further exploration. The potential gains in query performance could be significant, but the trade-offs need a thorough evaluation.
This issue highlights the importance of carefully considering data modeling and query patterns in Cassandra to achieve optimal performance. We appreciate your insightful suggestion and are open to discussing this further!
Thank you for your insightful suggestion! We appreciate your input and your willingness to help improve Temporal.
While we understand the limitations of indexes in Cassandra for large clusters, we've opted to use them for these specific columns to ensure efficient query performance for the cluster membership information. This is crucial for the stability and reliability of Temporal's distributed architecture.
The indexes help us quickly identify active cluster members by filtering based on last heartbeats and session start times.
If you have any further questions or concerns, please don't hesitate to ask. We are always open to feedback and suggestions.
Thank you for the suggestion! This is a great idea for improving query performance in Cassandra. Currently, Temporal doesn't use inverted tables for cluster_membership because of the added complexity of maintaining these tables.
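Part of that maintenance cost shows up in a minimal in-memory sketch. If last_heartbeat is part of the inverted table's key, a new heartbeat cannot overwrite the old inverted row; the old row has to be deleted and a new one inserted. Tables, heartbeat and the writes counter are hypothetical names for illustration, and the model leaves out the session_start table, time bucketing, and Cassandra's batch and TTL features.

```python
from dataclasses import dataclass, field


@dataclass
class Tables:
    # cluster_membership modeled as (membership_partition, host_id) -> last_heartbeat
    base: dict = field(default_factory=dict)
    # Hypothetical inverted table: (last_heartbeat, host_id) -> membership_partition
    by_heartbeat: dict = field(default_factory=dict)
    # Number of row writes and deletes issued so far
    writes: int = 0


def heartbeat(t: Tables, partition: int, host_id: str, ts: int) -> None:
    """Record a heartbeat in both tables, keeping the inverted table consistent."""
    old_ts = t.base.get((partition, host_id))
    if old_ts is not None:
        # last_heartbeat is a key column in the inverted table, so the old row
        # cannot be updated in place; it must be deleted explicitly.
        t.by_heartbeat.pop((old_ts, host_id), None)
        t.writes += 1
    t.by_heartbeat[(ts, host_id)] = partition  # insert the new inverted row
    t.base[(partition, host_id)] = ts          # update the base row
    t.writes += 2


t = Tables()
heartbeat(t, 0, "host-a", 100)
heartbeat(t, 0, "host-a", 110)
print(t.writes, t.by_heartbeat)  # 5 {(110, 'host-a'): 0}
```

In a real deployment the delete and the inserts would likely need to be grouped (for example in a logged batch) or paired with TTLs so that a failed write does not leave a dangling inverted row; that coordination is part of the maintenance cost.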
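However, you can use filtering to achieve similar results. You can filter the cluster_membership table on the last_heartbeat and session_start columns with a CQL WHERE clause. Note that if no index covers those columns, Cassandra requires ALLOW FILTERING, which reads every row in the queried partitions (or the whole table if no partition key is given) before discarding the ones that do not match.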
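The cost of such a filtered query can be illustrated with a small in-memory sketch of what happens when no index covers the filtered columns: every row in the restricted partition is read and tested, and only the matches are returned. Row and filter_members are hypothetical names, and the row shape is an assumption rather than Temporal's exact schema.

```python
from typing import List, NamedTuple, Tuple


class Row(NamedTuple):
    membership_partition: int
    host_id: str
    session_start: int   # epoch seconds (assumed unit)
    last_heartbeat: int  # epoch seconds (assumed unit)


def filter_members(
    rows: List[Row], partition: int, heartbeat_after: int, session_after: int
) -> Tuple[List[str], int]:
    """Return matching host_ids and how many rows had to be read to find them."""
    matches: List[str] = []
    scanned = 0
    for row in rows:
        # Rows outside the restricted partition are not part of the query
        # (in Cassandra they would not be read at all).
        if row.membership_partition != partition:
            continue
        # Every row in the partition is read and then tested, like ALLOW FILTERING.
        scanned += 1
        if row.last_heartbeat > heartbeat_after and row.session_start > session_after:
            matches.append(row.host_id)
    return matches, scanned


rows = [
    Row(0, "host-a", session_start=100, last_heartbeat=200),
    Row(0, "host-b", session_start=150, last_heartbeat=120),
    Row(1, "host-c", session_start=100, last_heartbeat=300),
]
print(filter_members(rows, partition=0, heartbeat_after=150, session_after=50))
# (['host-a'], 2)
```

The second value grows with the number of rows in the partition, not with the number of matches, which is the main reason filtering scales worse than an index or an inverted table for selective queries.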
Please let us know if you have any further questions or suggestions.
Thank you for your suggestion. We appreciate the thoughtful feedback and are always open to improvement!
Using inverted tables is a common practice in Cassandra and it can be beneficial in certain situations. However, Temporal's schema design is optimized for the specific use cases it handles and is designed to provide high performance for its workloads.
If you are concerned about performance in large clusters, you can consider the following suggestions:
We will continue to evaluate our schema design and explore ways to further enhance performance for large deployments. If you have any additional questions or feedback, please don't hesitate to share them.
Thank you for pointing this out. We are aware of the limitations of indexes in Cassandra, and your suggested approach of using inverted tables is an interesting and potentially effective solution. While it's not currently implemented, we will consider this approach as we explore ways to optimize performance for large clusters in the future.
For now, you can work around this limitation by either:
Using temporal-cassandra-tool to update the Cassandra schema to enable the indexes: this can improve the performance of queries on the cluster_membership table, but it may not be suitable for all use cases, especially in large clusters.
We appreciate your feedback and will keep it in mind as we continue to improve the Temporal project.
Thank you for reporting this. This is a feature request and we appreciate your feedback. Currently, we are not planning to remove these indexes from the Cassandra schema.
You can achieve the desired behavior by adding the inverted tables you've suggested, cluster_membership_by_last_heartbeat and cluster_membership_by_session_start.
As you've pointed out, indexes in Cassandra can have drawbacks, and choosing between indexes and inverted tables is a trade-off. In this case, the benefits of these indexes outweigh the drawbacks for us.
Hello!
I found that the schema definition creates indexes:
As far as I know, indexes do not work well in Cassandra for large clusters because they have many restrictions.
Have you considered creating inverted tables instead of using indexes, such as cluster_membership_by_last_heartbeat and cluster_membership_by_session_start, to look up the correct membership_partition value and then use it to query the cluster_membership table?
Thanks a lot!