# Considerations
- Concepts
    - Cluster, node, index, shard, segment, document
    - Node roles: master, data, coordinating (client), ingest, ML
    - Data operations: index, delete, update, search
- Cost
    - Service
        - AWS
            - Free tier
            - Charged by instance size (EC2), volume size (EBS) and data transfer (standard rates)
            - Reserved Instances are supported
            - Automated snapshots are free
        - Self-hosted
            - Free tier
            - Charged by instance size (EC2), volume size (EBS) and data transfer (standard rates)
            - Reserved Instances and Savings Plans are supported
            - Snapshots are charged by storage size
    - Operation
        - AWS
            - Scaling is done through the AWS API/CLI
            - Tuning is more limited because configuration settings are constrained
            - Failed nodes are replaced automatically
        - Self-hosted
            - Requires more expertise for monitoring, optimization and tuning
            - Requires more automation for scaling
- Configuration
    - Cache settings
    - Thread pool settings
    - Security
        - X-Pack: not available in AWS
- Operation
    - Initial setup
    - Self-healing
    - Updating/patching
    - Scaling
    - Backups
    - Index management (Curator, ILM)
- Security
    - AuthN
        - AWS: IAM
        - Self-hosted: X-Pack
    - AuthZ
        - AWS: IAM + fine-grained access control (FGAC)
        - Self-hosted: X-Pack
    - Encryption
        - At-rest
        - In-transit
        - Node-to-node
- Monitoring
    - AWS
        - CloudWatch: metrics (limited), dashboards, alarms, logs (limited)
    - Self-hosted
        - Node-Exporter => Prometheus/Grafana
        - Elasticsearch-Exporter => Prometheus/Grafana
        - Kibana X-Pack Monitoring
        - Sematext ES monitoring (paid)
    - Both
        - Kibana X-Pack Monitoring: free plan is very basic
- Usage
    - Log ingestion
        - Quantity: number of log events per hour
        - Size: average number of bytes per hour
    - Storage
        - Use Quantity and Size to estimate the storage size that will be needed
        - The same figures also let us estimate the backup storage size
    - Querying
        - How many simultaneous users will be accessing Kibana?
        - How many queries will each user run per hour?
- Capacity Planning
    - Storage (Volume)
        - Ingestion
            - Amount of data per unit of time
            - E.g.: 500 KB/s ≈ 1,296 GB/month (30 days, decimal units: 1 GB = 10^6 KB)
        - Retention: how many days?
            - E.g.: 30 days
    - Compute (Throughput)
        - Search-bound vs. index-bound
            - For centralized logging we want a high indexing rate; search throughput is secondary
- References
    - https://www.elastic.co/pdf/elasticsearch-sizing-and-capacity-planning.pdf
    - https://aws.amazon.com/blogs/big-data/best-practices-for-configuring-your-amazon-elasticsearch-service-domain/
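
The storage estimate outlined under Capacity Planning (ingestion rate multiplied by retention) can be sketched in a few lines of Python. This is only a rough illustration, not a sizing tool: the function names are hypothetical, decimal units are assumed (1 GB = 1,000,000 KB), and the `replicas` parameter is an assumption added here to show how replica copies would multiply the raw volume. Indexing overhead and free-space headroom are deliberately left out.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def ingested_gb(rate_kb_per_sec: float, days: int) -> float:
    """Raw data ingested over `days`, in decimal GB (1 GB = 1,000,000 KB)."""
    return rate_kb_per_sec * SECONDS_PER_DAY * days / 1_000_000


def retained_storage_gb(rate_kb_per_sec: float, retention_days: int,
                        replicas: int = 0) -> float:
    """Storage needed to keep `retention_days` of data.

    `replicas` is an assumption of this sketch: each replica adds one
    more full copy of the retained data.
    """
    return ingested_gb(rate_kb_per_sec, retention_days) * (1 + replicas)


if __name__ == "__main__":
    # Figures from the outline: 500 KB/s ingestion, 30 days of retention.
    print(f"Monthly ingestion: {ingested_gb(500, 30):,.0f} GB")
    print(f"Retained, 1 replica: {retained_storage_gb(500, 30, replicas=1):,.0f} GB")
```

With the figures from the outline (500 KB/s, 30 days), the raw volume comes to about 1,296 GB, and one replica doubles it. Backup storage can be approximated from the same raw figure.
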
Why?
To simplify decision-making for Leverage adopters choosing between a managed and a self-hosted Elasticsearch cloud implementation.
What?
Why?