Open syncpark opened 9 months ago
@syncpark I'd like you to collect and organize the issues related to Giganto's performance. I think the first step is to work out an overall strategy for improving Giganto's performance.
@msk, @sophie-cluml, let's discuss this together.
Parallelizing the storage of Conn events and events from other protocols could reduce latency by about 14% (100% - 86%), which is not negligible. However, I have concerns that this alone might not sufficiently address Giganto's scalability issues under heavy traffic.
To tackle the core of the problem, we first need to identify where the bottleneck lies. @syncpark's suggestion hints at the physical disk I/O being the constraint. If that's the case, a potential solution could be to increase the number of stripes in our RAID configuration. This might offer a simpler and possibly more effective way to enhance performance compared to separating Conn and other events.
On the other hand, if the bottleneck is at the level of RocksDB operations, like locking or transaction handling, splitting the events across multiple RocksDB instances on different disks could be beneficial. However, dividing them based on event type may not be the most efficient, particularly when a single type (e.g., Conn) dominates. A more balanced approach could be to distribute events evenly, perhaps using hash values.
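To illustrate the hash-based distribution idea, here is a minimal sketch (in Python for brevity, not Giganto's actual code). The function name `shard_for`, the key format, the shard count of 4, and the 90/10 Conn/DNS split are all assumptions made up for this example.

```python
import zlib
from collections import Counter


def shard_for(key: bytes, num_shards: int) -> int:
    """Map an event key to a shard index using a stable hash (CRC32)."""
    if num_shards <= 0:
        raise ValueError("num_shards must be positive")
    return zlib.crc32(key) % num_shards


# Hypothetical workload in which Conn events dominate (assumed 90/10 split).
events = [("conn", i) for i in range(9000)] + [("dns", i) for i in range(1000)]

# Splitting by event type: one store receives 90% of the writes.
by_type = Counter(kind for kind, _ in events)

# Splitting by hash of a per-event key: load no longer depends on the type mix.
by_hash = Counter(
    shard_for(f"{kind}:{seq}".encode(), 4) for kind, seq in events
)

print("by type:", dict(by_type))
print("by hash:", dict(sorted(by_hash.items())))
```

One trade-off to keep in mind: with hash-based placement, a range query over one event type would have to consult every shard, so the choice of key matters for retrieval as well as for storage.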
Additionally, it’s crucial to consider how much CPU time is currently idle. If we have sufficient CPU resources available, we might explore more aggressive methods. These could include batching events for storage (e.g., storing 1,000 events in a single RocksDB column family entry), compressing events before storage, or implementing both strategies.
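As a rough sketch of the batching-plus-compression idea (again in Python, not Giganto's actual code): the helper names `pack_batch` and `unpack_batch`, the JSON-lines encoding, and the event fields are assumptions for illustration only. The returned bytes are what would be written as a single value in the store.

```python
import json
import zlib


def pack_batch(events: list) -> bytes:
    """Serialize a batch of events as JSON lines and compress it into one value."""
    raw = "\n".join(json.dumps(e, separators=(",", ":")) for e in events).encode()
    return zlib.compress(raw, 6)  # 6 is zlib's usual speed/ratio trade-off


def unpack_batch(blob: bytes) -> list:
    """Inverse of pack_batch: decompress and parse each JSON line."""
    raw = zlib.decompress(blob).decode()
    return [json.loads(line) for line in raw.split("\n")] if raw else []


# Hypothetical batch of 1,000 Conn-like events with made-up fields.
batch = [{"proto": "conn", "seq": i, "orig_addr": "10.0.0.1"} for i in range(1000)]
blob = pack_batch(batch)
uncompressed_size = sum(len(json.dumps(e, separators=(",", ":"))) for e in batch)
print(f"{len(batch)} events: {uncompressed_size} bytes raw, {len(blob)} bytes packed")
```

This trades CPU time for fewer and smaller writes, which is why the amount of idle CPU matters. It also means reading a single event requires decompressing its whole batch, so the batch size affects retrieval latency.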
Issue
In the TIS project, the input traffic is 10 Gbps or 20 Gbps for each collector machine. Since Giganto cannot process all events sent by a single Piglet, Giganto's storage/retrieval performance needs to be improved.
Purpose
Let's improve storage/retrieval performance by storing Conn events on a different HDD RAID array from the one used for other protocols.
Background
Event ratio by protocols:
If Conn events are stored on a separate HDD RAID array, disk I/O contention with storage and search requests for other protocols is reduced. As a result, we can expect improved performance.
TODOs