CloudWise-OpenSource / GAIA-DataSet

GAIA, with the full name Generic AIOps Atlas, is an overall dataset for analyzing operation problems such as anomaly detection, log analysis, fault localization, etc.
GNU General Public License v2.0

Duplicate timestamps in metrics #9

Open mistycheney opened 1 year ago

mistycheney commented 1 year ago

There are duplicate timestamps in many metrics. Some of these duplicates have the same value, but often the same timestamp appears in multiple rows with different values. Usually in such cases, one of these rows has a valid value and the remaining rows are 0. Can I just take the non-zero row as the correct row to use for this timestamp? Was this expected when you collected and compiled the data? Thanks.

Xander-cloudwise commented 1 year ago

Thank you for your interest in the GAIA dataset. First, this situation is generally normal: uncertainty in the data collection process can cause multiple records to be written under the same timestamp. Second, some metrics in the GAIA dataset (mainly those whose filenames start with "system") were recorded without tags, so data from different time series were recorded together. In this case, you need to aggregate the metric data in a way that suits the specific metric. For example, the "system_network_out_dropped" metric can be aggregated with the sum function.
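
For anyone who wants to try this, here is a minimal sketch of the aggregation in plain Python. It assumes each metric row can be reduced to a (timestamp, value) pair. The helper name `aggregate_by_timestamp` and the sample rows are hypothetical and are not taken from the dataset.

```python
from collections import defaultdict


def aggregate_by_timestamp(rows, agg=sum):
    """Collapse rows that share a timestamp into one value using `agg`.

    `rows` is an iterable of (timestamp, value) pairs. This layout is an
    assumption for illustration; adapt it to the actual columns of the file.
    """
    grouped = defaultdict(list)
    for timestamp, value in rows:
        grouped[timestamp].append(value)
    # Return one (timestamp, aggregated value) pair per timestamp, in time order.
    return [(ts, agg(values)) for ts, values in sorted(grouped.items())]


# Hypothetical sample: two timestamps are duplicated, one of them with a 0 row.
rows = [
    (1625133600000, 0.0),
    (1625133600000, 3.0),
    (1625133660000, 2.0),
    (1625133660000, 5.0),
    (1625133720000, 1.0),
]

summed = aggregate_by_timestamp(rows)           # sum across duplicate rows
peak = aggregate_by_timestamp(rows, agg=max)    # alternative: keep the largest value
```

When the duplicates are one valid value plus zeros, as described in the question, summing gives the same result as keeping the non-zero row. The two approaches differ only when several rows are non-zero, which is where the choice of aggregation function depends on the metric.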