GreptimeTeam / greptimedb

An open-source, cloud-native, unified time series database for metrics, logs and events with SQL/PromQL supported. Available on GreptimeCloud.
https://greptime.com/
Apache License 2.0

Last point reports empty result set #4650

Closed v0y4g3r closed 1 week ago

v0y4g3r commented 2 weeks ago

What type of bug is this?

Incorrect result

What subsystems are affected?

Standalone mode, Storage Engine

Minimal reproduce step

  1. Write test data using TSBS cpu metric data set.
  2. Flush table using ADMIN flush_table('cpu')
  3. Query last point of every time series using:
    SELECT
    last_value(hostname ORDER BY ts),
    last_value(region ORDER BY ts),
    last_value(datacenter ORDER BY ts),
    last_value(rack ORDER BY ts),
    last_value(os ORDER BY ts),
    last_value(arch ORDER BY ts),
    last_value(team ORDER BY ts),
    last_value(service ORDER BY ts),
    last_value(service_version ORDER BY ts),
    last_value(service_environment ORDER BY ts),
    last_value(usage_user ORDER BY ts),
    last_value(usage_system ORDER BY ts),
    last_value(usage_idle ORDER BY ts),
    last_value(usage_nice ORDER BY ts),
    last_value(usage_iowait ORDER BY ts),
    last_value(usage_irq ORDER BY ts),
    last_value(usage_softirq ORDER BY ts),
    last_value(usage_steal ORDER BY ts),
    last_value(usage_guest ORDER BY ts),
    last_value(usage_guest_nice ORDER BY ts)
    FROM cpu GROUP BY hostname;

    This step is expected to return 4000 rows, because the data set contains 4000 time series in total.

  4. Run the same query again. This time it returns an empty set, which is incorrect.

What did you expect to see?

The query above should always give a result of 4000 rows.

What did you see instead?

An empty set.

What operating system did you use?

NA

What version of GreptimeDB did you use?

0.9.2

Relevant log output and stack trace

No response

v0y4g3r commented 2 weeks ago

This bug may be related to a difference in behavior between the readers.

In the outer sequential scan (seq_scan.rs), the reader may be polled until it yields None:

https://github.com/GreptimeTeam/greptimedb/blob/216bce69736ab3313892df1bd3d2c0af74a2291a/src/mito2/src/read/seq_scan.rs#L137

But the inner RowGroupLastRowReader only yields a batch when it reaches a new time series, so the last time the reader is polled it updates the cache with an empty yielded_batches:

https://github.com/GreptimeTeam/greptimedb/blob/0b0ed03ee63c7e5baaeda8b5cc93f22cbdc86124/src/mito2/src/read/last_row.rs#L190

A quick fix is to check the length of RowGroupLastRowReader::yielded_batches, because we always expect a row group to contain at least one batch. If yielded_batches is empty, the poll must have been an invalid one and we should not cache the result.
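
To make the suspected failure mode concrete, here is a minimal, self-contained sketch of the idea. It is not the mito2 code: `Batch`, `Cache`, `LastRowReader`, the `guard` flag and `cached_len_after_extra_poll` are all hypothetical stand-ins, and the sketch assumes the reader gets polled one extra time after it is already exhausted. It only illustrates why skipping the cache update when `yielded_batches` is empty would keep a previously cached result intact.

```rust
use std::collections::VecDeque;

/// Hypothetical stand-in for a batch: (series key, last value).
type Batch = (String, f64);

/// Hypothetical stand-in for the cached last rows of one row group.
#[derive(Default)]
struct Cache {
    entry: Option<Vec<Batch>>,
}

/// Simplified model of a last-row reader (NOT the real RowGroupLastRowReader).
struct LastRowReader {
    /// Last rows still to be yielded, one per time series.
    input: VecDeque<Batch>,
    /// Batches yielded so far; moved into the cache on exhaustion.
    yielded_batches: Vec<Batch>,
}

impl LastRowReader {
    fn new(rows: Vec<Batch>) -> Self {
        Self { input: rows.into(), yielded_batches: Vec::new() }
    }

    /// Yields the next batch, or `None` once the input is exhausted.
    /// With `guard == true`, the cache is left untouched when nothing was
    /// yielded since the last update (the quick fix proposed above).
    fn next_batch(&mut self, cache: &mut Cache, guard: bool) -> Option<Batch> {
        match self.input.pop_front() {
            Some(batch) => {
                self.yielded_batches.push(batch.clone());
                Some(batch)
            }
            None => {
                let batches = std::mem::take(&mut self.yielded_batches);
                if !guard || !batches.is_empty() {
                    cache.entry = Some(batches);
                }
                None
            }
        }
    }
}

/// Drains the reader, polls it once more, and returns the cached batch count.
fn cached_len_after_extra_poll(guard: bool) -> usize {
    let mut cache = Cache::default();
    let mut reader = LastRowReader::new(vec![
        ("host_0".to_string(), 1.0),
        ("host_1".to_string(), 2.0),
    ]);
    // Outer loop: poll until the reader yields None.
    while reader.next_batch(&mut cache, guard).is_some() {}
    // Assumed extra poll after exhaustion; yielded_batches is empty here.
    let _ = reader.next_batch(&mut cache, guard);
    cache.entry.map(|b| b.len()).unwrap_or(0)
}
```

In this model, `cached_len_after_extra_poll(false)` ends with an empty cache entry because the extra poll overwrites the two cached batches, while `cached_len_after_extra_poll(true)` keeps both. An empty cache entry would be consistent with the second query returning an empty set.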