perederyaev opened 3 weeks ago
Hello,
Yes, VictoriaMetrics drops "null" values received via /api/v1/import when storing them:
https://github.com/VictoriaMetrics/VictoriaMetrics/blob/5e8c087d4244a4d82e11c1428e9699d2a00b6cb7/lib/storage/storage.go#L1811-L1816
But as far as I can see, this behavior hasn't changed since /api/v1/import gained support for ingesting values like null in v1.82.0:
https://github.com/VictoriaMetrics/VictoriaMetrics/blob/5e8c087d4244a4d82e11c1428e9699d2a00b6cb7/lib/protoparser/vmimport/parser.go#L142-L143
So if you import a timeseries whose values are all "null", like your example above, the whole timeseries is dropped, and it won't be counted in vm_slow_row_inserts_total since it's never inserted.
But if you import a timeseries with only some "null" values, like "values":[3,null,13], the timeseries is registered and won't be marked as a slow_insert next time.
Same metrics are processed correctly in v1.87.14 - they are counted in active series, we can access them and no strange 'slow inserts', 'cache misses' and so on.
Did you ingest the same values to v1.87.14, set all the values to "null"?
Hi Haleygo, we are using vmagent to send metrics to VM. In this case /api/v1/write is used, not /api/v1/import.
Did you ingest the same values to v1.87.14, set all the values to "null"?
We have vmagent sending the same metrics to VM v1.87.14 and v1.93.14 with absolutely the same settings. Only v1.93.14 has the issue with registering new timeseries when they have null values and didn't exist before.
Ok, that's not expected. What's the version of vmagent here? Did you test with a target that was down for a while, and see the NaN values missing only in v1.93.14? I did a quick test with vmsingle v1.93.14 (vmsingle shares the same code with vmagent and vmcluster storage) and the NaN handling works. My test steps are:
Only v1.93.14 has the issue with registering new timeseries when they have null values and didn't exist before.
You mean the new timeseries start with null values, like the target exposing metrics with null values? In my test, the NaN value is attached automatically by vmagent as a stale marker.
What's the version of vmagent here?
1.93.14
You mean the new timeseries start with null values, like the target exposing metrics with null values?
We have process_exporter, which is scraped by vmagent; the metrics are then sent via one more vmagent to two VMs, v1.93.14 and v1.87.14.
In vmagent's log we see:
2024-04-29T17:56:44.033Z warn VictoriaMetrics/lib/promscrape/scrapework.go:387 cannot scrape target "http://127.0.0.1:9256/metrics" 1 out of 1 times during -promscrape.suppressScrapeErrorsDelay=0s; the last error: the response from "http://127.0.0.1:9256/metrics" exceeds -promscrape.maxScrapeSize=16777216 (the actual response size is 359579335 bytes); either reduce the response size for the target or increase -promscrape.maxScrapeSize
In v1.87.14 I see the metrics as NaNs in VMUI and as nulls in export. In v1.93.14 I see no inserted metrics, but a slow insert and cache miss every minute with each new scrape cycle.
Looks like it's related to staleness markers somehow, but I'm not sure how to reproduce the issue from scratch. Please check this tcpdump vm_bug.pcap.zip: the first tcp stream is to v1.93.14 (127.0.0.1), and it's not inserted, with a 'slow insert' and 'cache miss' every minute; the second stream is to v1.87.14 (10.111.150.2), and it's inserted and has no 'slow insert' and 'cache miss' every minute.
Managed to reproduce it with promremotecli - just modified it to send "staleNaNBits uint64 = 0x7ff0000000000002". So when I send "0x7ff0000000000002" as the value for a new metric to VM v1.93.14, it doesn't register it, and increases slow inserts and cache misses.
VictoriaMetrics should stop creating new time series when it receives staleness marker for new time series
Related to https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5069.
In VictoriaMetrics, there are two different NaN values. One is called staleNaN [using uint64 = 0x7ff0000000000002] https://github.com/VictoriaMetrics/VictoriaMetrics/blob/d386a68b59ec669ef42cddc0b8fab8145f14ebdd/lib/decimal/decimal.go#L407-L409 and the other is NormalNaN, i.e. math.NaN() [using uint64 = 0x7ff8000000000001].
vmagent or vmsingle only generates a staleNaN value when metrics go missing, e.g. when the target goes down; see this doc for details.
When scraping a target which exposes a metric like metric1_0{bar="baz"} NaN, or importing data which contains metric1_0{bar="baz"} null using the /import APIs, VictoriaMetrics recognizes values like "null", "NaN", "nan" and sets them to NormalNaN instead of staleNaN.
Then when it comes to storing, VictoriaMetrics can tell the difference between staleNaN and NormalNaN, and among NaN values it only stores staleNaN. https://github.com/VictoriaMetrics/VictoriaMetrics/blob/5e8c087d4244a4d82e11c1428e9699d2a00b6cb7/lib/storage/storage.go#L1811-L1816
From the raw samples in v1.87.14, there are consistent NaN values stored in VictoriaMetrics. This could happen when your target exposes metrics with "NaN" values and the target is flapping up and down (generating staleNaN).
But in v1.93.14, the NormalNaN is dropped as always, and the staleNaN isn't considered a valid value and is dropped as well, so the series won't be registered successfully. This won't happen if the time series has "real" values, at least from time to time. Could you please elaborate on your use case here - why store time series with only NaN values?
Could you please elaborate on your use case here - why store time series with only NaN values?
We don't need to store time series with only NaN values. We want VM to be fast and stable when (for some reason) it gets a lot of NaNs. In our case we just switched traffic from one VM cluster to another and hit the issue with ingestion performance. We gathered metrics from process_exporter and sent them via vmagent to the "old" cluster. When we added the remotewrite url of the "new" cluster with v1.93.14 to vmagent, it started to have performance issues because it got millions of NaNs every minute. One more thing: it was difficult to identify the cause of the issue. We saw only strange slow inserts and cache misses, and at the same time a small number of active series and a low churn rate.
We don't need to store time series with only NaN values. We want VM to be fast and stable when (for some reason) it gets a lot of NaNs.
This looks like a very narrow case. In your example, you're trying to ingest StaleNaNs, a reserved type of NaN used for staleness detection. VM accepts a StaleNaN only if the series for this sample was registered before with a value different from StaleNaN. There is no sense in recording/registering a series which contains only StaleNaNs. To verify whether a series contains only StaleNaNs, VM does a cache and index lookup, which is counted as a cache miss and slowInsert.
vmagent will create a stale marker in two cases: when a scrape attempt fails (the target becomes temporarily unavailable), or when the target is removed from the list of scrape targets.
either reduce the response size for the target or increase -promscrape.maxScrapeSize
I wasn't able to get vmagent to send stale markers with this error. It is likely something weird is happening to the vmagents in your setup. Could you try setting -promscrape.noStaleMarkers on the vmagent side and see if the issue can be reproduced?
Describe the bug
Hi, VictoriaMetrics v1.93.14 processes samples with null values in a wrong way. When VM receives metrics whose values are all null, it doesn't insert them (they can't be seen in export or in the cardinality explorer in VMUI) and doesn't count them in active series, but it does count them as 'slow inserts' and 'cache misses' (storage/tsid). This is a critical issue because it affects the performance of the whole ingestion path.
Same metrics are processed correctly in v1.87.14 - they are counted in active series, we can access them and no strange 'slow inserts', 'cache misses' and so on.
To Reproduce
Try to insert null values into a clean database (so no metrics exist beforehand) in VM v1.93.14.
Version
victoria-metrics-20240419-095826-tags-v1.93.14-0-g345a53d8b0
Logs
No response
Screenshots
No response
Used command-line flags
No response
Additional information
No response