Points with repeated timestamps overwrites database cyclicaly

chernomor commented 4 years ago

TLDR: this script will fill database with one point when compression using:

TS=`date +%s`; while /bin/true; do echo "test 1 $TS"; done | nc localhost 2003

I expects what repeatly sended points will be ignored or will update points in database.

How to reproduce.

storage-schemas.conf:

[test]
pattern = ^test
retentions = 1m:2d

Fill some points at yesterday:

for i in `seq 1 300`; do echo "test `expr $i % 60` $(expr $i '*' 60  + $( date +%s) - 3600 '*' 24 )" ; done | nc localhost 2003

# go-whisper-dump  test.wsp  | grep 2020 | awk '{print $3 }' | sort | uniq -c
    301 2020-07-01

Send cyclic periods in one interval 10 hours ago:

for nocare in `seq 1 20`; do
  for i in `seq 1 300`; do echo "test `expr $i % 30` $(expr $i '*' 60  + $( date +%s) - 3600 '*' 10 )" ; done
done | nc localhost 2003

At same time we can see as old points will be overwritten:

[root@carbon whisper]# go-whisper-dump  test.wsp  | grep 2020 | awk '{print $3 }' | sort | uniq -c
    301 2020-07-01
   3301 2020-07-02
[root@carbon whisper]# go-whisper-dump  test.wsp  | grep 2020 | awk '{print $3 }' | sort | uniq -c
    301 2020-07-01
   3353 2020-07-02
[root@carbon whisper]# go-whisper-dump  test.wsp  | grep 2020 | awk '{print $3 }' | sort | uniq -c
    301 2020-07-01
   3397 2020-07-02
[root@carbon whisper]# go-whisper-dump  test.wsp  | grep 2020 | awk '{print $3 }' | sort | uniq -c
    301 2020-07-01
   3449 2020-07-02
[root@carbon whisper]# go-whisper-dump  test.wsp  | grep 2020 | awk '{print $3 }' | sort | uniq -c
   3416 2020-07-02
[root@carbon whisper]# go-whisper-dump  test.wsp  | grep 2020 | awk '{print $3 }' | sort | uniq -c
   3755 2020-07-02
[root@carbon whisper]# go-whisper-dump  test.wsp  | grep 2020 | awk '{print $3 }' | sort | uniq -c
   4068 2020-07-02
[root@carbon whisper]# go-whisper-dump  test.wsp  | grep 2020 | awk '{print $3 }' | sort | uniq -c
   4068 2020-07-02

Points with same timestamp duplicates at database:

[root@carbon whisper]# egrep -A2 'block|crc|1593642000' dump
...
00: 1593642000 2020-07-02 01:20:00 - 1593655620 2020-07-02 05:07:00 count:  830 crc32:8a5c7d79 start:271 last_byte:1723 end:1724
04: 1593645720 2020-07-02 02:22:00 - 1593641940 2020-07-02 01:19:00 count:  836 crc32:923b0482 start:6083 last_byte:7535 end:7536
03: 1593649980 2020-07-02 03:33:00 - 1593645660 2020-07-02 02:21:00 count:  829 crc32:92e09b87 start:4630 last_byte:6082 end:6083
02: 1593651420 2020-07-02 03:57:00 - 1593659940 2020-07-02 06:19:00 count:  737 crc32:a4e69008 start:3177 last_byte:4454 end:4630
01: 1593655680 2020-07-02 05:08:00 - 1593651360 2020-07-02 03:56:00 count:  828 crc32:e2885b8a start:1724 last_byte:3176 end:3177

Archive 0 data:
archive 1m:2d block 0 @271
crc32: 8a5c7d79 check: 8a5c7d79 start: 271 end: 1724 length: 1453
     0 1593642000 2020-07-02 01:20:00: 2.000000
     1 1593642060 2020-07-02 01:21:00: 3.000000
     2 1593642120 2020-07-02 01:22:00: 4.000000
--
   300 1593642000 2020-07-02 01:20:00: 2.000000
   301 1593642060 2020-07-02 01:21:00: 3.000000
   302 1593642120 2020-07-02 01:22:00: 4.000000
--
   595 1593642000 2020-07-02 01:20:00: 2.000000
   596 1593659520 2020-07-02 06:12:00: 24.000000
   597 1593659580 2020-07-02 06:13:00: 25.000000
--
archive 1m:2d block 1 @1724
crc32: e2885b8a check: e2885b8a start: 1724 end: 3177 length: 1453
     0 1593655680 2020-07-02 05:08:00: 20.000000
     1 1593655740 2020-07-02 05:09:00: 21.000000
--
    72 1593642000 2020-07-02 01:20:00: 2.000000
    73 1593642060 2020-07-02 01:21:00: 3.000000
    74 1593642120 2020-07-02 01:22:00: 4.000000
--
   370 1593642000 2020-07-02 01:20:00: 2.000000
   371 1593642060 2020-07-02 01:21:00: 3.000000
   372 1593642120 2020-07-02 01:22:00: 4.000000
--
   669 1593642000 2020-07-02 01:20:00: 1.000000
   670 1593642060 2020-07-02 01:21:00: 2.000000
   671 1593642120 2020-07-02 01:22:00: 3.000000
--
archive 1m:2d block 2 @3177
crc32: a4e69008 check: a4e69008 start: 3177 end: 4454 length: 1277
     0 1593651420 2020-07-02 03:57:00: 8.000000
     1 1593651480 2020-07-02 03:58:00: 9.000000
--
   138 1593642000 2020-07-02 01:20:00: 1.000000
   139 1593642060 2020-07-02 01:21:00: 2.000000
   140 1593642120 2020-07-02 01:22:00: 3.000000
--
   431 1593642000 2020-07-02 01:20:00: 1.000000
   432 1593642060 2020-07-02 01:21:00: 2.000000
   433 1593642120 2020-07-02 01:22:00: 3.000000
--
archive 1m:2d block 3 @4630
crc32: 92e09b87 check: 92e09b87 start: 4630 end: 6083 length: 1453
     0 1593649980 2020-07-02 03:33:00: 15.000000
     1 1593650040 2020-07-02 03:34:00: 16.000000
--
   167 1593642000 2020-07-02 01:20:00: 2.000000
   168 1593642060 2020-07-02 01:21:00: 3.000000
   169 1593642120 2020-07-02 01:22:00: 4.000000
--
   462 1593642000 2020-07-02 01:20:00: 2.000000
   463 1593642060 2020-07-02 01:21:00: 3.000000
   464 1593642120 2020-07-02 01:22:00: 4.000000
--
   763 1593642000 2020-07-02 01:20:00: 2.000000
   764 1593642060 2020-07-02 01:21:00: 3.000000
   765 1593642120 2020-07-02 01:22:00: 4.000000
--
archive 1m:2d block 4 @6083
crc32: 923b0482 check: 923b0482 start: 6083 end: 7536 length: 1453
     0 1593645720 2020-07-02 02:22:00: 4.000000
     1 1593645780 2020-07-02 02:23:00: 5.000000
--
   236 1593642000 2020-07-02 01:20:00: 2.000000
   237 1593642060 2020-07-02 01:21:00: 3.000000
   238 1593642120 2020-07-02 01:22:00: 4.000000
--
   534 1593642000 2020-07-02 01:20:00: 2.000000
   535 1593642060 2020-07-02 01:21:00: 3.000000
   536 1593642120 2020-07-02 01:22:00: 4.000000

deniszh commented 4 years ago

Hi @chernomor ,

Thanks for report! But you didn't mentioned what's running on localhost:2003 - what software, which version and with which config. It's quite hard to reproduce without that information.

Thanks!

chernomor commented 4 years ago

Hi @deniszh, This is go-carbon (commit https://github.com/lomik/go-carbon/commit/7b66f400678ab5071a32b6fbe66f4c79bd41d43c). Also I import current master from go-whisper to https://github.com/lomik/go-carbon/tree/master/persister - no changes.

Civil commented 4 years ago

@chernomor go-carbon's config (I mean go-carbon.conf) could be important as well as there are several ways how you can configure it.

chernomor commented 4 years ago

go-carbon.conf

[common]
user = "carbon"
graph-prefix = "carbon.agents.{host}"
metric-endpoint = "local"
metric-interval = "1m0s"
max-cpu = 4

[whisper]
data-dir = "/var/lib/graphite/whisper"
schemas-file = "/etc/go-carbon/storage-schemas.conf"
aggregation-file = "/etc/go-carbon/storage-aggregation.conf"
workers = 8
max-updates-per-second = 0
max-creates-per-second = 0
hard-max-creates-per-second = false
sparse-create = false
flock = true
enabled = true
hash-filenames = true
compressed = true
remove-empty-file = false

[cache]
max-size = 10
write-strategy = "max"

[udp]
listen = ":2003"
enabled = true
buffer-size = 0

[tcp]
listen = ":2003"
enabled = true
buffer-size = 0

[pickle]
listen = ":2004"
max-message-size = 67108864
enabled = true
buffer-size = 0

[carbonlink]
listen = "127.0.0.1:7002"
enabled = true
read-timeout = "30s"

[grpc]
listen = "127.0.0.1:7003"
enabled = true

[tags]
enabled = false
tagdb-url = "http://127.0.0.1:8000"
tagdb-chunk-size = 32
tagdb-update-interval = 100
local-dir = "/var/lib/graphite/tagging/"
tagdb-timeout = "1s"

[carbonserver]
listen = ":8080"
enabled = true
buckets = 10
metrics-as-counters = false
read-timeout = "60s"
write-timeout = "60s"
query-cache-enabled = false
query-cache-size-mb = 0
find-cache-enabled = false
trigram-index = true
scan-frequency = "5m0s"
trie-index = false

max-globs = 100
fail-on-max-globs = false

max-metrics-globbed  = 30000
max-metrics-rendered = 1000

graphite-web-10-strict-mode = true
internal-stats-dir = ""
stats-percentiles = [99, 98, 95, 75, 50]

[dump]
enabled = false
path = "/var/lib/graphite/dump/"
restore-per-second = 0

[pprof]
listen = "localhost:7007"
enabled = false

[[logging]]
logger = ""
file = "/var/log/go-carbon/go-carbon.log"
level = "info"
encoding = "mixed"
encoding-time = "iso8601"
encoding-duration = "seconds"

storage-aggregation.conf

[test]
pattern = \.test$
xFilesFactor = 0
aggregationMethod = average

[default]
pattern = .*
xFilesFactor = 0.5
aggregationMethod = average

storage-schemas.conf

[test]
pattern = ^test
retentions = 1m:2d

[default]
pattern = .*
retentions = 60s:30d,1h:5y

Civil commented 4 years ago

Thanks! Key thing here is that you are using compressed whisper, and maybe @bom-d-van would be the best person to have a look at your issue.

chernomor commented 4 years ago

@Civil yes, without compression this strange case (with repeated points) works fine :)

bom-d-van commented 4 years ago

@chernomor thanks for the report! (also thanks @deniszh and @Civil for the discussion) Will take a look on the issue soon.

bom-d-van commented 4 years ago

@chernomor the issue is some what an edge case in the compressed whisper. In the current design, compressed whisper doesn't support data rewrite like standard whisper file. The check in the code that we had at the moment only guards against files with at least 2 levels of retentions policy (like 1s:2d,1m:30d).

For single level retention policy (like 1m:2d), it happily saves the data but I think it might have some issues in the read path. I will make a fix to address it. We have two solutions for this case:

Allow duplicated data to be saved in compressed whisper file for single level retention policy. This way users can overwrite data (within a range). The tradeoff is extra space (also potentially lead to not enough space for saving specified retention range).
Only save the first data points, discard data with old timestamps.

I haven't decide which way to go. Let me know if you think one is better than another.

chernomor commented 4 years ago

@bom-d-van, with second solution cwhisper will save only points with larger timestamp than last saved? With first solution it's not guaranted what all points will be saved at whole retention period, is it true?

chernomor commented 4 years ago

@bom-d-van I think, discard data is most reliable way - because old points will not be lost (it's most important goal at my opinion). Rewrite may be done manualy by sending needed data as new metric and replace old file by the new.

go-graphite / go-whisper

Points with repeated timestamps overwrites database cyclicaly #20