RediSearch / RSCoordinator

RediSearch coordinator
Other
24 stars 10 forks source link

Crash when loading the following standalone redisearch v2.0 rdb #285

Open filipecosta90 opened 2 years ago

filipecosta90 commented 2 years ago

Crash when loading the following standalone redisearch rdb: "https://s3.amazonaws.com/benchmarks.redislabs/redisearch/datasets/1M-enwiki_abstract-hashes/1M-enwiki_abstract-hashes.redisearch.rdb"

To trigger it with local benchmark automation (adjust the yaml file to the one in this issue given we've worked around this) :

make benchmark BENCHMARK=ftsb-1M-enwiki_abstract-hashes-fulltext-simple-1word-query.yml

crash log

29969:M 04 Aug 2021 15:15:00.703 # Cluster state changed: ok
2021-08-04 15:15:01 INFO Reusing cached remote file (located at ./datasets/1M-enwiki_abstract-hashes.redisearch.rdb ).
2021-08-04 15:15:01 INFO Copying rdb from ./datasets/1M-enwiki_abstract-hashes.redisearch.rdb to /tmp/tmpsz9els02/cluster-node-port-6379.rdb
2021-08-04 15:15:01 INFO Copying rdb from ./datasets/1M-enwiki_abstract-hashes.redisearch.rdb to /tmp/tmpsz9els02/cluster-node-port-6380.rdb
2021-08-04 15:15:01 INFO Copying rdb from ./datasets/1M-enwiki_abstract-hashes.redisearch.rdb to /tmp/tmpsz9els02/cluster-node-port-6381.rdb
29911:M 04 Aug 2021 15:15:02.046 * Loading RDB produced by version 6.2.5
29911:M 04 Aug 2021 15:15:02.046 * RDB age 86949 seconds
29911:M 04 Aug 2021 15:15:02.046 * RDB memory usage when created 1643.74 Mb

=== REDIS BUG REPORT START: Cut & paste starting from here ===
29911:M 04 Aug 2021 15:15:02.046 # ------------------------------------------------
29911:M 04 Aug 2021 15:15:02.046 # !!! Software Failure. Press left mouse button to continue
29911:M 04 Aug 2021 15:15:02.046 # Guru Meditation: Error loading data from RDB (short read or EOF). Read performed by module 'search' about type 'ft_index0' after reading '0' bytes of a value for key named: '(null)'. #module.c:4614

------ STACK TRACE ------

Backtrace:
redis-server *:6379 [cluster](+0xda348)[0x55a7287ec348]
redis-server *:6379 [cluster](moduleLoadString+0x70)[0x55a7287ec610]
/home/fco/redislabs/RSCoordinator/build/module-oss.so(+0xe1bde)[0x7f1ef50cdbde]
/home/fco/redislabs/RSCoordinator/build/module-oss.so(+0xe58d4)[0x7f1ef50d18d4]
/home/fco/redislabs/RSCoordinator/build/module-oss.so(+0xe5c6a)[0x7f1ef50d1c6a]
redis-server *:6379 [cluster](rdbLoadRio+0x747)[0x55a728791f27]
redis-server *:6379 [cluster](rdbLoad+0x64)[0x55a7287925a4]
redis-server *:6379 [cluster](debugCommand+0x481)[0x55a7287b4091]
redis-server *:6379 [cluster](call+0x9c)[0x55a72875f84c]
redis-server *:6379 [cluster](processCommand+0x66e)[0x55a7287614de]
redis-server *:6379 [cluster](processCommandAndResetClient+0x1c)[0x55a728774e4c]
redis-server *:6379 [cluster](processInputBuffer+0xea)[0x55a7287776ca]
redis-server *:6379 [cluster](+0xfadea)[0x55a72880cdea]
redis-server *:6379 [cluster](aeProcessEvents+0x1b7)[0x55a728757fe7]
redis-server *:6379 [cluster](aeMain+0x1d)[0x55a7287583dd]
redis-server *:6379 [cluster](main+0x30a)[0x55a72875484a]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xe7)[0x7f1ef609abf7]
redis-server *:6379 [cluster](_start+0x2a)[0x55a728754d6a]

------ INFO OUTPUT ------
# Server
redis_version:6.2.5
redis_git_sha1:db09f6eb
redis_git_dirty:0
redis_build_id:ee50a160e9e9283
redis_mode:cluster
os:Linux 5.4.0-77-generic x86_64
arch_bits:64
multiplexing_api:epoll
atomicvar_api:c11-builtin
gcc_version:7.5.0
process_id:29911
process_supervised:no
run_id:def6b24d728ff6731e26099252490df6def610ec
tcp_port:6379
server_time_usec:1628086502046817
uptime_in_seconds:14
uptime_in_days:0
hz:10
configured_hz:10
lru_clock:696549
executable:/home/fco/redislabs/RSCoordinator/tests/benchmarks/redis-server
config_file:
io_threads_active:0

# Clients
connected_clients:14
cluster_connections:4
maxclients:4064
client_recent_max_input_buffer:32
client_recent_max_output_buffer:0
blocked_clients:0
tracking_clients:0
clients_in_timeout_table:0

# Memory
used_memory:1652104
used_memory_human:1.58M
used_memory_rss:12820480
used_memory_rss_human:12.23M
used_memory_peak:1652104
used_memory_peak_human:1.58M
used_memory_peak_perc:101.11%
used_memory_overhead:1536216
used_memory_startup:1269856
used_memory_dataset:115888
used_memory_dataset_perc:30.32%
allocator_allocated:1973720
allocator_active:2502656
allocator_resident:5517312
total_system_memory:41721966592
total_system_memory_human:38.86G
used_memory_lua:37888
used_memory_lua_human:37.00K
used_memory_scripts:0
used_memory_scripts_human:0B
number_of_cached_scripts:0
maxmemory:0
maxmemory_human:0B
maxmemory_policy:noeviction
allocator_frag_ratio:1.27
allocator_frag_bytes:528936
allocator_rss_ratio:2.20
allocator_rss_bytes:3014656
rss_overhead_ratio:2.32
rss_overhead_bytes:7303168
mem_fragmentation_ratio:8.16
mem_fragmentation_bytes:11249504
mem_not_counted_for_evict:0
mem_replication_backlog:0
mem_clients_slaves:0
mem_clients_normal:266360
mem_aof_buffer:0
mem_allocator:jemalloc-5.1.0
active_defrag_running:0
lazyfree_pending_objects:0
lazyfreed_objects:0

# Persistence
loading:1
current_cow_size:0
current_cow_size_age:0
current_fork_perc:0.00
current_save_keys_processed:0
current_save_keys_total:0
rdb_changes_since_last_save:0
rdb_bgsave_in_progress:0
rdb_last_save_time:1628086488
rdb_last_bgsave_status:ok
rdb_last_bgsave_time_sec:-1
rdb_current_bgsave_time_sec:-1
rdb_last_cow_size:0
aof_enabled:0
aof_rewrite_in_progress:0
aof_rewrite_scheduled:0
aof_last_rewrite_time_sec:-1
aof_current_rewrite_time_sec:-1
aof_last_bgrewrite_status:ok
aof_last_write_status:ok
aof_last_cow_size:0
module_fork_in_progress:0
module_fork_last_cow_size:0
loading_start_time:1628086502
loading_total_bytes:244957047
loading_rdb_used_mem:1723585672
loading_loaded_bytes:0
loading_loaded_perc:0.00
loading_eta_seconds:1

# Stats
total_connections_received:15
total_commands_processed:5511
instantaneous_ops_per_sec:4
total_net_input_bytes:223115
total_net_output_bytes:28849
instantaneous_input_kbps:0.02
instantaneous_output_kbps:0.00
rejected_connections:0
sync_full:0
sync_partial_ok:0
sync_partial_err:0
expired_keys:0
expired_stale_perc:0.00
expired_time_cap_reached_count:0
expire_cycle_cpu_milliseconds:0
evicted_keys:0
keyspace_hits:0
keyspace_misses:0
pubsub_channels:0
pubsub_patterns:0
latest_fork_usec:0
total_forks:0
migrate_cached_sockets:0
slave_expires_tracked_keys:0
active_defrag_hits:0
active_defrag_misses:0
active_defrag_key_hits:0
active_defrag_key_misses:0
tracking_total_keys:0
tracking_total_items:0
tracking_total_prefixes:0
unexpected_error_replies:0
total_error_replies:0
dump_payload_sanitizations:0
total_reads_processed:5471
total_writes_processed:5469
io_threaded_reads_processed:0
io_threaded_writes_processed:0

# Replication
role:master
connected_slaves:0
master_failover_state:no-failover
master_replid:c7fde38bbf97b5611e6ff6ecd08760481df1f7ce
master_replid2:0000000000000000000000000000000000000000
master_repl_offset:0
second_repl_offset:-1
repl_backlog_active:0
repl_backlog_size:1048576
repl_backlog_first_byte_offset:0
repl_backlog_histlen:0

# CPU
used_cpu_sys:0.126399
used_cpu_user:0.312658
used_cpu_sys_children:0.000000
used_cpu_user_children:0.000000
used_cpu_sys_main_thread:0.117505
used_cpu_user_main_thread:0.314611

# Modules
module:name=search,ver=999999,api=1,filters=0,usedby=[],using=[],options=[]

# Commandstats
cmdstat_cluster:calls=5494,usec=68635,usec_per_call=12.49,rejected_calls=0,failed_calls=0
cmdstat_search.CLUSTERREFRESH:calls=13,usec=2149,usec_per_call=165.31,rejected_calls=0,failed_calls=0
cmdstat_info:calls=3,usec=28,usec_per_call=9.33,rejected_calls=0,failed_calls=0
cmdstat_ping:calls=1,usec=2,usec_per_call=2.00,rejected_calls=0,failed_calls=0

# Errorstats

# Cluster
cluster_enabled:1

# Keyspace

------ CLIENT LIST OUTPUT ------
id=8 addr=127.0.0.1:59856 laddr=127.0.0.1:6379 fd=19 name= age=13 idle=4 flags=N db=0 sub=0 psub=0 multi=-1 qbuf=0 qbuf-free=0 argv-mem=0 obl=0 oll=0 omem=0 tot-mem=20504 events=r cmd=cluster user=default redir=-1
id=30 addr=127.0.0.1:59892 laddr=127.0.0.1:6379 fd=25 name= age=7 idle=7 flags=N db=0 sub=0 psub=0 multi=-1 qbuf=0 qbuf-free=4 argv-mem=0 obl=0 oll=0 omem=0 tot-mem=20488 events=r cmd=NULL user=default redir=-1
id=31 addr=127.0.0.1:59894 laddr=127.0.0.1:6379 fd=27 name= age=7 idle=7 flags=N db=0 sub=0 psub=0 multi=-1 qbuf=0 qbuf-free=4 argv-mem=0 obl=0 oll=0 omem=0 tot-mem=20488 events=r cmd=NULL user=default redir=-1
id=32 addr=127.0.0.1:59896 laddr=127.0.0.1:6379 fd=29 name= age=7 idle=7 flags=N db=0 sub=0 psub=0 multi=-1 qbuf=0 qbuf-free=4 argv-mem=0 obl=0 oll=0 omem=0 tot-mem=20488 events=r cmd=NULL user=default redir=-1
id=41 addr=127.0.0.1:59948 laddr=127.0.0.1:6379 fd=44 name= age=6 idle=6 flags=N db=0 sub=0 psub=0 multi=-1 qbuf=0 qbuf-free=4 argv-mem=0 obl=0 oll=0 omem=0 tot-mem=20488 events=r cmd=NULL user=default redir=-1
id=42 addr=127.0.0.1:59950 laddr=127.0.0.1:6379 fd=45 name= age=6 idle=6 flags=N db=0 sub=0 psub=0 multi=-1 qbuf=0 qbuf-free=4 argv-mem=0 obl=0 oll=0 omem=0 tot-mem=20488 events=r cmd=NULL user=default redir=-1
id=43 addr=127.0.0.1:59952 laddr=127.0.0.1:6379 fd=46 name= age=6 idle=6 flags=N db=0 sub=0 psub=0 multi=-1 qbuf=0 qbuf-free=4 argv-mem=0 obl=0 oll=0 omem=0 tot-mem=20488 events=r cmd=NULL user=default redir=-1
id=44 addr=127.0.0.1:59954 laddr=127.0.0.1:6379 fd=47 name= age=6 idle=6 flags=N db=0 sub=0 psub=0 multi=-1 qbuf=0 qbuf-free=4 argv-mem=0 obl=0 oll=0 omem=0 tot-mem=20488 events=r cmd=NULL user=default redir=-1
id=33 addr=127.0.0.1:59898 laddr=127.0.0.1:6379 fd=31 name= age=7 idle=7 flags=N db=0 sub=0 psub=0 multi=-1 qbuf=0 qbuf-free=4 argv-mem=0 obl=0 oll=0 omem=0 tot-mem=20488 events=r cmd=NULL user=default redir=-1
id=34 addr=127.0.0.1:59908 laddr=127.0.0.1:6379 fd=36 name= age=7 idle=7 flags=N db=0 sub=0 psub=0 multi=-1 qbuf=0 qbuf-free=4 argv-mem=0 obl=0 oll=0 omem=0 tot-mem=20488 events=r cmd=NULL user=default redir=-1
id=35 addr=127.0.0.1:59910 laddr=127.0.0.1:6379 fd=37 name= age=7 idle=7 flags=N db=0 sub=0 psub=0 multi=-1 qbuf=0 qbuf-free=4 argv-mem=0 obl=0 oll=0 omem=0 tot-mem=20488 events=r cmd=NULL user=default redir=-1
id=36 addr=127.0.0.1:59912 laddr=127.0.0.1:6379 fd=38 name= age=7 idle=7 flags=N db=0 sub=0 psub=0 multi=-1 qbuf=0 qbuf-free=4 argv-mem=0 obl=0 oll=0 omem=0 tot-mem=20488 events=r cmd=NULL user=default redir=-1
id=37 addr=127.0.0.1:59914 laddr=127.0.0.1:6379 fd=39 name= age=7 idle=7 flags=N db=0 sub=0 psub=0 multi=-1 qbuf=0 qbuf-free=4 argv-mem=0 obl=0 oll=0 omem=0 tot-mem=20488 events=r cmd=NULL user=default redir=-1
id=61 addr=127.0.0.1:59970 laddr=127.0.0.1:6379 fd=48 name= age=1 idle=1 flags=N db=0 sub=0 psub=0 multi=-1 qbuf=39 qbuf-free=40915 argv-mem=17 obl=0 oll=0 omem=0 tot-mem=61481 events= cmd=debug user=default redir=-1

------ CURRENT CLIENT INFO ------
id=61 addr=127.0.0.1:59970 laddr=127.0.0.1:6379 fd=48 name= age=1 idle=1 flags=N db=0 sub=0 psub=0 multi=-1 qbuf=39 qbuf-free=40915 argv-mem=17 obl=0 oll=0 omem=0 tot-mem=61481 events= cmd=debug user=default redir=-1
argv[0]: 'DEBUG'
argv[1]: 'RELOAD'
argv[2]: 'NOSAVE'

------ MODULES INFO OUTPUT ------

------ FAST MEMORY TEST ------
29911:M 04 Aug 2021 15:15:02.047 # Bio thread for job type #0 terminated
29911:M 04 Aug 2021 15:15:02.048 # Bio thread for job type #1 terminated
29911:M 04 Aug 2021 15:15:02.048 # Bio thread for job type #2 terminated
*** Preparing to test memory region 55a728b22000 (2281472 bytes)
*** Preparing to test memory region 55a729fa8000 (446464 bytes)
*** Preparing to test memory region 7f1ed8000000 (135168 bytes)
*** Preparing to test memory region 7f1edc000000 (135168 bytes)
*** Preparing to test memory region 7f1ee0000000 (135168 bytes)
*** Preparing to test memory region 7f1ee557f000 (2621440 bytes)
*** Preparing to test memory region 7f1ee5800000 (8388608 bytes)
*** Preparing to test memory region 7f1ee6000000 (4194304 bytes)
*** Preparing to test memory region 7f1ee6600000 (8388608 bytes)
*** Preparing to test memory region 7f1ee6e00000 (4194304 bytes)
*** Preparing to test memory region 7f1ee7231000 (8388608 bytes)
*** Preparing to test memory region 7f1ee7a32000 (8388608 bytes)
*** Preparing to test memory region 7f1ee8233000 (8388608 bytes)
*** Preparing to test memory region 7f1ee8a34000 (8388608 bytes)
*** Preparing to test memory region 7f1ee9235000 (8388608 bytes)
*** Preparing to test memory region 7f1ee9a36000 (8388608 bytes)
*** Preparing to test memory region 7f1eea237000 (8388608 bytes)
*** Preparing to test memory region 7f1eeaa38000 (8388608 bytes)
*** Preparing to test memory region 7f1eeb239000 (8388608 bytes)
*** Preparing to test memory region 7f1eeba3a000 (8388608 bytes)
*** Preparing to test memory region 7f1eec23b000 (8388608 bytes)
*** Preparing to test memory region 7f1eeca3c000 (8388608 bytes)
*** Preparing to test memory region 7f1eed23d000 (8388608 bytes)
*** Preparing to test memory region 7f1eeda3e000 (8388608 bytes)
*** Preparing to test memory region 7f1eee23f000 (8388608 bytes)
*** Preparing to test memory region 7f1eeea40000 (8388608 bytes)
*** Preparing to test memory region 7f1eef241000 (8388608 bytes)
*** Preparing to test memory region 7f1eefa42000 (8388608 bytes)
*** Preparing to test memory region 7f1ef0243000 (8388608 bytes)
*** Preparing to test memory region 7f1ef0a44000 (8388608 bytes)
*** Preparing to test memory region 7f1ef1245000 (8388608 bytes)
*** Preparing to test memory region 7f1ef1a46000 (8388608 bytes)
*** Preparing to test memory region 7f1ef2247000 (8388608 bytes)
*** Preparing to test memory region 7f1ef2a48000 (8388608 bytes)
*** Preparing to test memory region 7f1ef3249000 (8388608 bytes)
*** Preparing to test memory region 7f1ef3a4a000 (8388608 bytes)
*** Preparing to test memory region 7f1ef424b000 (8388608 bytes)
*** Preparing to test memory region 7f1ef4fe8000 (16384 bytes)
*** Preparing to test memory region 7f1ef551e000 (12288 bytes)
*** Preparing to test memory region 7f1ef5800000 (8388608 bytes)
*** Preparing to test memory region 7f1ef6466000 (16384 bytes)
*** Preparing to test memory region 7f1ef6685000 (16384 bytes)
*** Preparing to test memory region 7f1ef6b51000 (12288 bytes)
*** Preparing to test memory region 7f1ef777e000 (32768 bytes)
*** Preparing to test memory region 7f1ef77b6000 (4096 bytes)
.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O
Fast memory test PASSED, however your memory can still be broken. Please run a memory test for several hours if possible.

=== REDIS BUG REPORT END. Make sure to include from START to END. ===

       Please report the crash by opening an issue on github:

           http://github.com/redis/redis/issues

  Suspect RAM error? Use redis-server --test-memory to verify it.

benchmark config that caused this:

name: "ftsb-1M-enwiki_abstract-hashes-fulltext-simple-1word-query"
description: "
             - name: enwiki-abstract [details here](https://github.com/RediSearch/ftsb/blob/master/docs/enwiki-abstract-benchmark/description.md), 
                     from English-language Wikipedia:Database page abstracts. 
                     This use case generates 3 TEXT fields per document, and focusses on full text queries performance.
                     Specifically for this testcase:
                      - Type (read/write/mixed): read
                      - Query type: Simple 1 Word Query
                      - Query sample: Abraham
             - total docs: 5.9 million
             - fields per doc: 3 TEXT sortable fields
             - average doc size: 227 bytes
             "
remote:
 - type: oss-standalone
 - setup: redisearch-m5
setups:
  - oss-cluster-3-primaries
dbconfig:
  - dataset: "https://s3.amazonaws.com/benchmarks.redislabs/redisearch/datasets/1M-enwiki_abstract-hashes/1M-enwiki_abstract-hashes.redisearch.rdb"
clientconfig:
  - tool: ftsb_redisearch
  - parameters:
    - workers: 64
    - requests: 50000
    - reporting-period: 1s
    - input: "https://s3.amazonaws.com/benchmarks.redislabs/redisearch/datasets/1M-enwiki_abstract-hashes/1M-enwiki_abstract-hashes.redisearch.commands.BENCH.QUERY_simple-1word-query.csv"
exporter:
  redistimeseries:
    break_by:
      - version
      - commit
    timemetric: "$.StartTime"
    metrics:
      - "$.Tests.OverallRates.overallOpsRate"
      - "$.Tests.OverallQuantiles.allCommands.q50"
      - "$.Tests.OverallQuantiles.allCommands.q95"
      - "$.Tests.OverallQuantiles.allCommands.q99"
      - "$.Tests.OverallQuantiles.allCommands.q999"
ashtul commented 2 years ago

@filipecosta90 Can you please share the .so file or the exact commit/release from which you compiled it?

filipecosta90 commented 2 years ago

@filipecosta90 Can you please share the .so file or the exact commit/release from which you compiled it?

sure :) a96e3212370597c6ab0a2fac1bb0b55c93bced5c

ashtul commented 2 years ago

@oshadmi

29911:M 04 Aug 2021 15:15:02.046 # Guru Meditation: Error loading data from RDB (short read or EOF). Read performed by module 'search' about type 'ft_index0' after reading '0' bytes of a value for key named: '(null)'. #module.c:4614

Can this be related to RediSearch/RediSearch#2058?

oshadmi commented 2 years ago

@oshadmi

29911:M 04 Aug 2021 15:15:02.046 # Guru Meditation: Error loading data from RDB (short read or EOF). Read performed by module 'search' about type 'ft_index0' after reading '0' bytes of a value for key named: '(null)'. #module.c:4614

Can this be related to RediSearch/RediSearch#2058?

@ashtul Could be, I'll look.

oshadmi commented 2 years ago

@ashtul @filipecosta90 I did not merge yet RediSearch/RediSearch#2058 So I assume it is not related to it. Panicking (serverPanic) on short-read is what should not happen after we merge PR 2058. See moduleRDBLoadError in redis.