RediSearch / RediSearch

A query and indexing engine for Redis, providing secondary indexing, full-text search, vector similarity search and aggregations.
https://redis.io/docs/stack/search/

Redis High CPU Usage (100%) After Long-Running Application (More Than One to Two Weeks); CPU Drops When Index Is Dropped and Recreated #4929

Open Datta19 opened 1 month ago

Datta19 commented 1 month ago

We are experiencing high CPU usage (100%) in Redis for long-running applications (more than one to two weeks). The CPU usage drops when we drop and recreate the index.

Environment:

Issue: In our production environment, across 3-4 customer sites, we are facing an issue where Redis CPU usage spikes to 100%. This does not happen immediately but only after the application has been up and running for over one to two weeks.

When the issue occurred, we ran some experiments and made the following observations:

  1. There were only around 1,000 keys and 50 active clients in Redis (verified using RedisInsight).
  2. We stopped all applications connected to Redis, but the CPU usage did not decrease, even after monitoring for 10 minutes.
  3. The "FT.SEARCH idx:ServerData" command appeared in Grafana's slow log, despite there being no "ServerData" in Redis, as all connected applications had been stopped.
  4. We dropped the index "idx:ServerData" and observed the CPU usage for 10 minutes. It dropped to a stable range of 3% to 5%.
  5. We recreated the index "idx:ServerData", restarted the applications connected to Redis, and monitored Redis for 10 minutes. The CPU usage remained stable at 3% to 5%, even when the applications started loading data into Redis.
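
The drop-and-recreate workaround in steps 4 and 5 can be sketched as two commands. This is only an illustration: the schema below is copied from the FT.INFO output posted later in this thread, the helper names are hypothetical, and the original FT.CREATE statement used in production is not shown anywhere in the issue, so options such as WEIGHT are left at their defaults.

```python
# Schema as reported by FT.INFO for idx:ServerData later in this thread:
# (JSON path, attribute name, field type).
SCHEMA = [
    ("$.projectKey", "projectKey", "TEXT"),
    ("$.status", "status", "TEXT"),
    ("$.serverName", "serverName", "TEXT"),
    ("$.serverId", "serverId", "NUMERIC"),
    ("$.processId", "processId", "NUMERIC"),
    ("$.restartTimes", "restartTimes", "NUMERIC"),
    ("$.licenseFeature", "licenseFeature", "TEXT"),
    ("$.previousServersId[0:]", "previousServersId", "NUMERIC"),
]


def drop_index_args(index):
    """Arguments for FT.DROPINDEX without DD, which keeps the JSON documents."""
    return ["FT.DROPINDEX", index]


def create_index_args(index, prefix, schema):
    """Arguments for FT.CREATE on JSON documents under a single key prefix."""
    args = ["FT.CREATE", index, "ON", "JSON", "PREFIX", "1", prefix, "SCHEMA"]
    for path, name, field_type in schema:
        args += [path, "AS", name, field_type]
    return args


if __name__ == "__main__":
    # Print the commands; with redis-py they could be sent via
    # r.execute_command(*args) on an existing connection (assumption).
    print(" ".join(drop_index_args("idx:ServerData")))
    print(" ".join(create_index_args("idx:ServerData", "ServerData", SCHEMA)))
```

Building the argument lists separately from sending them makes it easy to review the exact commands before running them against a production instance.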

Can anyone help identify if there is a configuration issue with Redis that we might be missing? Any suggestions or insights would be greatly appreciated.

Attachments: Grafana, INFO-OUTPUT.txt, RedisCpuIssue, Redis-insight

raz-mon commented 3 weeks ago

Hi @Datta19, thanks for reaching out!

Interesting issue, let's try to understand what happened.

Can you please share:

  1. The output of FT.INFO idx:ServerData.
  2. The queries related to RediSearch you see in the slowlog, in full. Note that slowlog entries remain there even after you disconnect the clients, which explains observation 3. What are the typical RediSearch commands used?
  3. Does the application perform many deletions and writes? Approximately how many?
  4. Since this is quite a strange situation, please share any further context/data so we can better understand your use case. For instance, if you can share a masked rdb together with some of the queries you used before experiencing this issue, we can take a look at that as well.
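
To illustrate point 2: the slowlog is a history of past commands, so an FT.SEARCH entry can still be listed long after the client that sent it has gone. A minimal sketch for pulling the RediSearch entries out of a raw SLOWLOG GET reply and checking when they were logged follows. It assumes the standard reply layout (id, Unix timestamp, duration in microseconds, argument array, then client fields); the function names and the sample entries are made up for illustration.

```python
from datetime import datetime, timezone


def search_entries(slowlog_reply):
    """Return (utc_time, duration_ms, command) for each FT.* slowlog entry."""
    rows = []
    for entry in slowlog_reply:
        _entry_id, timestamp, duration_us, argv = entry[:4]
        # Arguments may arrive as bytes or str depending on the client.
        args = [a.decode() if isinstance(a, bytes) else str(a) for a in argv]
        if args and args[0].upper().startswith("FT."):
            when = datetime.fromtimestamp(timestamp, tz=timezone.utc)
            rows.append((when, duration_us / 1000.0, " ".join(args)))
    return rows


def logged_before(rows, cutoff):
    """Keep only the entries recorded before the given UTC datetime."""
    return [row for row in rows if row[0] < cutoff]


if __name__ == "__main__":
    # Made-up entries in the raw SLOWLOG GET layout, for illustration only.
    sample = [
        [14, 1700000000, 25000, [b"FT.SEARCH", b"idx:ServerData", b"<query>"], "", ""],
        [13, 1699999000, 12000, [b"GET", b"some-key"], "", ""],
    ]
    clients_stopped_at = datetime.fromtimestamp(1700000600, tz=timezone.utc)
    for when, ms, command in logged_before(search_entries(sample), clients_stopped_at):
        print(when.isoformat(), ms, command)
```

If every FT.SEARCH entry predates the moment the applications were stopped, the slowlog is showing history rather than live traffic.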

Datta19 commented 3 weeks ago

Hi @raz-mon, thanks for the reply. Please see the information below. I will provide the information for items 3 and 4 after some code analysis on my side.

The output of FT.INFO idx:ServerData.

ft.info idx:ServerData
 1) "index_name" 2) "idx:ServerData"
 3) "index_options" 4) (empty list or set)
 5) "index_definition"
 6) 1) "key_type" 2) "JSON"
    3) "prefixes" 4) 1) "ServerData"
    5) "default_score" 6) "1"
 7) "attributes"
 8) 1) 1) "identifier" 2) "$.projectKey" 3) "attribute" 4) "projectKey" 5) "type" 6) "TEXT" 7) "WEIGHT" 8) "1"
    2) 1) "identifier" 2) "$.status" 3) "attribute" 4) "status" 5) "type" 6) "TEXT" 7) "WEIGHT" 8) "1"
    3) 1) "identifier" 2) "$.serverName" 3) "attribute" 4) "serverName" 5) "type" 6) "TEXT" 7) "WEIGHT" 8) "1"
    4) 1) "identifier" 2) "$.serverId" 3) "attribute" 4) "serverId" 5) "type" 6) "NUMERIC"
    5) 1) "identifier" 2) "$.processId" 3) "attribute" 4) "processId" 5) "type" 6) "NUMERIC"
    6) 1) "identifier" 2) "$.restartTimes" 3) "attribute" 4) "restartTimes" 5) "type" 6) "NUMERIC"
    7) 1) "identifier" 2) "$.licenseFeature" 3) "attribute" 4) "licenseFeature" 5) "type" 6) "TEXT" 7) "WEIGHT" 8) "1"
    8) 1) "identifier" 2) "$.previousServersId[0:]" 3) "attribute" 4) "previousServersId" 5) "type" 6) "NUMERIC"
 9) "num_docs" 10) "0"
11) "max_doc_id" 12) "608"
13) "num_terms" 14) "0"
15) "num_records" 16) "0"
17) "inverted_sz_mb" 18) "0"
19) "vector_index_sz_mb" 20) "0"
21) "total_inverted_index_blocks" 22) "118"
23) "offset_vectors_sz_mb" 24) "0.00286865234375"
25) "doc_table_size_mb" 26) "0"
27) "sortable_values_size_mb" 28) "0"
29) "key_table_size_mb" 30) "0"
31) "records_per_doc_avg" 32) "-nan"
33) "bytes_per_record_avg" 34) "-nan"
35) "offsets_per_term_avg" 36) "inf"
37) "offset_bits_per_record_avg" 38) "8"
39) "hash_indexing_failures" 40) "0"
41) "total_indexing_time" 42) "30.905000000000001"
43) "indexing" 44) "0"
45) "percent_indexed" 46) "1"
47) "number_of_uses" 48) "45627"
49) "gc_stats"
50) 1) "bytes_collected" 2) "22874"
    3) "total_ms_run" 4) "11"
    5) "total_cycles" 6) "2"
    7) "average_cycle_time_ms" 8) "5.5"
    9) "last_run_time_ms" 10) "3"
    11) "gc_numeric_trees_missed" 12) "0"
    13) "gc_blocks_denied" 14) "0"
51) "cursor_stats"
52) 1) "global_idle" 2) "0"
    3) "global_total" 4) "0"
    5) "index_capacity" 6) "128"
    7) "index_total" 8) "0"
53) "dialect_stats"
54) 1) "dialect_1" 2) "1"
    3) "dialect_2" 4) "0"
    5) "dialect_3" 6) "0"

The queries related to RediSearch you see in the slowlog, in full. Note that they remain there even when you disconnect the clients, that explains 3. What are the typical RediSearch commands used?

jsonGet(key, clazz)
jsonGet(key, paths)
ftSearch(indexName, query)
jsonSet(key, Path.ROOT_PATH, pojo)
jsonDel(key)
publish(channel, message)
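
For reference, a few fields in the FT.INFO output above can be read together: num_docs is 0 while max_doc_id is 608 and total_inverted_index_blocks is 118, and gc_stats reports only 2 cycles. The sketch below pulls those fields out of a flat FT.INFO reply so they can be tracked over time. The helper names are hypothetical, the reply is assumed to be already decoded to strings, and whether this state is connected to the CPU usage is not established in this thread.

```python
def pairs_to_dict(flat):
    """Turn a flat [key, value, key, value, ...] reply into a dict."""
    return dict(zip(flat[0::2], flat[1::2]))


def summarize(info_reply):
    """Extract document, block, and GC counters from a flat FT.INFO reply."""
    info = pairs_to_dict(info_reply)
    gc = pairs_to_dict(info["gc_stats"])
    return {
        "num_docs": int(info["num_docs"]),
        "max_doc_id": int(info["max_doc_id"]),
        "inverted_index_blocks": int(info["total_inverted_index_blocks"]),
        "gc_cycles": int(gc["total_cycles"]),
        "gc_bytes_collected": int(gc["bytes_collected"]),
    }


def empty_but_blocks_remain(summary):
    """True when no documents are indexed but index blocks are still counted."""
    return summary["num_docs"] == 0 and summary["inverted_index_blocks"] > 0


if __name__ == "__main__":
    # Subset of the values posted above, in the same flat key/value layout.
    reply = [
        "num_docs", "0",
        "max_doc_id", "608",
        "total_inverted_index_blocks", "118",
        "gc_stats", ["bytes_collected", "22874", "total_ms_run", "11", "total_cycles", "2"],
    ]
    summary = summarize(reply)
    print(summary, empty_but_blocks_remain(summary))
```

Sampling this summary periodically on an affected site, for example once an hour, would show whether the block count keeps growing between restarts.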

Datta19 commented 3 weeks ago

Hi @raz-mon, do you have any comments or suggestions on the information provided?

raz-mon commented 3 weeks ago

Hi @Datta19, it is hard to say from the data at hand. If you have more details to provide, we may be able to detect something. What is the query you're dispatching in ftSearch(indexName, query)? I assume the * is for testing?

Also note that you're using a relatively old version. Consider upgrading, as many fixes and enhancements have been introduced since.