apache / couchdb

Seamless multi-master syncing database with an intuitive HTTP/JSON API, designed for reliability
https://couchdb.apache.org/
Apache License 2.0

CouchDB Error #5129

Open job-isabai opened 1 month ago

job-isabai commented 1 month ago

Hello, Am running CouchDB on docker container. CouchDB crashes after encountering the error below: `[error] 2024-07-11T07:47:49.418872Z couchdb@127.0.0.1 <0.13560.3> -------- rexi_server: from: couchdb@127.0.0.1(<0.13559.3>) mfa: fabric_rpc:all_docs/3 error:badarg [{erlang,binary_to_term,[<<131,0,104,2,100,0,7,107,112,95,110,111,100,108,0,0,0,3,104,2,109,0,0,0,58,99,114,101,97,116,101,100,58,109,101,100,105,99,45,112,117,114,103,101,100,45,114,111,108,101,45,100,99,54,97,101,102,50,102,53,98,98,97,100,49,55,97,53,49,100,102,51,99,98,102,53,101,101,97,49,48,53,97,104,3,98,72,173,49,215,104,3,97,2,97,0,104,3,100,0,9,115,105,122,101,95,105,110,102,111,97,90,97,6,98,0,0,1,77,104,2,109,0,0,0,34,117,112,100,97,116,101,100,58,109,101,100,105,99,45,117,115,101,114,45,97,109,97,100,111,117,45,107,111,110,45,109,101,116,97,104,3,98,72,194,14,52,104,3,97,48,97,0,104,3,100,0,9,115,105,122,101,95,105,110,102,111,98,0,0,29,91,97,188,98,0,0,147,170,104,2,109,0,0,0,38,117,112,100,97,116,101,100,58,109,101,100,105,99,45,117,115,101,114,45,121,111,117,99,101,102,95,100,97,104,109,97,110,101,45,109,101,116,97,104,3,98,72,191,21,174,104,3,97,23,97,0,104,3,100,0,9,115,105,122,101,95,105,110,102,111,98,0,0,27,63,97,90,98,0,0,93,0,106>>],[{error_info,#{module => erl_erts_errors}}]},{couch_compress,decompress,1,[{file,"src/couch_compress.erl"},{line,65}]},{couch_file,pread_term,2,[{file,"src/couch_file.erl"},{line,156}]},{couch_btree,get_node,2,[{file,"src/couch_btree.erl"},{line,474}]},{couch_btree,stream_node,8,[{file,"src/couch_btree.erl"},{line,1069}]},{couch_btree,fold,4,[{file,"src/couch_btree.erl"},{line,242}]},{couch_bt_engine,fold_docs_int,5,[{file,"src/couch_bt_engine.erl"},{line,1129}]},{couch_mrview,get_total_rows,2,[{file,"src/couch_mrview.erl"},{line,704}]}] [error] 2024-07-11T07:47:49.421107Z couchdb@127.0.0.1 <0.13556.3> -------- could not load validation funs 
{{badmatch,{error,{badarg,nil,[{erlang,binary_to_term,[<<131,0,104,2,100,0,7,107,112,95,110,111,100,108,0,0,0,3,104,2,109,0,0,0,58,99,114,101,97,116,101,100,58,109,101,100,105,99,45,112,117,114,103,101,100,45,114,111,108,101,45,100,99,54,97,101,102,50,102,53,98,98,97,100,49,55,97,53,49,100,102,51,99,98,102,53,101,101,97,49,48,53,97,104,3,98,72,173,49,215,104,3,97,2,97,0,104,3,100,0,9,115,105,122,101,95,105,110,102,111,97,90,97,6,98,0,0,1,77,104,2,109,0,0,0,34,117,112,100,97,116,101,100,58,109,101,100,105,99,45,117,115,101,114,45,97,109,97,100,111,117,45,107,111,110,45,109,101,116,97,104,3,98,72,194,14,52,104,3,97,48,97,0,104,3,100,0,9,115,105,122,101,95,105,110,102,111,98,0,0,29,91,97,188,98,0,0,147,170,104,2,109,0,0,0,38,117,112,100,97,116,101,100,58,109,101,100,105,99,45,117,115,101,114,45,121,111,117,99,101,102,95,100,97,104,109,97,110,101,45,109,101,116,97,104,3,98,72,191,21,174,104,3,97,23,97,0,104,3,100,0,9,115,105,122,101,95,105,110,102,111,98,0,0,27,63,97,90,98,0,0,93,0,106>>],[{error_info,#{module => erl_erts_errors}}]},{couch_compress,decompress,1,[{file,"src/couch_compress.erl"},{line,65}]},{couch_file,pread_term,2,[{file,"src/couch_file.erl"},{line,156}]},{couch_btree,get_node,2,[{file,"src/couch_btree.erl"},{line,474}]},{couch_btree,stream_node,8,[{file,"src/couch_btree.erl"},{line,1069}]},{couch_btree,fold,4,[{file,"src/couch_btree.erl"},{line,242}]},{couch_bt_engine,fold_docs_int,5,[{file,"src/couch_bt_engine.erl"},{line,1129}]},{couch_mrview,get_total_rows,2,[{file,"src/couch_mrview.erl"},{line,704}]}]}}},[{ddoc_cache_entry_validation_funs,recover,1,[{file,"src/ddoc_cache_entry_validation_funs.erl"},{line,29}]},{ddoc_cache_entry,do_open,1,[{file,"src/ddoc_cache_entry.erl"},{line,275}]}]} [error] 2024-07-11T07:47:49.421587Z couchdb@127.0.0.1 emulator -------- Error in process <0.13557.3> on node 'couchdb@127.0.0.1' with exit value: 
{{badmatch,{error,{badarg,nil,[{erlang,binary_to_term,[<<131,0,104,2,100,0,7,107,112,95,110,111,100,108,0,0,0,3,104,2,109,0,0,0,58,99,114,101,97,116,101,100,58,109,101,100,105,99,45,112,117,114,103,101,100,45,114,111,108,101,45,100,99,54,97,101,102,50,102,53,98,98,97,100,49,55,97,53,49,100,102,51,99,98,102,53,101,101,97,49,48,53,97,104,3,98,72,173,49,215,104,3,97,2,97,0,104,3,100,0,9,115,105,122,101,95,105,110,102,111,97,90,97,6,98,0,0,1,77,104,2,109,0,0,0,34,117,112,100,97,116,101,100,58,109,101,100,105,99,45,117,115,101,114,45,97,109,97,100,111,117,45,107,111,110,45,109,101,116,97,104,3,98,72,194,14,52,104,3,97,48,97,0,104,3,100,0,9,115,105,122,101,95,105,110,102,111,98,0,0,29,91,97,188,98,0,0,147,170,104,2,109,0,0,0,38,117,112,100,97,116,101,100,58,109,101,100,105,99,45,117,115,101,114,45,121,111,117,99,101,102,95,100,97,104,109,97,110,101,45,109,101,116,97,104,3,98,72,191,21,174,104,3,97,23,97,0,104,3,100,0,9,115,105,122,101,95,105,110,102,111,98,0,0,27,63,97,90,98,0,0,93,0,106>>],[{error_info,#{module => erl_erts_errors}}]},{couch_compress,decompress,1,[{file,"src/couch_compress.erl"},{line,65}]},{couch_file,pread_term,2,[{file,"src/couch_file.erl"},{line,156}]},{couch_btree,get_node,2,[{file,"src/couch_btree.erl"},{line,474}]},{couch_btree,stream_node,8,[{file,"src/couch_btree.erl"},{line,1069}]},{couch_btree,fold,4,[{file,"src/couch_btree.erl"},{line,242}]},{couch_bt_engine,fold_docs_int,5,[{file,"src/couch_bt_engine.erl"},{line,1129}]},{couch_mrview,get_total_rows,2,[{file,"src/couch_mrview.erl"},{line,704}]}]}}},[{ddoc_cache_entry_validation_funs,recover,1,[{file,"src/ddoc_cache_entry_validation_funs.erl"},{line,29}]},{ddoc_cache_entry,do_open,1,[{file,"src/ddoc_cache_entry.erl"},{line,275}]}]}

[error] 2024-07-11T07:47:49.421804Z couchdb@127.0.0.1 emulator -------- Error in process <0.13557.3> on node 'couchdb@127.0.0.1' with exit value: {{badmatch,{error,{badarg,nil,[{erlang,binary_to_term,[<<131,0,104,2,100,0,7,107,112,95,110,111,100,108,0,0,0,3,104,2,109,0,0,0,58,99,114,101,97,116,101,100,58,109,101,100,105,99,45,112,117,114,103,101,100,45,114,111,108,101,45,100,99,54,97,101,102,50,102,53,98,98,97,100,49,55,97,53,49,100,102,51,99,98,102,53,101,101,97,49,48,53,97,104,3,98,72,173,49,215,104,3,97,2,97,0,104,3,100,0,9,115,105,122,101,95,105,110,102,111,97,90,97,6,98,0,0,1,77,104,2,109,0,0,0,34,117,112,100,97,116,101,100,58,109,101,100,105,99,45,117,115,101,114,45,97,109,97,100,111,117,45,107,111,110,45,109,101,116,97,104,3,98,72,194,14,52,104,3,97,48,97,0,104,3,100,0,9,115,105,122,101,95,105,110,102,111,98,0,0,29,91,97,188,98,0,0,147,170,104,2,109,0,0,0,38,117,112,100,97,116,101,100,58,109,101,100,105,99,45,117,115,101,114,45,121,111,117,99,101,102,95,100,97,104,109,97,110,101,45,109,101,116,97,104,3,98,72,191,21,174,104,3,97,23,97,0,104,3,100,0,9,115,105,122,101,95,105,110,102,111,98,0,0,27,63,97,90,98,0,0,93,0,106>>],[{error_info,#{module => erl_erts_errors}}]},{couch_compress,decompress,1,[{file,"src/couch_compress.erl"},{line,65}]},{couch_file,pread_term,2,[{file,"src/couch_file.erl"},{line,156}]},{couch_btree,get_node,2,[{file,"src/couch_btree.erl"},{line,474}]},{couch_btree,stream_node,8,[{file,"src/couch_btree.erl"},{line,1069}]},{couch_btree,fold,4,[{file,"src/couch_btree.erl"},{line,242}]},{couch_bt_engine,fold_docs_int,5,[{file,"src/couch_bt_engine.erl"},{line,1129}]},{couch_mrview,get_total_rows,2,[{file,"src/couch_mrview.erl"},{line,704}]}]}}},[{ddoc_cache_entry_validation_funs,recover,1,[{file,"src/ddoc_cache_entry_validation_funs.erl"},{line,29}]},{ddoc_cache_entry,do_open,1,[{file,"src/ddoc_cache_entry.erl"},{line,275}]}]} [error] 2024-07-11T07:59:28.760469Z couchdb@127.0.0.1 <0.17943.0> -------- rexi_server: from: 
couchdb@127.0.0.1(<0.17939.0>) mfa: fabric_rpc:all_docs/3 error:badarg [{erlang,binary_to_term,[<<131,0,104,2,100,0,7,107,112,95,110,111,100,108,0,0,0,3,104,2,109,0,0,0,58,99,114,101,97,116,101,100,58,109,101,100,105,99,45,112,117,114,103,101,100,45,114,111,108,101,45,100,99,54,97,101,102,50,102,53,98,98,97,100,49,55,97,53,49,100,102,51,99,98,102,53,101,101,97,49,48,53,97,104,3,98,72,173,49,215,104,3,97,2,97,0,104,3,100,0,9,115,105,122,101,95,105,110,102,111,97,90,97,6,98,0,0,1,77,104,2,109,0,0,0,34,117,112,100,97,116,101,100,58,109,101,100,105,99,45,117,115,101,114,45,97,109,97,100,111,117,45,107,111,110,45,109,101,116,97,104,3,98,72,194,14,52,104,3,97,48,97,0,104,3,100,0,9,115,105,122,101,95,105,110,102,111,98,0,0,29,91,97,188,98,0,0,147,170,104,2,109,0,0,0,38,117,112,100,97,116,101,100,58,109,101,100,105,99,45,117,115,101,114,45,121,111,117,99,101,102,95,100,97,104,109,97,110,101,45,109,101,116,97,104,3,98,72,191,21,174,104,3,97,23,97,0,104,3,100,0,9,115,105,122,101,95,105,110,102,111,98,0,0,27,63,97,90,98,0,0,93,0,106>>],[{error_info,#{module => erl_erts_errors}}]},{couch_compress,decompress,1,[{file,"src/couch_compress.erl"},{line,65}]},{couch_file,pread_term,2,[{file,"src/couch_file.erl"},{line,156}]},{couch_btree,get_node,2,[{file,"src/couch_btree.erl"},{line,474}]},{couch_btree,stream_node,8,[{file,"src/couch_btree.erl"},{line,1069}]},{couch_btree,fold,4,[{file,"src/couch_btree.erl"},{line,242}]},{couch_bt_engine,fold_docs_int,5,[{file,"src/couch_bt_engine.erl"},{line,1129}]},{couch_mrview,get_total_rows,2,[{file,"src/couch_mrview.erl"},{line,704}]}] [error] 2024-07-11T07:59:28.778328Z couchdb@127.0.0.1 emulator -------- Error in process <0.17937.0> on node 'couchdb@127.0.0.1' with exit value: 
{{badmatch,{error,{badarg,nil,[{erlang,binary_to_term,[<<131,0,104,2,100,0,7,107,112,95,110,111,100,108,0,0,0,3,104,2,109,0,0,0,58,99,114,101,97,116,101,100,58,109,101,100,105,99,45,112,117,114,103,101,100,45,114,111,108,101,45,100,99,54,97,101,102,50,102,53,98,98,97,100,49,55,97,53,49,100,102,51,99,98,102,53,101,101,97,49,48,53,97,104,3,98,72,173,49,215,104,3,97,2,97,0,104,3,100,0,9,115,105,122,101,95,105,110,102,111,97,90,97,6,98,0,0,1,77,104,2,109,0,0,0,34,117,112,100,97,116,101,100,58,109,101,100,105,99,45,117,115,101,114,45,97,109,97,100,111,117,45,107,111,110,45,109,101,116,97,104,3,98,72,194,14,52,104,3,97,48,97,0,104,3,100,0,9,115,105,122,101,95,105,110,102,111,98,0,0,29,91,97,188,98,0,0,147,170,104,2,109,0,0,0,38,117,112,100,97,116,101,100,58,109,101,100,105,99,45,117,115,101,114,45,121,111,117,99,101,102,95,100,97,104,109,97,110,101,45,109,101,116,97,104,3,98,72,191,21,174,104,3,97,23,97,0,104,3,100,0,9,115,105,122,101,95,105,110,102,111,98,0,0,27,63,97,90,98,0,0,93,0,106>>],[{error_info,#{module => erl_erts_errors}}]},{couch_compress,decompress,1,[{file,"src/couch_compress.erl"},{line,65}]},{couch_file,pread_term,2,[{file,"src/couch_file.erl"},{line,156}]},{couch_btree,get_node,2,[{file,"src/couch_btree.erl"},{line,474}]},{couch_btree,stream_node,8,[{file,"src/couch_btree.erl"},{line,1069}]},{couch_btree,fold,4,[{file,"src/couch_btree.erl"},{line,242}]},{couch_bt_engine,fold_docs_int,5,[{file,"src/couch_bt_engine.erl"},{line,1129}]},{couch_mrview,get_total_rows,2,[{file,"src/couch_mrview.erl"},{line,704}]}]}}},[{ddoc_cache_entry_validation_funs,recover,1,[{file,"src/ddoc_cache_entry_validation_funs.erl"},{line,29}]},{ddoc_cache_entry,do_open,1,[{file,"src/ddoc_cache_entry.erl"},{line,275}]}]}

[error] 2024-07-11T07:59:28.780745Z couchdb@127.0.0.1 emulator -------- Error in process <0.17937.0> on node 'couchdb@127.0.0.1' with exit value: {{badmatch,{error,{badarg,nil,[{erlang,binary_to_term,[<<131,0,104,2,100,0,7,107,112,95,110,111,100,108,0,0,0,3,104,2,109,0,0,0,58,99,114,101,97,116,101,100,58,109,101,100,105,99,45,112,117,114,103,101,100,45,114,111,108,101,45,100,99,54,97,101,102,50,102,53,98,98,97,100,49,55,97,53,49,100,102,51,99,98,102,53,101,101,97,49,48,53,97,104,3,98,72,173,49,215,104,3,97,2,97,0,104,3,100,0,9,115,105,122,101,95,105,110,102,111,97,90,97,6,98,0,0,1,77,104,2,109,0,0,0,34,117,112,100,97,116,101,100,58,109,101,100,105,99,45,117,115,101,114,45,97,109,97,100,111,117,45,107,111,110,45,109,101,116,97,104,3,98,72,194,14,52,104,3,97,48,97,0,104,3,100,0,9,115,105,122,101,95,105,110,102,111,98,0,0,29,91,97,188,98,0,0,147,170,104,2,109,0,0,0,38,117,112,100,97,116,101,100,58,109,101,100,105,99,45,117,115,101,114,45,121,111,117,99,101,102,95,100,97,104,109,97,110,101,45,109,101,116,97,104,3,98,72,191,21,174,104,3,97,23,97,0,104,3,100,0,9,115,105,122,101,95,105,110,102,111,98,0,0,27,63,97,90,98,0,0,93,0,106>>],[{error_info,#{module => erl_erts_errors}}]},{couch_compress,decompress,1,[{file,"src/couch_compress.erl"},{line,65}]},{couch_file,pread_term,2,[{file,"src/couch_file.erl"},{line,156}]},{couch_btree,get_node,2,[{file,"src/couch_btree.erl"},{line,474}]},{couch_btree,stream_node,8,[{file,"src/couch_btree.erl"},{line,1069}]},{couch_btree,fold,4,[{file,"src/couch_btree.erl"},{line,242}]},{couch_bt_engine,fold_docs_int,5,[{file,"src/couch_bt_engine.erl"},{line,1129}]},{couch_mrview,get_total_rows,2,[{file,"src/couch_mrview.erl"},{line,704}]}]}}},[{ddoc_cache_entry_validation_funs,recover,1,[{file,"src/ddoc_cache_entry_validation_funs.erl"},{line,29}]},{ddoc_cache_entry,do_open,1,[{file,"src/ddoc_cache_entry.erl"},{line,275}]}]}

[error] 2024-07-11T07:59:28.797215Z couchdb@127.0.0.1 <0.17935.0> -------- could not load validation funs {{badmatch,{error,{badarg,nil,[{erlang,binary_to_term,[<<131,0,104,2,100,0,7,107,112,95,110,111,100,108,0,0,0,3,104,2,109,0,0,0,58,99,114,101,97,116,101,100,58,109,101,100,105,99,45,112,117,114,103,101,100,45,114,111,108,101,45,100,99,54,97,101,102,50,102,53,98,98,97,100,49,55,97,53,49,100,102,51,99,98,102,53,101,101,97,49,48,53,97,104,3,98,72,173,49,215,104,3,97,2,97,0,104,3,100,0,9,115,105,122,101,95,105,110,102,111,97,90,97,6,98,0,0,1,77,104,2,109,0,0,0,34,117,112,100,97,116,101,100,58,109,101,100,105,99,45,117,115,101,114,45,97,109,97,100,111,117,45,107,111,110,45,109,101,116,97,104,3,98,72,194,14,52,104,3,97,48,97,0,104,3,100,0,9,115,105,122,101,95,105,110,102,111,98,0,0,29,91,97,188,98,0,0,147,170,104,2,109,0,0,0,38,117,112,100,97,116,101,100,58,109,101,100,105,99,45,117,115,101,114,45,121,111,117,99,101,102,95,100,97,104,109,97,110,101,45,109,101,116,97,104,3,98,72,191,21,174,104,3,97,23,97,0,104,3,100,0,9,115,105,122,101,95,105,110,102,111,98,0,0,27,63,97,90,98,0,0,93,0,106>>],[{error_info,#{module => erl_erts_errors}}]},{couch_compress,decompress,1,[{file,"src/couch_compress.erl"},{line,65}]},{couch_file,pread_term,2,[{file,"src/couch_file.erl"},{line,156}]},{couch_btree,get_node,2,[{file,"src/couch_btree.erl"},{line,474}]},{couch_btree,stream_node,8,[{file,"src/couch_btree.erl"},{line,1069}]},{couch_btree,fold,4,[{file,"src/couch_btree.erl"},{line,242}]},{couch_bt_engine,fold_docs_int,5,[{file,"src/couch_bt_engine.erl"},{line,1129}]},{couch_mrview,get_total_rows,2,[{file,"src/couch_mrview.erl"},{line,704}]}]}}},[{ddoc_cache_entry_validation_funs,recover,1,[{file,"src/ddoc_cache_entry_validation_funs.erl"},{line,29}]},{ddoc_cache_entry,do_open,1,[{file,"src/ddoc_cache_entry.erl"},{line,275}]}]}`

Please help.

rnewson commented 1 month ago

hi, this looks like data corruption to me. The binary in question is a kp_node but it is somehow truncated or otherwise invalid.

nickva commented 1 month ago

Agree with @rnewson.

<<131,0,104,2,100,0,7,107,112,95...

The first byte 131 looked like a proper initial marker of an uncompressed term.

It's not followed by 80, so it's not compressed https://github.com/apache/couchdb/blob/c93940a66e85d5d9600d17cb38e44f62fd91585b/src/couch/src/couch_compress.erl#L22-L26

0 following 131 seems odd on first look at https://www.erlang.org/doc/apps/erts/erl_ext_dist.html#introduction. The next 104,2 looks like a proper small tuple

Which is probably what we might expect in a kp node.

But turning a tuple into a binary doesn't show a 0 after 131

> erlang:term_to_binary({a, b}).
<<131,104,2,100,0,1,97,100,0,1,98>>
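The byte-level check described above can be sketched in Python. This is only an illustration, not CouchDB code: `describe_prefix` and the `TAGS` table are hypothetical names, and the table lists only a few tag values from the Erlang external term format documentation. It compares the healthy `{a, b}` encoding with the first bytes of the binary from the error log.

```python
# A few tag values from the Erlang external term format documentation.
# The table is deliberately incomplete; it is only an aid for this sketch.
TAGS = {
    80: "COMPRESSED",
    97: "SMALL_INTEGER_EXT",
    98: "INTEGER_EXT",
    100: "ATOM_EXT",
    104: "SMALL_TUPLE_EXT",
    106: "NIL_EXT",
    108: "LIST_EXT",
    109: "BINARY_EXT",
}


def describe_prefix(data: bytes) -> dict:
    """Report the version byte and the tag that follows it (hypothetical helper)."""
    if len(data) < 2:
        raise ValueError("need at least two bytes")
    tag = data[1]
    return {
        "version_ok": data[0] == 131,   # 131 is the format's version marker
        "compressed": tag == 80,        # 131 followed by 80 marks a compressed term
        "tag": TAGS.get(tag, f"unknown({tag})"),
    }


# term_to_binary({a, b}) as shown in the thread
healthy = bytes([131, 104, 2, 100, 0, 1, 97, 100, 0, 1, 98])
# first bytes of the binary from the error log
logged = bytes([131, 0, 104, 2, 100, 0, 7, 107, 112, 95])

print(describe_prefix(healthy))
print(describe_prefix(logged))
```

For the healthy term the tag after 131 is the small tuple; for the logged binary the sketch reports a tag it does not know, because of the extra 0.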

@job-isabai

What version of CouchDB, Erlang, OS, and architecture are you running? I wonder if you backed up or restored the data at any point. Or is there any way to reproduce the issue?

job-isabai commented 1 month ago

Hello, Thanks for the feedback. I am using CouchDB via Community Health Toolkit (CHT). The Docker image can be found here: https://staging.dev.medicmobile.org/_couch/builds_4/medic:medic:4.5.0/docker-compose/cht-couchdb.yml but I can't tell the version. This error emerged from a system upgrade which required all views to be indexed before migration to the new version. This takes place automatically in the backend, but there were a couple of crashes during migration that led me to revert to a backed-up version (whole image and files of the VM). Afterwards, the upgrade was successful, all views were indexed, and the system started running on the new version. After a couple of hours this error started popping up, which resulted in the CouchDB container restarting unexpectedly. My database size is more than 2GB and growing.

rnewson commented 1 month ago

That image contains CouchDB 3.3.2.

rnewson commented 1 month ago

So I think this is data corruption somehow. You'll need to try earlier backups until something works, but we're very curious as to how this might have happened. If you have the details of the storage subsystem (filesystem, disks, any virtualisation between couchdb and the storage device, and any relevant settings on reordering or fsyncing) we'd love to hear them.

job-isabai commented 1 month ago

Actually, reverting to earlier backups might not be an option for me since it has been a month and I might lose the current state of the database. Is there a means of repairing the corrupted data? Can I adjust the configuration to make the CouchDB container error-tolerant to prevent failure/restart? I am running on an Ubuntu VM with Docker, where everything is stored on the local disk.

rnewson commented 1 month ago

CouchDB is built as a "crash only" system, meaning that the couchdb process is always ready to be killed, there's no shutdown code, no need to call sync manually, etc. When a document is written, and the 200 OK returned, CouchDB has already done everything it can to persist the data to disk (including fsync() calls). At startup CouchDB will read from the end of each file looking for the latest valid header.
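The startup behaviour described above, reading from the end of the file for the latest valid header, can be sketched roughly in Python. This is a simplified illustration and not CouchDB's actual on-disk format or code: the block size, the marker byte, the `find_latest_header` helper, and the validity check are all assumptions made for the example.

```python
from typing import Callable, Optional

BLOCK_SIZE = 4096    # assumption for this sketch
HEADER_MARKER = 1    # assumption: first byte of a block that holds a header


def find_latest_header(data: bytes, is_valid: Callable[[bytes], bool]) -> Optional[int]:
    """Scan block-aligned offsets from the end of the file toward the start and
    return the offset of the first block that looks like a valid header."""
    if not data:
        return None
    last_block = (len(data) - 1) // BLOCK_SIZE
    for block_no in range(last_block, -1, -1):
        offset = block_no * BLOCK_SIZE
        chunk = data[offset:offset + BLOCK_SIZE]
        if chunk[0] == HEADER_MARKER and is_valid(chunk):
            return offset
    return None


def block(marker: int, payload: bytes) -> bytes:
    """Build one padded block for the demo (hypothetical layout)."""
    return (bytes([marker]) + payload).ljust(BLOCK_SIZE, b"\x00")


# A good header, one data block, then a header write that was cut short.
image = block(1, b"good") + block(0, b"docs") + bytes([1]) + b"to"
offset = find_latest_header(image, lambda chunk: chunk[1:5] == b"good")
print(offset)
```

In this toy file the torn header at the end is skipped and the scan falls back to the earlier complete one, which is the property a "crash only" design relies on.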

Without knowing how the files were corrupted it is hard to know what to recommend and, unfortunately, there are no tools we publish to repair a corrupted .couch file. At best we might be able to build an Erlang script that would attempt to extract the document bodies inside the .couch files, though they would be shorn of a number of details. The doc id is the most significant of these: it is stored in a different location from the body, and the btree index that would normally be used to find it is the part that is corrupted.

Have you perhaps replicated this database elsewhere recently? That could be another source of backup.