bitwiseworks / qtwebengine-chromium-os2

Port of Chromium and related tools to OS/2
9 stars 2 forks source link

net/disk_cache: Failed to save user data and other errors #23

Closed dmik closed 4 years ago

dmik commented 4 years ago

This is a very popular error (both release and debug):

[6077:17:1008/225327.140000:ERROR:backend_impl.cc(1063)] Critical error found -8
...
[4916:27:1008/223156.025000:ERROR:entry_impl.cc(986)] Failed to save user data

The debug build shows some more errors that look related:

[6077:5:1008/225327.212000:WARNING:http_cache_transaction.cc(1276)] Unable to create cache entry
[6077:5:1008/225327.228000:WARNING:http_cache_transaction.cc(1276)] Unable to create cache entry
[6077:5:1008/225327.232000:WARNING:http_cache_transaction.cc(1276)] Unable to create cache entry
[6077:17:1008/225327.718000:ERROR:entry_impl.cc(986)] Failed to save user data
[6077:17:1008/225327.720000:ERROR:entry_impl.cc(986)] Failed to save user data
[6077:17:1008/225328.023000:WARNING:storage_block-inl.h(143)] Failed data load.
[6077:17:1008/225328.023000:WARNING:storage_block-inl.h(143)] Failed data load.
[6077:17:1008/225328.052000:WARNING:storage_block-inl.h(143)] Failed data load.

Not clear what's going on, needs deeper debugging.

Needed for https://github.com/bitwiseworks/qtwebengine-os2/issues/6.

dmik commented 4 years ago

Enabling disk cache checks here: https://github.com/bitwiseworks/qtwebengine-chromium-os2/blob/master/chromium/net/disk_cache/blockfile/stress_support.h demonstrates all kinds of problems. E.g.:

...
[1662:20:1103/024917.472000:INFO:trace.cc(125)] Open hash 0x7c83d269
[1662:20:1103/024917.472000:INFO:trace.cc(125)] Create hash 0x7c83d269
[1662:20:1103/024917.472000:INFO:trace.cc(125)] CreateBlock 0xa001000a
[1662:20:1103/024917.472000:INFO:trace.cc(125)] CreateBlock 0x90000002
[1662:20:1103/024917.472000:INFO:trace.cc(125)] Create entry In
[1662:20:1103/024917.472000:INFO:trace.cc(125)] Create Entry  0x0x54bb5e40 0xa001000a 0x900000
[1662:20:1103/024917.472000:INFO:trace.cc(125)]   data: 0x0 0x0 0x0
[1662:20:1103/024917.472000:INFO:trace.cc(125)]   doomed: 0 0x1
[1662:20:1103/024917.472000:INFO:trace.cc(125)] Insert 0x90000002 l 0
[1662:20:1103/024917.474000:INFO:trace.cc(125)] create entry hit
[1662:20:1103/024919.436000:INFO:trace.cc(125)] Doom entry 0x0x54bb5e40
[1662:20:1103/024919.436000:INFO:trace.cc(125)] Remove 0x90000002 (0x90000001 0x90000002) l 0
[1662:20:1103/024919.438000:FATAL:backend_impl.cc(1062)] Check failed: false.

with this stack trace:

 0559ED64  BCD8BC95   QT5WEBCD  0001:0A32BC95   __ZN10disk_cache11BackendImpl13CriticalErrorEi + 65 0001:0A32BC30 (backend_impl.o)
 0559EEA4  BDFE7FD3   QT5WEBCD  0001:0B587FD3   __ZN10disk_cache8Rankings15UpdateIteratorsEPNS_12StorageBlockINS_12RankingsNodeEEE + 363 0001:0B587C70 (rankings.o)
 0559EFF4  BDFE8BCC   QT5WEBCD  0001:0B588BCC   __ZN10disk_cache8Rankings6RemoveEPNS_12StorageBlockINS_12RankingsNodeEEENS0_4ListEb + 2EC 0001:0B5888E0 (rankings.o)
 0559F154  BE2C27E0   QT5WEBCD  0001:0B8627E0   __ZN10disk_cache8Eviction13OnDoomEntryV2EPNS_9EntryImplE + 70 0001:0B862770 (eviction.o)
 0559F194  BCD92542   QT5WEBCD  0001:0A332542   __ZN10disk_cache11BackendImpl17InternalDoomEntryEPNS_9EntryImplE + 192 0001:0A3323B0 (backend_impl.o)
 0559F214  BD9DC0DA   QT5WEBCD  0001:0AF7C0DA   __ZN10disk_cache9EntryImpl8DoomImplEv + 15A 0001:0AF7BF80 (entry_impl.o)
 0559F324  BDCCEFE3   QT5WEBCD  0001:0B26EFE3   __ZN10disk_cache9BackendIO23ExecuteBackendOperationEv + 153 0001:0B26EE90 (in_flight_backend_io.o)
 0559F454  BDCCD8EB   QT5WEBCD  0001:0B26D8EB   __ZN4base8internal7InvokerINS0_9BindStateIMN10disk_cache9BackendIOEFvvEJ13scoped_refptrIS4_EEEEFvvEE3RunEPNS0_13BindStateBaseE$w$Z91HbPLlrP7A6JXG1 + 3B 0001:0B26D8B0 (in_flight_backend_io.o)
 0559F564  B859C7F6   QT5WEBCD  0001:05B3C7F6   __ZN4base5debug13TaskAnnotator7RunTaskEPKcPNS_11PendingTaskE + E6 0001:05B3C710 (base_jumbo_4.o)
 0559F6E4  B840DE24   QT5WEBCD  0001:059ADE24   __ZN4base15MessageLoopImpl7RunTaskEPNS_11PendingTaskE + C4 0001:059ADD60 (base_jumbo_9.o)
 0559F864  B84128A8   QT5WEBCD  0001:059B28A8   __ZN4base15MessageLoopImpl21DeferOrRunPendingTaskENS_11PendingTaskE + 28 0001:059B2880 (base_jumbo_9.o)
 0559F8C4  B8412B34   QT5WEBCD  0001:059B2B34   __ZN4base15MessageLoopImpl6DoWorkEv + 104 0001:059B2A30 (base_jumbo_9.o)
 0559F994  B84BC780   QT5WEBCD  0001:05A5C780   __ZN4base19MessagePumpLibevent3RunEPNS_11MessagePump8DelegateE + 40 0001:05A5C740 (base_jumbo_40.o)
 0559F9F4  B8407BC0   QT5WEBCD  0001:059A7BC0   __ZN4base15MessageLoopImpl3RunEb + 50 0001:059A7B70 (base_jumbo_9.o)
 0559FB04  B85CF498   QT5WEBCD  0001:05B6F498   __ZN4base7RunLoop3RunEv + 128 0001:05B6F370 (base_jumbo_16.o)
 0559FC74  B84DCE35   QT5WEBCD  0001:05A7CE35   __ZN4base6Thread3RunEPNS_7RunLoopE + A5 0001:05A7CD90 (base_jumbo_26.o)
 0559FDA4  B84E6CD3   QT5WEBCD  0001:05A86CD3   __ZN4base6Thread10ThreadMainEv + 363 0001:05A86970 (base_jumbo_26.o)

Or this:

[13528:16:1103/215244.725000:INFO:trace.cc(125)] Open hash 0xed2ef5dc
[13528:16:1103/215244.726000:INFO:trace.cc(125)] Changing list 0 to 1
[13528:16:1103/215244.726000:INFO:trace.cc(125)] Open hash 0xed2ef5dc end: 0xa0010002
[13528:16:1103/215244.726000:INFO:trace.cc(125)] Remove 0x90000000 (0x90000000 0x90000001) l 1
[13528:16:1103/215244.726000:INFO:trace.cc(125)] Changing list 0 to 1
[13528:16:1103/215244.726000:INFO:trace.cc(125)] Changing list 0 to 1
[13528:16:1103/215244.726000:INFO:trace.cc(125)] CheckLinks 0x90000000 (0x90000000 0x90000001)
[13528:16:1103/215244.726000:INFO:trace.cc(125)] Insert 0x90000000 l 1
[13528:16:1103/215244.726000:INFO:trace.cc(125)] Changing list 0 to 1
[13528:16:1103/215244.889000:INFO:trace.cc(125)] To PrepareTarget 0xa0010002
[13528:16:1103/215244.889000:INFO:trace.cc(125)] DeleteBlock 0xa1010004
[13528:16:1103/215244.889000:INFO:trace.cc(125)] From PrepareTarget 0xa0010002
[13528:16:1103/215244.924000:INFO:trace.cc(125)] ~EntryImpl in 0x0x22e04040 0xa0010002 0x9000000
[13528:16:1103/215244.924000:INFO:trace.cc(125)]   data: 0x0 0xb1020000 0x0
[13528:16:1103/215244.924000:INFO:trace.cc(125)]   doomed: 0 0x2
[13528:16:1103/215244.924000:INFO:trace.cc(125)] CreateBlock 0xa101000a
[13528:16:1103/215244.924000:INFO:trace.cc(125)] ~EntryImpl out 0x0x22e04040
[13528:12:1103/215245.017000:ERROR:process_metrics_os2.cc(19)] Not implemented reached in static std::unique_ptr<base::ProcessMetrics> base::ProcessMetrics::CreateProcessMetrics(base::ProcessHandle)
[13528:16:1103/215245.078000:INFO:trace.cc(125)] Open hash 0x3aea252e
[13528:16:1103/215245.078000:WARNING:backend_impl.cc(1620)] Messed up entry found.
[13528:16:1103/215245.078000:FATAL:backend_impl.cc(1621)] Check failed: false. 

with this stack trace:

 04D6ED94  BCD8EF61   QT5WEBCD  0001:0A32EF61   __ZN10disk_cache11BackendImpl8NewEntryENS_4AddrEP13scoped_refptrINS_9EntryImplEE + AE1 0001:0A32E480 (backend_impl.o)
 04D6EF04  BCD900B5   QT5WEBCD  0001:0A3300B5   __ZN10disk_cache11BackendImpl10MatchEntryERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEEjbNS_4AddrEPb + 1B5 0001:0A32FF00 (backend_impl.o)
 04D6F084  BCD91829   QT5WEBCD  0001:0A331829   __ZN10disk_cache11BackendImpl13OpenEntryImplERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEE + B9 0001:0A331770 (backend_impl.o)
 04D6F204  BCD9216D   QT5WEBCD  0001:0A33216D   __ZN10disk_cache11BackendImpl13SyncOpenEntryERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEEP13scoped_refptrINS_9EntryImplEE + 3D 0001:0A332130 (backend_impl.o)
 04D6F324  BDCCF446   QT5WEBCD  0001:0B26F446   __ZN10disk_cache9BackendIO23ExecuteBackendOperationEv + 5B6 0001:0B26EE90 (in_flight_backend_io.o)
 04D6F454  BDCCD8EB   QT5WEBCD  0001:0B26D8EB   __ZN4base8internal7InvokerINS0_9BindStateIMN10disk_cache9BackendIOEFvvEJ13scoped_refptrIS4_EEEEFvvEE3RunEPNS0_13BindStateBaseE$w$Z91HbPLlrP7A6JXG1 + 3B 0001:0B26D8B0 (in_flight_backend_io.o)
 04D6F564  B859C7F6   QT5WEBCD  0001:05B3C7F6   __ZN4base5debug13TaskAnnotator7RunTaskEPKcPNS_11PendingTaskE + E6 0001:05B3C710 (base_jumbo_4.o)
 04D6F6E4  B840DE24   QT5WEBCD  0001:059ADE24   __ZN4base15MessageLoopImpl7RunTaskEPNS_11PendingTaskE + C4 0001:059ADD60 (base_jumbo_9.o)
 04D6F864  B84128A8   QT5WEBCD  0001:059B28A8   __ZN4base15MessageLoopImpl21DeferOrRunPendingTaskENS_11PendingTaskE + 28 0001:059B2880 (base_jumbo_9.o)
 04D6F8C4  B8412B34   QT5WEBCD  0001:059B2B34   __ZN4base15MessageLoopImpl6DoWorkEv + 104 0001:059B2A30 (base_jumbo_9.o)
 04D6F994  B84BC780   QT5WEBCD  0001:05A5C780   __ZN4base19MessagePumpLibevent3RunEPNS_11MessagePump8DelegateE + 40 0001:05A5C740 (base_jumbo_40.o)
 04D6F9F4  B8407BC0   QT5WEBCD  0001:059A7BC0   __ZN4base15MessageLoopImpl3RunEb + 50 0001:059A7B70 (base_jumbo_9.o)
 04D6FB04  B85CF498   QT5WEBCD  0001:05B6F498   __ZN4base7RunLoop3RunEv + 128 0001:05B6F370 (base_jumbo_16.o)
 04D6FC74  B84DCE35   QT5WEBCD  0001:05A7CE35   __ZN4base6Thread3RunEPNS_7RunLoopE + A5 0001:05A7CD90 (base_jumbo_26.o)
 04D6FDA4  B84E6CD3   QT5WEBCD  0001:05A86CD3   __ZN4base6Thread10ThreadMainEv + 363 0001:05A86970 (base_jumbo_26.o)

I have absolutely no idea how this cache works. I have to learn it seems and/or build and run unit tests for this part. Sounds like a lot of work.

dmik commented 4 years ago

In the debug I also often see this crash:

[1550:12:1104/214528.151000:FATAL:local_storage_cached_area.cc(584)] Check failed: found != ignore_key_mutations_.end().

with the stack trace:

 0440E4A4  BB33DDF2   QT5WEBCD  0001:088DDDF2   content::LocalStorageCachedArea::OnSetItemComplete(std::__cxx11::basic_string<wchar_t, std::char_traits<wchar_t>, std::allocator<wchar_t> > const&, blink::WebScopedVirtualTimePauser, bool) + 102 0001:088DDCF0 (renderer_jumbo_6.o)
 0440E5B4  BB33954A   QT5WEBCD  0001:088D954A   __ZN4base8internal7InvokerINS0_9BindStateIMN7content22LocalStorageCachedAreaEFvRKNSt7__cxx1112basic_stringIwSt11char_traitsIwESaIwEEEN5blink26WebScopedVirtualTimePauserEbEJNS_7WeakPtrIS4_EESA_SE_EEEFvbEE7RunOnceEPNS0_13BindStat!_iT1uu3$w$T6pxa0c0tP7m0doS0 + CA 0001:088D9480 (renderer_jumbo_6.o)
 0440E6F4  B3E14377   QT5WEBCD  0001:013B4377   blink::mojom::StorageArea_Put_ForwardToCallback::Accept(mojo::Message*) + 97 0001:013B42E0 (mojom_platform_jumbo_6.o)
 0440E844  B90F28D0   QT5WEBCD  0001:066928D0   mojo::InterfaceEndpointClient::HandleValidatedMessage(mojo::Message*) + 650 0001:06692280 (interface_endpoint_client.o)
 0440E994  BCE83745   QT5WEBCD  0001:0A423745   mojo::FilterChain::Accept(mojo::Message*) + 85 0001:0A4236C0 (filter_chain.o)
 0440EAB4  B90F218D   QT5WEBCD  0001:0669218D   mojo::InterfaceEndpointClient::HandleIncomingMessage(mojo::Message*) + 3D 0001:06692150 (interface_endpoint_client.o)
 0440EBC4  B8717312   QT5WEBCD  0001:05CB7312   mojo::internal::MultiplexRouter::ProcessIncomingMessage(mojo::internal::MultiplexRouter::MessageWrapper*, mojo::internal::MultiplexRouter::ClientCallBehavior, base::SequencedTaskRunner*) + 5D2 0001:05CB6D40 (multiplex_router.o)
 0440ED44  B8717AAA   QT5WEBCD  0001:05CB7AAA   mojo::internal::MultiplexRouter::ProcessIncomingMessage(mojo::internal::MultiplexRouter::MessageWrapper*, mojo::internal::MultiplexRouter::ClientCallBehavior, base::SequencedTaskRunner*) + D6A 0001:05CB6D40 (multiplex_router.o)
 0440EEA4  B871A0AD   QT5WEBCD  0001:05CBA0AD   mojo::internal::MultiplexRouter::LockAndCallProcessTasks() + CD 0001:05CB9FE0 (multiplex_router.o)
 0440EFE4  B870EB6B   QT5WEBCD  0001:05CAEB6B   __ZN4base8internal7InvokerINS0_9BindStateIMN4mojo8internal15MultiplexRouterEFvvEJ13scoped_refptrIS5_EEEEFvvEE7RunOnceEPNS0_13BindStateBaseE$w$FT9f29cbSN7IdN8r0 + 3B 0001:05CAEB30 (multiplex_router.o)
 0440F0F4  B859C7F6   QT5WEBCD  0001:05B3C7F6   base::debug::TaskAnnotator::RunTask(char const*, base::PendingTask*) + E6 0001:05B3C710 (base_jumbo_4.o)
 0440F274  B84417E8   QT5WEBCD  0001:059E17E8   base::sequence_manager::internal::ThreadControllerImpl::DoWork(base::sequence_manager::internal::ThreadControllerImpl::WorkType) + 238 0001:059E15B0 (base_jumbo_21.o)
 0440F464  B841EF7C   QT5WEBCD  0001:059BEF7C   __ZN4base8internal7InvokerINS0_9BindStateIMNS_16sequence_manager8internal20ThreadControllerImplEFvNS5_8WorkTypeEEJNS_7WeakPtrIS5_EES6_EEEFvvEE3RunEPNS0_13BindStateBaseE$w$7sSiXp48bP7Jtkx60 + 8C 0001:059BEEF0 (base_jumbo_21.o)
 0440F584  B859C7F6   QT5WEBCD  0001:05B3C7F6   base::debug::TaskAnnotator::RunTask(char const*, base::PendingTask*) + E6 0001:05B3C710 (base_jumbo_4.o)
 0440F704  B840DE24   QT5WEBCD  0001:059ADE24   base::MessageLoopImpl::RunTask(base::PendingTask*) + C4 0001:059ADD60 (base_jumbo_9.o)
 0440F884  B84128A8   QT5WEBCD  0001:059B28A8   base::MessageLoopImpl::DeferOrRunPendingTask(base::PendingTask) + 28 0001:059B2880 (base_jumbo_9.o)
 0440F8E4  B8412B34   QT5WEBCD  0001:059B2B34   base::MessageLoopImpl::DoWork() + 104 0001:059B2A30 (base_jumbo_9.o)
 0440F9B4  B8451708   QT5WEBCD  0001:059F1708   base::MessagePumpDefault::Run(base::MessagePump::Delegate*) + 28 0001:059F16E0 (base_jumbo_10.o)
 0440F9F4  B8407BC0   QT5WEBCD  0001:059A7BC0   base::MessageLoopImpl::Run(bool) + 50 0001:059A7B70 (base_jumbo_9.o)
 0440FB04  B85CF498   QT5WEBCD  0001:05B6F498   base::RunLoop::Run() + 128 0001:05B6F370 (base_jumbo_16.o)
 0440FC74  B84DCE35   QT5WEBCD  0001:05A7CE35   base::Thread::Run(base::RunLoop*) + A5 0001:05A7CD90 (base_jumbo_26.o)
 0440FDA4  B84E6CD3   QT5WEBCD  0001:05A86CD3   base::Thread::ThreadMain() + 363 0001:05A86970 (base_jumbo_26.o)

It's not from the disk_cache submodule but also seems to be related to caching.

dmik commented 4 years ago

This whole mess was caused by a simple fact that base::File did not use O_BINARY on OS/2. As a result, \n was translated to \r\n back and forth on input/output which completely broke things because the code expects binary mode with no manipulation. No any fatal errors from disk_cache since the fix which I will commit now.

dmik commented 4 years ago

Just as a side note (for future reference). There are two disk cache modes in Chromium: normal (fewer big files) and simple (lots of small files). The simple is used on Android and Fuchsia as, although it's easier to maintain, it puts high pressure on the disk I/O subsystem on Desktop OSes so not used there by default (OS/2 included). While experimenting, I tried switching it to simple on OS/2 but the performance was poor and it didn't work well so I switched it back to normal.

Also, on Posix, there are two ways to work with the normal disk cache: via mmap and avoiding mmap (mapped_file_posix.cc and mapped_file_avoid_mmap_posix.c, respectively). The first one uses a shared mapping, the second one uses a private memory buffer. OS/2 uses the first one and it seems to work fine, so I leave it in. The one with the private memory buffer also works fine on OS/2 but the cache is obviously shared between tabs so having a private memory buffer for each tab (process?) will significantly increase memory usage compared to mmap where all processes will use the same shared memory block (due to the way how shared mmap is implemented on OS/2).