Closed Lubrsi closed 4 years ago
cc @tomuta
Not sure what's going on there, I tried to reproduce it but haven't had luck. Here's what I suspect is wrong: In MemoryManager::ensure_pte
and MemoryManager::release_pte
I think we should use m_kernel_page_directory.m_page_tables
rather than page_directory.page_tables
if page_directory_table_index == 3
...
I think I found a reproducible way. Basically, you need to get the memory usage very close to maximum but not enough to start purging. Then from that point, you need to slowly creep the memory usage up to maximum, where it starts purging and then it asserts.
Now that I have a somewhat consistent way of reproducing it, I'll see if that PR fixes it.
Here's my arbitrary way:
Output:
[Browser(25:25)]: MM: Purge saved the day! Purged 16 pages from PurgeableVMObject{0xc044a394}
[Browser(25:25)]: MM: Purge saved the day! Purged 16 pages from PurgeableVMObject{0xc0462cf4}
[Browser(25:25)]: MM: Purge saved the day! Purged 16 pages from PurgeableVMObject{0xc0462db4}
[Browser(25:25)]: MM: Purge saved the day! Purged 16 pages from PurgeableVMObject{0xc048d314}
[Browser(25:25)]: MM: Purge saved the day! Purged 16 pages from PurgeableVMObject{0xc04789d4}
[Browser(25:25)]: MM: Purge saved the day! Purged 16 pages from PurgeableVMObject{0xc046bcd4}
[Browser(25:25)]: MM: Purge saved the day! Purged 9 pages from PurgeableVMObject{0xc0484194}
[Browser(25:25)]: MM: Purge saved the day! Purged 16 pages from PurgeableVMObject{0xc047fab4}
[Browser(25:25)]: MM: Purge saved the day! Purged 16 pages from PurgeableVMObject{0xc04c70d4}
[Browser(25:25)]: MM: Purge saved the day! Purged 9 pages from PurgeableVMObject{0xc04c71d4}
[Browser(25:25)]: MM: Purge saved the day! Purged 16 pages from PurgeableVMObject{0xc047eef4}
[Browser(25:25)]: MM: Purge saved the day! Purged 16 pages from PurgeableVMObject{0xc04d6d34}
[Browser(25:25)]: MM: Purge saved the day! Purged 16 pages from PurgeableVMObject{0xc04db1f4}
[Browser(25:25)]: MM: Purge saved the day! Purged 28 pages from PurgeableVMObject{0xc04db9d4}
[Browser(25:25)]: MM: Purge saved the day! Purged 9 pages from PurgeableVMObject{0xc04d9c54}
[Browser(25:25)]: MM: Purge saved the day! Purged 9 pages from PurgeableVMObject{0xc04df754}
[Browser(25:25)]: MM: Purge saved the day! Purged 721 pages from PurgeableVMObject{0xc04dfbb4}
[Browser(25:25)]: ASSERTION FAILED: result == AK::HashSetResult::InsertedNewEntry
../Kernel/VM/MemoryManager.cpp:238 in Kernel::PageTableEntry& Kernel::MemoryManager::ensure_pte(Kernel::PageDirectory&,
VirtualAddress)
[Browser(25:25)]: 0xc011e9a5 __assertion_failed(char const*, char const*, unsigned int, char const*) +185
[Browser(25:25)]: 0xc018a2d5 Kernel::MemoryManager::ensure_pte(Kernel::PageDirectory&, VirtualAddress) +111
[Browser(25:25)]: 0xc018d948 Kernel::Region::map_individual_page_impl(unsigned long) +86
[Browser(25:25)]: 0xc018dde8 Kernel::Region::map(Kernel::PageDirectory&) +76
[Browser(25:25)]: 0xc01617c7 Kernel::Process::allocate_region_with_vmobject(Kernel::Range const&, AK::NonnullRefPtr<Ker
nel::VMObject>, unsigned long, AK::String const&, int) +407
[Browser(25:25)]: 0xc016fc0f Kernel::Process::sys$mmap(AK::Userspace<Kernel::Syscall::SC_mmap_params const*>) +907
[Browser(25:25)]: 0xc01679ff syscall_handler +1339
[Browser(25:25)]: 0xc0167435 syscall_asm_entry +49
[Browser(25:25)]: 0x081b67b3 malloc +1207
[Browser(25:25)]: 0x0805c467 Web::DOM::Element::create_layout_node(Web::CSS::StyleProperties const*) +533
[Browser(25:25)]: 0x080ab42b Web::create_layout_tree(Web::DOM::Node&, Web::CSS::StyleProperties const*) +28
[Browser(25:25)]: 0x080ab4d1 Web::create_layout_tree(Web::DOM::Node&, Web::CSS::StyleProperties const*) +194
[Browser(25:25)]: 0x080ab4d1 Web::create_layout_tree(Web::DOM::Node&, Web::CSS::StyleProperties const*) +194
[Browser(25:25)]: 0x080ab4d1 Web::create_layout_tree(Web::DOM::Node&, Web::CSS::StyleProperties const*) +194
[Browser(25:25)]: 0x080ab4d1 Web::create_layout_tree(Web::DOM::Node&, Web::CSS::StyleProperties const*) +194
[Browser(25:25)]: 0x080ab4d1 Web::create_layout_tree(Web::DOM::Node&, Web::CSS::StyleProperties const*) +194
[Browser(25:25)]: 0x080ab4d1 Web::create_layout_tree(Web::DOM::Node&, Web::CSS::StyleProperties const*) +194
[Browser(25:25)]: 0x080ab886 Web::LayoutTreeBuilder::build(Web::DOM::Node&) +302
[Browser(25:25)]: 0x0805a8f5 Web::DOM::Document::layout() +67
[Browser(25:25)]: 0x080b43ac Web::Frame::set_document(Web::DOM::Document*) +112
[Browser(25:25)]: 0x080ae3e6 Web::FrameLoader::resource_did_load() +378
[Browser(25:25)]: 0x080afddb Web::Resource::for_each_client(AK::Function<void (Web::ResourceClient&)>) +413
[Browser(25:25)]: 0x080b02fe Web::Resource::did_load(AK::Badge<Web::ResourceLoader>, AK::ByteBuffer const&, AK::HashMap
<AK::String, AK::String, AK::CaseInsensitiveStringTraits> const&) +1162
[Browser(25:25)]: 0x080b0bbf AK::Function<void (AK::ByteBuffer const&, AK::HashMap<AK::String, AK::String, AK::CaseInse
nsitiveStringTraits> const&)>::CallableWrapper<Web::ResourceLoader::load_resource(Web::Resource::Type, Web::LoadRequest
const&)::{lambda(auto:1&, auto:2&)#1}>::call(AK::ByteBuffer const&, AK::HashMap<AK::String, AK::String, AK::CaseInsensit
iveStringTraits> const&) const +45
[Browser(25:25)]: 0x080b10a1 AK::Function<void (bool, AK::ByteBuffer const&, AK::RefPtr<AK::SharedBuffer>, AK::HashMap<
AK::String, AK::String, AK::CaseInsensitiveStringTraits> const&, AK::Optional<unsigned int>)>::CallableWrapper<Web::Reso
urceLoader::load(AK::URL const&, AK::Function<void (AK::ByteBuffer const&, AK::HashMap<AK::String, AK::String, AK::CaseI
nsensitiveStringTraits> const&)>, AK::Function<void (AK::String const&)>)::{lambda(bool, AK::ByteBuffer const&, auto:1,
auto:2&, auto:3)#4}>::call(bool, AK::ByteBuffer const&, AK::RefPtr<AK::SharedBuffer>, AK::HashMap<AK::String, AK::String
, AK::CaseInsensitiveStringTraits> const&, AK::Optional<unsigned int>) const +279
[Browser(25:25)]: 0x080ec0fd Protocol::Download::did_finish(AK::Badge<Protocol::Client>, bool, AK::Optional<unsigned in
t>, unsigned int, int, IPC::Dictionary const&) +749
[Browser(25:25)]: 0x080e8d4e Protocol::Client::handle(Messages::ProtocolClient::DownloadFinished const&) +134
[Browser(25:25)]: 0x080e9873 ProtocolClientEndpoint::handle(IPC::Message const&) +53
[Browser(25:25)]: 0x080eacdc IPC::ServerConnection<ProtocolClientEndpoint, ProtocolServerEndpoint>::handle_messages() +
106
[Browser(25:25)]: 0x081ac8ef Core::EventLoop::pump(Core::EventLoop::WaitMode) +265
[Browser(25:25)]: 0x081aca99 Core::EventLoop::exec() +43
[Browser(25:25)]: 0x08048937 main +1335
[Browser(25:25)]: 0x0804e2be _start +94
Not sure what's going on there, I tried to reproduce it but haven't had luck. Here's what I suspect is wrong: In
MemoryManager::ensure_pte
andMemoryManager::release_pte
I think we should usem_kernel_page_directory.m_page_tables
rather thanpage_directory.page_tables
ifpage_directory_table_index == 3
...
That is sane, but it sounds like something we should assert, not necessarily handle at runtime! Or.. how do we end up trying to put a kernel page into a userspace process page directory?
Not sure what's going on there, I tried to reproduce it but haven't had luck. Here's what I suspect is wrong: In
MemoryManager::ensure_pte
andMemoryManager::release_pte
I think we should usem_kernel_page_directory.m_page_tables
rather thanpage_directory.page_tables
ifpage_directory_table_index == 3
...That is sane, but it sounds like something we should assert, not necessarily handle at runtime! Or.. how do we end up trying to put a kernel page into a userspace process page directory?
I think we have to handle that at runtime. Unless you somehow turn PageDirectory::m_directory_pages[3]
into a reference that points to a global one for the kernel address space >= 0xc0000000
.
What I mean is I think all the code paths that would add mappings >3G are already explicitly adding them to kernel_page_directory
So I think my change in #3371 does address a legitimate page table loss issue for kernel mappings. But I think there's a separate issue related to purging pages in low-memory conditions, which I think I already reported in #2741. It seems like some pages get purged that shouldn't be (maybe it doesn't respect madvise
protection properly?)
What I mean is I think all the code paths that would add mappings >3G are already explicitly adding them to kernel_page_directory
Oh I see what you mean. And I think you're right, added this assertion to ensure_ptr and it doesn't get triggered:
ASSERT(page_directory_table_index != 3 || &page_directory == m_kernel_page_directory.ptr());
Ok, I found an even easier way to reproduce this. Run "allocate 209 MiB" and then open the browser.
Doing this also reveals so many more assertions in memory manager.
(This is on latest master with the taskbar fix)
So with the allocate 209
command it also seems fork
related:
[SystemServer(5:5)]: ASSERTION FAILED: result == AK::HashSetResult::InsertedNewEntry
../Kernel/VM/MemoryManager.cpp:243 in Kernel::PageTableEntry& Kernel::MemoryManager::ensure_pte(Kernel::PageDirectory&, VirtualAddress)
[SystemServer(5:5)]: 0xc011e9a5 __assertion_failed(char const*, char const*, unsigned int, char const*) +185
[SystemServer(5:5)]: 0xc018ae3c Kernel::MemoryManager::ensure_pte(Kernel::PageDirectory&, VirtualAddress) +912
[SystemServer(5:5)]: 0xc018dfbc Kernel::Region::map_individual_page_impl(unsigned long) +86
[SystemServer(5:5)]: 0xc018e45c Kernel::Region::map(Kernel::PageDirectory&) +76
[SystemServer(5:5)]: 0xc016d1c4 Kernel::Process::sys$fork(Kernel::RegisterState&) +1872
[SystemServer(5:5)]: 0xc01678a8 syscall_handler +946
[SystemServer(5:5)]: 0xc0167467 syscall_asm_entry +49
[SystemServer(5:5)]: MM: PURGING ->
I also get it through execve:
[SystemServer(25:25)]: ASSERTION FAILED: result == AK::HashSetResult::InsertedNewEntry
../Kernel/VM/MemoryManager.cpp:238 in Kernel::PageTableEntry& Kernel::MemoryManager::ensure_pte(Kernel::PageDirectory&, VirtualAddress)
[SystemServer(25:25)]: 0xc011e9b7 __assertion_failed(char const*, char const*, unsigned int, char const*) +185
[SystemServer(25:25)]: 0xc018a315 Kernel::MemoryManager::ensure_pte(Kernel::PageDirectory&, VirtualAddress) +111
[SystemServer(25:25)]: 0xc018d98e Kernel::Region::map_individual_page_impl(unsigned long) +86
[SystemServer(25:25)]: 0xc018de2e Kernel::Region::map(Kernel::PageDirectory&) +76
[SystemServer(25:25)]: 0xc0188df0 Kernel::MemoryManager::allocate_kernel_region_with_vmobject(Kernel::Range const&, Kernel::VMObject&, AK::StringView const&, unsigned char, bool, bool) +388
[SystemServer(25:25)]: 0xc0188ec2 Kernel::MemoryManager::allocate_kernel_region_with_vmobject(Kernel::VMObject&, unsigned long, AK::StringView const&, unsigned char, bool, bool) +182
[SystemServer(25:25)]: 0xc016adf6 Kernel::Process::do_exec(AK::NonnullRefPtr<Kernel::FileDescription>, AK::Vector<AK::String, 0ul>, AK::Vector<AK::String, 0ul>, AK::RefPtr<Kernel::FileDescription>, Kernel::Thread*&, unsigned int&) +984
[SystemServer(25:25)]: 0xc016c226 Kernel::Process::exec(AK::String, AK::Vector<AK::String, 0ul>, AK::Vector<AK::String, 0ul>, int) +1792
[SystemServer(25:25)]: 0xc016c71d Kernel::Process::sys$execve(AK::Userspace<Kernel::Syscall::SC_execve_params const*>) +637
[SystemServer(25:25)]: 0xc0167a3f syscall_handler +1339
[SystemServer(25:25)]: 0xc0167475 syscall_asm_entry +49
I wonder if PageDirectory::PageDirectory
fails allocating some physical pages. Looks like we wouldn't fail fork/execve in that case.
I wonder if
PageDirectory::PageDirectory
fails allocating some physical pages. Looks like we wouldn't fail fork/execve in that case.
Nope, not the case...
Ok I think I know what's going on here. The problem is in MemoryManager::ensure_pte
when it calls allocate_user_physical_page
. In the case that allocate_user_physical_page
needs to purge memory first, the act of purging triggers calls to quickmap_pd
within allocate_user_physical_page
, rendering the first call to quickmap_pd
in MemoryManager::ensure_pte
stale, which means pd
and pde
are no longer pointing at the same underlying page. Which means that we insert the page into page_directory.m_page_tables
thinking pde
is now set, but because we wrote it to the wrong memory location, the next subsequent call to MemoryManager::ensure_pte
will again correctly remap pd
by calling quickmap_pd
to the correct physical page, and this time pde
is most likely still unset again, so we do the same again and because we insert the same entry to page_directory.m_page_tables
again the ASSERT
rightfully triggers.
Great analysis, I think you're right :)
Two thoughts on fixing this:
Well the easy fix (which I have already coded up) is adding an optional argument to allocate_user_physical_page
that returns a boolean if any pages were purged, and if so we just need to call quickmap_pd
again.
Right now MM relies on a global lock, so even with multiple processors only one would be able to call ensure_pte
(or most other things) at a time. We could easily do something similar like quickmap_page
/unquickmap_page
and just have a bool
that we flip.
Unfortunately fixing this triggers other problems down the road ;-)
* Maybe we could have multiple quickmap slots. It could be a basic circular queue. Seems useful for multi-processing as well.
Btw, we have this for regular quickmap
already, every processor has its own slot.
Not quite sure what caused it, I ran into it while profiling chess engine (it doesn't happen all the time):