StanfordLegion / legion

The Legion Parallel Programming System
https://legion.stanford.edu
Apache License 2.0
657 stars 146 forks source link

Invalid allocation payload buffer size passed to network module #1698

Open SeyedMir opened 3 weeks ago

SeyedMir commented 3 weeks ago

Whenever Realm asks the network module to provide a payload buffer, the corresponding size must be lower than a threshold that the network module recommends (through recommended_max_payload API). However, this requirement is ignored in https://github.com/StanfordLegion/legion/blob/c61071541218747e35767317f6f89b83f374f264/runtime/realm/transfer/ib_memory.cc#L678 which ultimately leads to an assertion failure in the UCX backend. Here is the backtrace:

#9  0x00007ffff1cd3e96 in __assert_fail () from /usr/lib/x86_64-linux-gnu/libc.so.6
#10 0x00007ffff2864c92 in Realm::UCP::UCPInternal::pbuf_get (this=0x5555555dadc0, worker=0x555558bebd00, size=8240)
    at /opt/legate/legate.core/arch-legate/cmake_build/_deps/legion-src/runtime/realm/ucx/ucp_internal.cc:1827
#11 0x00007ffff28652ef in Realm::UCP::UCPMessageImpl::UCPMessageImpl (this=0x7f512a907080, _internal=0x5555555dadc0, _target=6, _msgid=427, _header_size=32, _max_payload_size=8240,
    _src_payload_addr=0x0, _src_payload_lines=0, _src_payload_line_stride=0, _src_segment=0x0, _dest_payload_addr=0x0, _storage_size=256)
    at /opt/legate/legate.core/arch-legate/cmake_build/_deps/legion-src/runtime/realm/ucx/ucp_internal.cc:1927
#12 0x00007ffff2859f09 in Realm::UCPModule::create_active_message_impl (this=0x5555555dac80, target=6, msgid=427, header_size=32, max_payload_size=8240, src_payload_addr=0x0,
    src_payload_lines=0, src_payload_line_stride=0, storage_base=0x7f512a907080, storage_size=256)
    at /opt/legate/legate.core/arch-legate/cmake_build/_deps/legion-src/runtime/realm/ucx/ucp_module.cc:259
#13 0x00007ffff228be79 in Realm::Network::create_active_message_impl (target=6, msgid=427, header_size=32, max_payload_size=8240, src_payload_addr=0x0, src_payload_lines=0,
    src_payload_line_stride=0, storage_base=0x7f512a907080, storage_size=256) at /opt/conda/envs/legate/include/realm/network.inl:100
#14 0x00007ffff23624e8 in Realm::ActiveMessage<Realm::RemoteIBAllocRequestMultiple, 256ul>::init (this=0x7f512a907060, _target=6, _max_payload_size=8240)
    at /opt/conda/envs/legate/include/realm/activemsg.inl:53
#15 0x00007ffff235896e in Realm::ActiveMessage<Realm::RemoteIBAllocRequestMultiple, 256ul>::ActiveMessage (this=0x7f512a907060, _target=6, _max_payload_size=8240)
    at /opt/conda/envs/legate/include/realm/activemsg.inl:44
#16 0x00007ffff23f1990 in Realm::RemoteIBAllocRequestMultiple::handle_message (sender=4, args=..., data=0x7f53ce634768, msglen=8112)
    at /opt/legate/legate.core/arch-legate/cmake_build/_deps/legion-src/runtime/realm/transfer/ib_memory.cc:679
#17 0x00007ffff23f210e in Realm::HandlerWrappers::wrap_handler_notimeout<Realm::RemoteIBAllocRequestMultiple, Realm::RemoteIBAllocRequestMultiple::handle_message> (sender=4,
    header=0x5555574ed730, payload=0x7f53ce634768, payload_size=8112) at /opt/conda/envs/legate/include/realm/activemsg.inl:625
#18 0x00007ffff2886431 in Realm::IncomingMessageManager::do_work (this=0x5555559448d0, work_until=...)
    at /opt/legate/legate.core/arch-legate/cmake_build/_deps/legion-src/runtime/realm/activemsg.cc:770

Until frame 16, the msglen is valid (8112). But, in frame 15 it is increased to 8240 which will be invalid (max valid is 8192).

SeyedMir commented 3 weeks ago

FYI @streichler @eddy16112

muraj commented 3 weeks ago

It's not clear to me based on the ActiveMessage class who's responsibility it is to fragment a large payload. I've hit this issue with UCX a bunch of different ways in the past and just kind of tried to make sure I didn't send large payloads, but that's not a really scalable solution.

elliottslaughter commented 3 weeks ago

I think this is a variation of #1229 (although the description here is a lot more clear than the one in that issue). Anyway, it's a known issue for a long time and @streichler suggested some potential directions in the other thread.

eddy16112 commented 3 weeks ago

If we need to send a large message. what should we do? manually split the message? I think we have other active messages that are not protected by recommended_max_payload.

elliottslaughter commented 3 weeks ago

Quoting from https://github.com/StanfordLegion/legion/issues/1229#issuecomment-1086280169:

... I need to either:

  1. break microop requests into potentially multiple packets (gross), or

  2. break a request like this into multiple microops (good for parallelism, but duplicates a bit of effort like reading the pointer fields)

One concern with approach (2) is that on some systems, the max medium message is a lot smaller than 64K (e.g. 4K) and that might be a LOT more microops.

From https://github.com/StanfordLegion/legion/issues/1229#issuecomment-1500721477:

... I'm pretty sure (2) is the right answer, and we can look at the performance with more-but-smaller microops to confirm it's not a big deal.

Seems like it addresses this use case?

SeyedMir commented 3 weeks ago

If we need to send a large message. what should we do? manually split the message? I think we have other active messages that are not protected by recommended_max_payload.

If you're asking the network module to provide a payload buffer for that large message, then yes, it needs to be split into smaller chunks. The UCX backend cannot provide arbitrarily large payload buffers.

apryakhin commented 3 weeks ago

It's probably depends on the use case but I think we should use recommended_max_payload and try to fragment and pipeline things in Realm if we can as for example in the context of dependent partitioning image/preimage operations. On the other hand, just thinking out loud - if we need to deliver a very large payload (exceeds max recommended) to multiple peers we can go down the path of collectives and rely let's say on UCC to do the right thing (for instance scatter + allgather).