StanfordLegion / legion

The Legion Parallel Programming System
https://legion.stanford.edu
Apache License 2.0
Realm: Unable to build with UCX #1396

Open syamajala opened 1 year ago

syamajala commented 1 year ago

I'm trying to test UCX on Summit. When I build with CMake using -DLegion_NETWORKS=ucx, UCX appears to be found, since I see this:

-- Found UCX: /usr/include  
--   UCX libraries: /usr/lib64/libucp.so;/usr/lib64/libuct.so;/usr/lib64/libucm.so;/usr/lib64/libucs.so)

But then I'm getting a lot of build errors like this:

[32%] Building CXX object runtime/CMakeFiles/RealmRuntime.dir/realm/ucx/ucp_internal.cc.o
In file included from /gpfs/alpine/cmb103/scratch/seshuy/legion_s3d_nscbc_summit_ucx/legion/runtime/realm/ucx/ucp_internal.h:34,
                 from /gpfs/alpine/cmb103/scratch/seshuy/legion_s3d_nscbc_summit_ucx/legion/runtime/realm/ucx/ucp_module.cc:25:
/gpfs/alpine/cmb103/scratch/seshuy/legion_s3d_nscbc_summit_ucx/legion/runtime/realm/ucx/ucp_context.h:50:7: error: _ucs_memory_type_t_ does not name a type
   50 |       ucs_memory_type_t         memtype;
      |       ^~~~~~~~~~~~~~~~~
/gpfs/alpine/cmb103/scratch/seshuy/legion_s3d_nscbc_summit_ucx/legion/runtime/realm/ucx/ucp_context.h:51:7: error: _ucp_send_nbx_callback_t_ does not name a type; did you mean _ucp_send_callback_t_?
   51 |       ucp_send_nbx_callback_t   cb;
      |       ^~~~~~~~~~~~~~~~~~~~~~~
      |       ucp_send_callback_t
/gpfs/alpine/cmb103/scratch/seshuy/legion_s3d_nscbc_summit_ucx/legion/runtime/realm/ucx/ucp_context.h:106:9: error: _ucs_memory_type_t_ has not been declared
  106 |         ucs_memory_type_t memtype);
      |         ^~~~~~~~~~~~~~~~~
In file included from /gpfs/alpine/cmb103/scratch/seshuy/legion_s3d_nscbc_summit_ucx/legion/runtime/realm/ucx/ucp_module.cc:25:
/gpfs/alpine/cmb103/scratch/seshuy/legion_s3d_nscbc_summit_ucx/legion/runtime/realm/ucx/ucp_internal.h:289:9: error: _ucp_am_recv_callback_t_ has not been declared
  289 |         ucp_am_recv_callback_t cb, AmHandlersArgs *args);
      |         ^~~~~~~~~~~~~~~~~~~~~~
/gpfs/alpine/cmb103/scratch/seshuy/legion_s3d_nscbc_summit_ucx/legion/runtime/realm/ucx/ucp_internal.h:298:15: error: _ucp_am_recv_param_t_ does not name a type
  298 |         const ucp_am_recv_param_t *param);
      |               ^~~~~~~~~~~~~~~~~~~
/gpfs/alpine/cmb103/scratch/seshuy/legion_s3d_nscbc_summit_ucx/legion/runtime/realm/ucx/ucp_internal.h:309:15: error: _ucp_am_recv_param_t_ does not name a type
  309 |         const ucp_am_recv_param_t *param);
      |               ^~~~~~~~~~~~~~~~~~~
/gpfs/alpine/cmb103/scratch/seshuy/legion_s3d_nscbc_summit_ucx/legion/runtime/realm/ucx/ucp_internal.h:313:15: error: _ucp_am_recv_param_t_ does not name a type
  313 |         const ucp_am_recv_param_t *param);
      |               ^~~~~~~~~~~~~~~~~~~
/gpfs/alpine/cmb103/scratch/seshuy/legion_s3d_nscbc_summit_ucx/legion/runtime/realm/ucx/ucp_internal.h:421:5: error: _ucs_memory_type_t_ does not name a type
  421 |     ucs_memory_type_t memtype;
      |     ^~~~~~~~~~~~~~~~~

As an aside, I see there are UCX modules on Crusher as well. I tried building there and saw similar errors.

streichler commented 1 year ago

Can you tell what version of UCX is installed on Summit?

streichler commented 1 year ago

@seyedmir ^

SeyedMir commented 1 year ago

Can you please share the output of "ucx_info -v"? The UCX backend needs UCX 1.14.0, which is the very latest version. The release candidate can be found here: https://github.com/openucx/ucx/releases/tag/v1.14.0-rc2

streichler commented 1 year ago

@SeyedMir do the UCX headers have preprocessor defines for the version that we could use to guard an #error "you need at least UCX 1.14.0" message?

SeyedMir commented 1 year ago

They do. There is ucp_version.h which does that.
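
A minimal sketch of what such a guard might look like, not the check Realm actually ships. It assumes ucp_version.h is reachable as <ucp/api/ucp_version.h> and defines UCP_API_MAJOR and UCP_API_MINOR. The names ucx_new_enough, kMinUcxMajor, kMinUcxMinor and the opt-in macro SKETCH_HAVE_UCX_HEADERS are hypothetical and exist only so the snippet compiles on a machine without UCX installed.

```cpp
// Sketch of a compile-time UCX version guard. Only UCP_API_MAJOR and
// UCP_API_MINOR are assumed to come from UCX; every other name here is
// hypothetical and used for illustration.

// Minimum UCX version quoted in this thread.
constexpr unsigned kMinUcxMajor = 1;
constexpr unsigned kMinUcxMinor = 14;

// True when (major, minor) is at least the required minimum.
constexpr bool ucx_new_enough(unsigned major, unsigned minor) {
  return (major > kMinUcxMajor) ||
         (major == kMinUcxMajor && minor >= kMinUcxMinor);
}

// Hypothetical opt-in macro so the sketch still compiles without UCX
// headers; a real build would include the header unconditionally.
#ifdef SKETCH_HAVE_UCX_HEADERS
#  include <ucp/api/ucp_version.h>
#  if (UCP_API_MAJOR < 1) || (UCP_API_MAJOR == 1 && UCP_API_MINOR < 14)
#    error "you need at least UCX 1.14.0"
#  endif
#endif

// Old installs are rejected, 1.14 and newer are accepted.
static_assert(!ucx_new_enough(1, 8), "UCX 1.8 should be rejected");
static_assert(ucx_new_enough(1, 14), "UCX 1.14 should be accepted");
```

With a guard along these lines, an outdated UCX would stop the build at a single readable message instead of a cascade of "does not name a type" errors.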

syamajala commented 1 year ago

Summit has UCX 1.8.0 and Crusher has 1.9.0. I will try building 1.14.0-rc2.

SeyedMir commented 1 year ago

@syamajala Please make sure you pass --enable-mt when configuring UCX, because the UCX backend requires multi-threading support in UCX.