Alpine-DAV / ascent

A flyweight in situ visualization and analysis runtime for multi-physics HPC simulations
https://alpine-dav.github.io/ascent/

Segfault of action.print() #1332

Open yslan opened 1 month ago

yslan commented 1 month ago

I'm trying to print mesh_data with print() from rank 0, and I get a SIGSEGV.

code

  conduit::Node mesh_data;

  // some setup...

  if (platform->comm.mpiRank == 0) {
    mesh_data.print();
  }
  fflush(stdout);

error

Loguru caught a signal: SIGSEGV
Stack trace:
16            0x4075da _start + 44
15      0x15232f52024d __libc_start_main + 239
14            0x4061bd main + 4365
13      0x15233251bbf8 nekrs::setup(int, int, int, int, int, std::map<std::string, std::map<std::string, std::string, std::less<std::string>, std::allocator<std::pair<std::string const, std::string > > >, std::less<std::string>, std::allocator<std::pair<std::string const, std::map<std::string, std::string, std::less<std::__cxx11::basic_str
12      0x152332589631 nrs_t::init() + 8993
11      0x152332586427 nrs_t::setIC() + 1575
10      0x15232aef800a UDF_Setup + 3658
9       0x15232aef4592 nekAscent::setup(mesh_t*, std::string const&, int, bool) + 5714
8       0x152212871bb5 conduit::Node::print() const + 21
7       0x152212871b87 conduit::Node::to_summary_string_stream(std::ostream&) const + 39
6       0x1522128712d5 conduit::Node::to_summary_string_stream(std::ostream&, conduit::Node const&) const + 613
5       0x152212852e4a conduit::Node::to_summary_string_stream(std::ostream&, long, long, long, long, std::string const&, std::string const&) const + 458
4       0x152212852e4a conduit::Node::to_summary_string_stream(std::ostream&, long, long, long, long, std::string const&, std::string const&) const + 458
3       0x152212852e4a conduit::Node::to_summary_string_stream(std::ostream&, long, long, long, long, std::string const&, std::string const&) const + 458
2       0x152212852e4a conduit::Node::to_summary_string_stream(std::ostream&, long, long, long, long, std::string const&, std::string const&) const + 458
1       0x152212853225 conduit::Node::to_summary_string_stream(std::ostream&, long, long, long, long, std::string const&, std::string const&) const + 1445
0       0x1522128146b2 conduit::DataArray<double>::to_summary_string_stream(std::ostream&, long) const + 290
2024-07-12 19:42:57.034 (  44.142s) [main thread     ]                       :0     FATL| Signal: SIGSEGV
./.lhelper: line 4: 2024396 Segmentation fault      (core dumped) $*
x3006c0s19b0n0.hsn.cm.polaris.alcf.anl.gov: rank 0 exited with code 139

version

visualization/ascent/develop/2024-05-03-8baa78c

cyrush commented 1 month ago

@yslan

Unless you are using unified memory, printing will segfault if the node holds pointers that can only be accessed on the GPU.

That said, there may be a trick you can try:

  conduit::Node mesh_copy;
  // Deep copy; depending on the build, this may bring device data back to the host.
  mesh_copy.set(mesh_data);
  mesh_copy.print();

Depending on how things are built, set() can be GPU-aware and copy the data back to the CPU, so that you can print it.

yslan commented 1 month ago

@cyrush

That works, but I don't want to copy the data back to the CPU. Is there a way to print a placeholder such as "(values are not on the host)" instead, so that the SEGV is avoided?

cyrush commented 1 month ago

Conduit isn't compiled with any knowledge of host vs. device memory, so its print functions can't be that smart right now. We are working on adding optional GPU support, and we can evaluate what is possible once that GPU infrastructure exists in Conduit.
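
Until something like that exists, the bookkeeping has to live on the caller's side. The following is a minimal sketch of that idea, not Conduit code: MemSpace, TaggedArray, print_guarded and guarded_summary are hypothetical names invented for illustration, and the sketch assumes the application already knows where each buffer was allocated. The pointer is only dereferenced when it is tagged as host memory; otherwise a placeholder is printed.

```cpp
#include <cstddef>
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical caller-side tag; Conduit itself does not record this.
enum class MemSpace { Host, Device };

// Hypothetical non-owning view of an array plus the memory space it lives in.
struct TaggedArray {
    const double* data;
    std::size_t count;
    MemSpace space;
};

// Write a one-line summary. The pointer is read only when tagged as Host;
// for Device memory a placeholder is written and the pointer is never touched.
void print_guarded(std::ostream& os, const std::string& name, const TaggedArray& a) {
    os << name << ": ";
    if (a.space != MemSpace::Host) {
        os << "(values are not on the host, count=" << a.count << ")\n";
        return;
    }
    os << "[";
    for (std::size_t i = 0; i < a.count; ++i) {
        if (i > 0) os << ", ";
        os << a.data[i];
    }
    os << "]\n";
}

// Convenience wrapper that returns the summary as a string.
std::string guarded_summary(const std::string& name, const TaggedArray& a) {
    std::ostringstream oss;
    print_guarded(oss, name, a);
    return oss.str();
}
```

In the situation above, each array would be tagged at the point where its pointer is handed to the node, and rank 0 would call print_guarded(std::cout, ...) on those tags instead of calling mesh_data.print(). This avoids both the device-to-host copy and the segfault, at the cost of maintaining the tags by hand.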