-
I think this can be done with [CRIU](https://criu.org/).
I imagine it behaving like the `python` binary, except that TF (`tensorflow`) and possibly other modules are already preloaded. TF probably o…
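A minimal sketch of that idea, assuming a `python` process that has already run `import tensorflow` is dumped once with CRIU and later restored instead of being started cold. It only assembles the `criu` command lines, because both commands normally need root. The PID and image directory are hypothetical.

```python
import shlex
from typing import List


def criu_dump_cmd(pid: int, images_dir: str, leave_running: bool = False) -> List[str]:
    """Build a `criu dump` command for an interpreter that has already imported TF."""
    # --shell-job lets CRIU handle a process started from a shell (attached to a terminal).
    cmd = ["criu", "dump", "-t", str(pid), "-D", images_dir, "--shell-job"]
    if leave_running:
        # Keep the original process alive after the images are written.
        cmd.append("--leave-running")
    return cmd


def criu_restore_cmd(images_dir: str) -> List[str]:
    """Build a `criu restore` command that resumes the saved, pre-warmed interpreter."""
    return ["criu", "restore", "-D", images_dir, "--shell-job"]


if __name__ == "__main__":
    # Hypothetical PID of a `python -i` session that has already run `import tensorflow`.
    warm_pid = 12345
    images = "/tmp/tf-warm"  # hypothetical image directory
    print(shlex.join(criu_dump_cmd(warm_pid, images)))
    print(shlex.join(criu_restore_cmd(images)))
```

CRIU restores processes with their original PIDs, so a restore generally fails while the dumped process still holds that PID; that is why `--leave-running` is off by default in this sketch.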
-
Good evening. I am new to CRIU. I was reading about how it is executed, and I get errors that I do not understand. I would really appreciate your help. The errors are the following:
[localho…
-
With `gcc-9`, dmtcp_launch throws the following warning:
```
$ ../bin/dmtcp_launch ./dmtcp1
[40000] WARNING at tls.cpp:256 in TLSInfo_GetTidOffset; REASON='JWARNING(false) failed'
tid_pid.tid…
```
-
Hi,
I want to checkpoint and restart on multiple nodes.
My checkpoint command is: /home/test/DMTCP_NEW/dmtcp/bin/dmtcp_launch --interval 1 --coord-host 192.168.1.14 --coord-port 7790 -j /home/test/D…
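For a multi-node run, every node has to point at the same coordinator. A minimal sketch, using only the flags from the command above, that builds one `dmtcp_launch` invocation and shows it being sent to each node. The node names, the `ssh` transport, the install path and `./my_app` are hypothetical, and the coordinator is assumed to be running already on 192.168.1.14, port 7790.

```python
import os
import shlex
from typing import List


def dmtcp_launch_cmd(dmtcp_bin: str, coord_host: str, coord_port: int,
                     interval_s: int, program: List[str]) -> List[str]:
    """Build a dmtcp_launch command that joins one shared coordinator."""
    return [
        os.path.join(dmtcp_bin, "dmtcp_launch"),
        "--interval", str(interval_s),  # seconds between automatic checkpoints
        "--coord-host", coord_host,     # coordinator address reachable from every node
        "--coord-port", str(coord_port),
        "-j",                           # join an already running coordinator
        *program,                       # the application and its own arguments
    ]


if __name__ == "__main__":
    # Hypothetical node names and application.
    nodes = ["node1", "node2"]
    cmd = dmtcp_launch_cmd("/opt/dmtcp/bin", "192.168.1.14", 7790, 60, ["./my_app"])
    for node in nodes:
        print(f"ssh {node} {shlex.join(cmd)}")
```

With `--interval 1`, as in the original command, DMTCP attempts a checkpoint every second, which is aggressive across several nodes; the sketch uses 60 purely as an example value.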
-
I configured and installed MANA from the dmtcp-master branch on a local CentOS 7 machine. When I try to run the `mpi_hello_world.exe` example, I get the following error:
```
ERROR: ld.so: object…
```
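The rest of the message is cut off, so its cause is not known here. DMTCP-based launchers inject their libraries through `LD_PRELOAD`, so assuming the error concerns one of those entries, a quick first check is whether each preloaded path exists. This is a hypothetical helper, not part of MANA or DMTCP; pass the value explicitly if it is set by a launcher rather than exported in your shell.

```python
import os
import re
from typing import Dict, Optional


def check_ld_preload(value: Optional[str] = None) -> Dict[str, Optional[bool]]:
    """Report, for each LD_PRELOAD entry, whether the named file exists.

    ld.so accepts entries separated by spaces or colons. Entries that contain a
    slash are checked on disk (True/False); bare library names are resolved by
    the dynamic linker's normal search, so they are reported as None (unchecked).
    """
    if value is None:
        value = os.environ.get("LD_PRELOAD", "")
    report: Dict[str, Optional[bool]] = {}
    for entry in re.split(r"[:\s]+", value.strip()):
        if not entry:
            continue
        report[entry] = os.path.isfile(entry) if "/" in entry else None
    return report


if __name__ == "__main__":
    for lib, ok in check_ld_preload().items():
        status = "unchecked (bare name)" if ok is None else ("found" if ok else "MISSING")
        print(f"{status:22} {lib}")
```

A file that exists can still fail to preload, for example when its architecture does not match the program, so this only rules out the simplest cause.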
-
Hi, I am trying to run a HIP program under `dmtcp_launch` with DMTCP 2.6, but the program hangs somewhere.
Without `dmtcp_launch`, it runs correctly.
-
When attempting to checkpoint with densely grouped collective calls, the checkpointing process does not complete. Instead, the ranks are unable to progress beyond the PRESUSPEND barrier.
```
Coor…
```
-
Hi,
After cloning the feature/dmtcp-master branch on a CentOS 7 machine, configuring MANA gives the following warning:
`configure: WARNING: no configuration information is in dmtcp`
And when try…
-
Hi,
Could you tell me which version and branch of DMTCP the CRAC master branch is based on?
Thank you very much.
GoodKairos
-
Quoting @marcpb94 from #133:
> It seems to be working now! I tried with mpi_hello_world with 1 rank and a heat distribution application (that we usually use for testing) with 4 ranks.
> However,…