HDFGroup / hcl

Hermes Container Library

A minor issue when running on a cluster with SLURM job submission #11

Closed: ctminh closed this issue 3 years ago

ctminh commented 3 years ago

Hi *, I tried to run the queue_test example on a Linux cluster with SLURM job submission. Local queue access (push, get) works, but global queue put/get (the HCL queue) hangs. Do you know what the problem is, or did I miss something in the configuration? I ran the test in .../hcl/blob/dev/test/queue_test.cpp. The job was submitted on 4 nodes, 1 rank per node.

ChristopherHogan commented 3 years ago

Hi. Our wiki has some documentation on running on a cluster. The important part is to correctly populate the hostfile in the test directory.

ChristopherHogan commented 3 years ago

Also, you need the hostnames or addresses of the server processes in the server_list file. More details can be found under SERVER_LIST_PATH in the Structure Initialization section of the README.
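One way to build the server_list file under SLURM is to expand the job's node list into hostnames. A minimal sketch, assuming the file takes one hostname per line (confirm the exact format under SERVER_LIST_PATH in the README) and that every node in the allocation runs a server process; if only some ranks act as servers, list only those hosts. On a real cluster, `scontrol show hostnames "$SLURM_JOB_NODELIST"` does this expansion authoritatively. The helper names `expand_nodelist` and `format_server_list` are hypothetical, and the expander only handles one bracket group per entry.

```python
import re


def expand_nodelist(nodelist: str) -> list[str]:
    """Expand a compact SLURM-style node list such as 'node[01-03,07],login1'.

    Hypothetical helper, simplified sketch: handles at most one bracket group
    per host entry. Prefer `scontrol show hostnames` for anything more complex.
    """
    # Split on commas that are not inside brackets.
    entries, depth, current = [], 0, ""
    for ch in nodelist:
        if ch == "[":
            depth += 1
        elif ch == "]":
            depth -= 1
        if ch == "," and depth == 0:
            entries.append(current)
            current = ""
        else:
            current += ch
    if current:
        entries.append(current)

    hosts = []
    for entry in entries:
        m = re.fullmatch(r"([^\[]*)\[([^\]]+)\](.*)", entry)
        if not m:
            hosts.append(entry)  # plain hostname, no bracket range
            continue
        prefix, ranges, suffix = m.groups()
        for part in ranges.split(","):
            if "-" in part:
                lo, hi = part.split("-")
                width = len(lo)  # keep zero padding, e.g. 01..03
                for i in range(int(lo), int(hi) + 1):
                    hosts.append(f"{prefix}{i:0{width}d}{suffix}")
            else:
                hosts.append(f"{prefix}{part}{suffix}")
    return hosts


def format_server_list(hosts: list[str]) -> str:
    """Assumed format: one hostname per line, duplicates dropped, order kept."""
    unique = list(dict.fromkeys(hosts))
    return "\n".join(unique) + "\n"
```

Inside a job script, the result could then be written to the path that SERVER_LIST_PATH points at:

```python
import os

hosts = expand_nodelist(os.environ.get("SLURM_JOB_NODELIST", "localhost"))
with open("server_list", "w") as f:
    f.write(format_server_list(hosts))
```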

ctminh commented 3 years ago

Thanks, Christopher! You were right: the problem was in the server_list file. I fixed it and it works now.