Hello everyone,
I managed to run Firmament using the provided Docker image.
When I run the container, it gives me the following error (I don't know whether it is related to my issue):
$ docker run -p 9999:9999 -w /firmament camsas/firmament:dev /firmament/build/src/coordinator --scheduler flow --flow_scheduling_cost_model 6 --listen_uri tcp:0.0.0.0:8081 --http_ui_port 9999 --task_lib_dir=/firmament/build/src
Unexpected end of /proc/mounts line `overlay / overlay rw,relatime,lowerdir=/var/lib/docker/overlay2/l/H2GR6RBYUIPHBXDMSGKPBAYWNE:/var/lib/docker/overlay2/l/NKEZN6MLXD4DGK5HNNI2K4SN7K:/var/lib/docker/overlay2/l/5H5GK4TBC5MY7NFNYEW2P7MPRP:/var/lib/docker/overlay2/l/2DVGBZKGQHVXVENMWHAW3HNEGB:/var/lib/docker/overlay2/l/DA5VWJ6IOM3MFNW3T6VLSZ4ZDR:/var/lib/docker/overlay2/l/NFSSHKRHC7XPWN7BXCLFDMXHF6:/var/lib/docker/overlay2/l/C4RYQ3MDIDZ376KHATSEPRHOOC:/var/lib/docker/overlay2/l/23CTT2D5BDVQOVVUTHAGP4SPKX:/var/lib/docker/overlay2/l/UTO3PZRTFU4CU'
Despite this, the server seems to be running correctly, and I am able to access the GUI at http://:9999/
However, when I tried to submit a job with
python scripts/job/job_submit.py 172.17.0.2 9999 /bin/sleep 60
I got the following error:
E1116 17:19:04.534961 6 task_health_checker.cc:51] Task 18085502784089753274 has failed!
E1116 17:18:57.757828 21 local_executor.cc:443] execvp failed for task command 'perf stat -o /tmp/firmament-perf/aa1d8806-8de1-4c73-b634-214341eed606-18085502784089753274.perf -e cpu-clock,task-clock,context-switches,cpu-migrations,page-faults,cycles,instructions,branches,branch-misses,cache-misses,cache-references,stalled-cycles-frontend,stalled-cycles-backend,node-loads,node-load-misses -- /bin/sleep 60 ': No such file or directory [2]
I fixed it by adding the file aa1d8806-8de1-4c73-b634-214341eed606-18085502784089753274.perf to the /tmp/firmament-perf/ directory in the Docker container. It seems to be working well now.
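For anyone hitting the same message: errno 2 (ENOENT) from execvp normally means the program being launched could not be found, so it may be worth checking inside the container whether `perf` is on the PATH and whether the output directory exists. Below is a minimal sketch of such a check, assuming it is run inside the container. The helper name `check_perf_prereqs` is hypothetical and not part of Firmament.

```python
import os
import shutil


def check_perf_prereqs(command, out_dir):
    """Return a list of problems that could explain a failed task launch.

    `command` is the full command line as shown in the log, and `out_dir`
    is the directory the perf output file is written to.
    Hypothetical helper for illustration; not part of Firmament.
    """
    problems = []

    # execvp reports ENOENT when the program itself cannot be found,
    # so first check that the leading word of the command is on PATH.
    binary = command.split()[0]
    if shutil.which(binary) is None:
        problems.append(f"binary '{binary}' not found on PATH")

    # A missing output directory would not make execvp fail, but it would
    # make `perf stat -o ...` fail once it starts, so report it as well.
    if not os.path.isdir(out_dir):
        problems.append(f"output directory '{out_dir}' does not exist")

    return problems


if __name__ == "__main__":
    for problem in check_perf_prereqs("perf stat -- /bin/sleep 60",
                                      "/tmp/firmament-perf"):
        print(problem)
```

If the script prints nothing, both prerequisites are present and the cause is probably elsewhere.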
Here is /tmp/coordinator.INFO:
And here is what I get from the GUI:
When I click on the stderr link, I get:
What am I missing?
Thanks! Gabriele