AIFM-sys / AIFM

AIFM: High-Performance, Application-Integrated Far Memory
MIT License

Experiment takes too long to run #10

Closed ckf104 closed 2 years ago

ckf104 commented 2 years ago

Hello, I am trying to reproduce the experiment results on CloudLab, but the program in fig6bc has been running for over five hours. Is this normal? Here is some information.

$ ps -eo etime,cmd | grep main
   05:00:28 sudo stdbuf -o0 sh -c ./main /users/chenkefa/AIFM/aifm/configs/client.config                            18.18.1.3:8000
   05:00:28 sh -c ./main /users/chenkefa/AIFM/aifm/configs/client.config                            18.18.1.3:8000
   05:00:28 ./main /users/chenkefa/AIFM/aifm/configs/client.config 18.18.1.3:8000
   00:00 grep main
$ ls -l  # in the fig6bc directory
total 17324 
-rw-r--r-- 1 chenkefa aifmtest-PG0     1311 Feb  8 21:52 Makefile
-rw-r--r-- 1 chenkefa aifmtest-PG0      895 Feb  8 21:52 README.txt
-rw-r--r-- 1 chenkefa aifmtest-PG0       74 Feb  8 22:22 log.1331
-rw-r--r-- 1 chenkefa aifmtest-PG0        0 Feb  8 22:22 log.2218
-rwxr-xr-x 1 chenkefa aifmtest-PG0 17112112 Feb  8 22:22 main
-rw-r--r-- 1 chenkefa aifmtest-PG0    14328 Feb  8 22:22 main.cpp
-rw-r--r-- 1 chenkefa aifmtest-PG0     2941 Feb  8 22:22 main.d
-rw-r--r-- 1 chenkefa aifmtest-PG0  4716248 Feb  8 22:22 main.o
-rwxr-xr-x 1 chenkefa aifmtest-PG0      945 Feb  8 21:52 run.sh

The content of log.1331 matches the description in README.txt, but log.2218 is still empty. By the way, how long is each experiment expected to run?

zainryan commented 2 years ago

Hi ckf104,

I just tried it out on my machine. The result comes out fairly quickly, in around 5 minutes; please see the log below. Have you tried killing the processes on both machines and rerunning the experiment? I'm willing to assist if the problem is still reproducible on your end.

zainruan@node-0:~/AIFM/aifm/exp/fig6bc$ ls -l
total 17420
-rw-r--r-- 1 zainruan shenango-PG0       73 Feb 17 12:48 log.1331
-rw-r--r-- 1 zainruan shenango-PG0       74 Feb 17 12:54 log.2218
-rwxr-xr-x 1 zainruan shenango-PG0 17223576 Feb 17 12:54 main
-rw-r--r-- 1 zainruan shenango-PG0    14329 Feb 17 12:54 main.cpp
-rw-r--r-- 1 zainruan shenango-PG0     3001 Feb 17 12:54 main.d
-rw-r--r-- 1 zainruan shenango-PG0  4699384 Feb 17 12:54 main.o
-rw-r--r-- 1 zainruan shenango-PG0     1311 Feb 16 10:05 Makefile
-rw-r--r-- 1 zainruan shenango-PG0      895 Feb 16 10:05 README.txt
-rwxr-xr-x 1 zainruan shenango-PG0      945 Feb 16 10:05 run.sh
zainruan@node-0:~/AIFM/aifm/exp/fig6bc$
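
The "kill the processes on both machines" step can be sketched as follows. This is a minimal sketch, not part of the AIFM scripts: it assumes the leftover process has the command name `main` (as in the `ps` output earlier in the thread), and `main_pids` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read `ps -eo pid,comm` output on stdin and print the
# PIDs of processes whose command name is exactly "main".
main_pids() {
  awk '$2 == "main" { print $1 }'
}

# Usage (run on each machine):
#   ps -eo pid,comm | main_pids
# The experiment in this thread was started through sudo, so killing the
# listed PIDs will likely need sudo as well:
#   ps -eo pid,comm | main_pids | xargs -r sudo kill
```

Matching on the exact command name avoids also matching the `grep main` line that appears in the original `ps` output.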

ckf104 commented 2 years ago

Thanks for the kind response.

I rebooted both machines and hit the same problem. Here is what I have done since logging into the CloudLab machines:

  1. Installed the software required by the README on both machines.
  2. Ran build_all.sh to build AIFM on both machines.
  3. Changed the configs/ssh file on node0 only, and made sure node0 can connect to node1 over ssh without a password.
  4. Ran fig6bc/run.sh on node0 only and observed the result described above.
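
Step 3 (passwordless ssh from node0 to node1) can be checked non-interactively. The sketch below is an illustration only: the host name `node1`, the function name `check_passwordless_ssh`, and the `SSH_BIN` override variable are assumptions, not part of the AIFM repository.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed only if the given host accepts an ssh login
# without prompting for a password. BatchMode=yes makes ssh fail instead of
# asking for a password or passphrase.
# SSH_BIN is an assumed override so the ssh binary can be swapped out.
check_passwordless_ssh() {
  "${SSH_BIN:-ssh}" -o BatchMode=yes -o ConnectTimeout=5 "$1" true
}

# Usage on node0 ("node1" is a placeholder for the real host name):
#   check_passwordless_ssh node1 && echo "passwordless ssh works"
```

If the check fails, ssh would otherwise have stopped to prompt for a password, which can stall a script that launches remote commands.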

One odd thing: I see the following extra terminal output when running test.sh and run.sh:

{: Command not found.
}: Command not found.
{: Command not found.
}: Command not found.

zainryan commented 2 years ago

There shouldn't be any errors when executing test.sh or run.sh. I guess something is not configured correctly on your machine. If you can grant me permission to access your CloudLab machine, I'll ssh into it and take a look. My public key can be found here: https://github.com/zainryan.keys

ckf104 commented 2 years ago

I have added the public key to both CloudLab machines. You can log in with the following commands, if I haven't made a mistake :)

ssh -p 22 chenkefa@hp022.utah.cloudlab.us # node 0
ssh -p 22 chenkefa@hp005.utah.cloudlab.us # node 1

You may need to reconfigure the ssh service to connect from node 0 to node 1 without a password. Thank you very much for the kind help!

zainryan commented 2 years ago

Okay, I see what's going on. Simply change the default shell to bash and you're ready to go. All scripts in this repo are written for bash. I'll leave a note in the README.

I've made this change on your machines, and I now get the result fairly quickly without any issue.
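
For anyone hitting the same symptom, here is a minimal sketch for checking which login shell an account uses. Errors of the form `{: Command not found.` are what a csh-family shell typically prints when it is handed bash syntax. The helper names `login_shell_of` and `is_bash` are hypothetical, and `chsh` is only one common way to switch shells; on a managed cluster the shell may have to be changed through the account settings instead.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the login shell from one passwd-format line.
# Field 7 of a passwd entry is the login shell.
login_shell_of() {
  printf '%s\n' "$1" | cut -d: -f7
}

# Hypothetical helper: succeed only if the given shell path ends in "bash".
is_bash() {
  [ "${1##*/}" = "bash" ]
}

# Report the current user's login shell, if getent is available.
if command -v getent >/dev/null 2>&1; then
  shell="$(login_shell_of "$(getent passwd "$(id -un)")")"
  if is_bash "$shell"; then
    echo "login shell is already bash: $shell"
  else
    echo "login shell is '$shell'; one way to switch: chsh -s /bin/bash"
  fi
fi
```

Run the check on both machines, since commands launched over ssh are executed by the remote account's login shell.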