AIFM-sys / AIFM

AIFM: High-Performance, Application-Integrated Far Memory
MIT License

ASSERTION 'tcp_dial(laddr, raddr, &remote_master_) != 0' FAILED IN 'TCPDevice' #12

Closed wu1pu1 closed 2 years ago

wu1pu1 commented 2 years ago

Hi zain, when I run ./run.sh in exp/fig6a/aifm_nt, the following error message appears. Can you help me see what went wrong?

CPU 19| <5> cpu: detected 20 cores, 1 nodes
CPU 19| <5> time: detected 2394 ticks / us
[ 0.000646] CPU 19| <5> loading configuration from '/users/wpq/aifm/aifm/configs/client.config'
[ 0.000693] CPU 19| <3> < 1 guaranteed kthreads is not recommended for networked apps
[ 0.019104] CPU 19| <5> net: started network stack
[ 0.019118] CPU 19| <5> net: using the following configuration:
[ 0.019124] CPU 19| <5> addr: 18.18.1.2
[ 0.019131] CPU 19| <5> netmask: 255.255.255.0
[ 0.019138] CPU 19| <5> gateway: 18.8.1.1
[ 0.019144] CPU 19| <5> mac: D2:35:87:84:25:65
[ 0.389310] CPU 19| <5> thread: created thread 0
[ 0.389429] CPU 19| <5> spawning 18 kthreads
[ 0.389564] CPU 10| <5> thread: created thread 1
[ 0.389651] CPU 02| <5> thread: created thread 2
[ 0.389755] CPU 13| <5> thread: created thread 3
[ 0.389876] CPU 05| <5> thread: created thread 4
[ 0.390188] CPU 12| <5> thread: created thread 5
[ 0.390383] CPU 07| <5> thread: created thread 6
[ 0.390551] CPU 16| <5> thread: created thread 7
[ 0.390787] CPU 18| <5> thread: created thread 8
[ 0.390924] CPU 00| <5> thread: created thread 9
[ 0.391127] CPU 14| <5> thread: created thread 10
[ 0.391283] CPU 17| <5> thread: created thread 11
[ 0.391635] CPU 08| <5> thread: created thread 12
[ 0.391778] CPU 10| <5> thread: created thread 13
[ 0.392052] CPU 05| <5> thread: created thread 14
[ 0.392254] CPU 06| <5> thread: created thread 15
[ 0.392405] CPU 12| <5> thread: created thread 16
[ 0.392586] CPU 04| <5> thread: created thread 17
[ 5.398783] CPU 06| <0> FATAL: ../../..//src/device.cpp:65 ASSERTION 'tcp_dial(laddr, raddr, &remote_master_) != 0' FAILED IN 'TCPDevice'
./main(+0x697bb)[0x558831fa17bb]
./main(+0x69820)[0x558831fa1820]
./main(+0x26e14)[0x558831f5ee14]
./main(+0x13284)[0x558831f4b284]
./main(+0xfe41)[0x558831f47e41]
./main(+0x57260)[0x558831f8f260]
[ 5.399004] CPU 06| <5> init: shutting down -> FAILURE

zainryan commented 2 years ago

Hi puqing,

  1. Have you set bash as your default shell? See "Set bash as the default shell." in README.md.
  2. Have you run Shenango's setup script on both machines? See "Setup Shenango (on all nodes)" in README.md.

zainryan commented 2 years ago

Hi puqing, does the suggestion above work for you?

wu1pu1 commented 2 years ago

Hi zain, sorry for the late reply! When I try to set bash as my default shell with chsh -s /bin/bash, it prompts "Password:", but I don't remember ever setting a password. Do you know what the password is? Thank you!

zainryan commented 2 years ago

You can use sudo, which doesn't require a password.

wu1pu1 commented 2 years ago

Hi zain, thank you for your answer! That problem has been solved. Now, when I run ./run.sh in exp/fig6a/aifm_nt, the following output appears. Is it normal to see a segmentation fault?

{: Command not found.
}: Command not found.
CPU 08| <5> cpu: detected 20 cores, 1 nodes
CPU 08| <5> time: detected 2394 ticks / us
[ 0.000651] CPU 08| <5> loading configuration from '/users/wpq/aifm/aifm/configs/client.config'
[ 0.000694] CPU 08| <3> < 1 guaranteed kthreads is not recommended for networked apps
[ 0.019188] CPU 08| <5> net: started network stack
[ 0.019203] CPU 08| <5> net: using the following configuration:
[ 0.019209] CPU 08| <5> addr: 18.18.1.2
[ 0.019216] CPU 08| <5> netmask: 255.255.255.0
[ 0.019221] CPU 08| <5> gateway: 18.8.1.1
[ 0.019229] CPU 08| <5> mac: F2:02:3D:18:E1:FC
[ 0.388789] CPU 03| <5> thread: created thread 0
[ 0.388923] CPU 03| <5> spawning 18 kthreads
[ 0.389081] CPU 14| <5> thread: created thread 1
[ 0.389161] CPU 05| <5> thread: created thread 2
[ 0.389312] CPU 06| <5> thread: created thread 3
[ 0.389373] CPU 15| <5> thread: created thread 4
[ 0.389631] CPU 18| <5> thread: created thread 5
[ 0.389830] CPU 09| <5> thread: created thread 6
[ 0.390089] CPU 00| <5> thread: created thread 7
[ 0.390299] CPU 02| <5> thread: created thread 8
[ 0.390523] CPU 13| <5> thread: created thread 9
[ 0.390860] CPU 05| <5> thread: created thread 10
[ 0.390960] CPU 12| <5> thread: created thread 11
[ 0.391107] CPU 00| <5> thread: created thread 12
[ 0.391307] CPU 03| <5> thread: created thread 13
[ 0.391454] CPU 14| <5> thread: created thread 14
[ 0.391687] CPU 02| <5> thread: created thread 15
[ 0.391914] CPU 13| <5> thread: created thread 16
[ 0.392125] CPU 10| <5> thread: created thread 17
Prepare...
Bench...
mops = 0.0443197
90 tail lat (cycles) = 144006
Segmentation fault

zainryan commented 2 years ago

I think the script is still being run by a shell other than bash; otherwise, you wouldn't see "{: Command not found." Log out, log back in, and run "echo $SHELL" to confirm that bash is being used.
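As a complement to "echo $SHELL", the login shell recorded for the account can also be read directly from its passwd entry. The following is only a minimal sketch, assuming a Linux node where getent is available; the function name login_shell_from_passwd_line is hypothetical and is not part of AIFM or Shenango.

```shell
#!/bin/sh
# Hypothetical helper (not part of AIFM): extract the login shell from a
# passwd-format line. Field 7 of a passwd entry is the login shell.
login_shell_from_passwd_line() {
    printf '%s\n' "$1" | cut -d: -f7
}

# Look up the current user's passwd entry. getent also covers accounts
# that are not listed in the local /etc/passwd file.
entry=$(getent passwd "$(id -un)")
shell_path=$(login_shell_from_passwd_line "$entry")

# Report whether the recorded login shell is bash.
case "$shell_path" in
    */bash) echo "login shell is bash: $shell_path" ;;
    *)      echo "login shell is NOT bash: $shell_path" ;;
esac
```

This reads the configured login shell rather than the shell of the current session, so it reflects a chsh change even before logging out and back in.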

wu1pu1 commented 2 years ago

Yes, the script doesn't switch to bash. What should I do?

sudo chsh -s /bin/bash
echo $shell
/bin/tcsh

zainryan commented 2 years ago

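Use sudo passwd $USER to set a password for your own account (plain sudo passwd, like sudo chsh, acts on root rather than on your user). Then run chsh -s /bin/bash without sudo, as your own account.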

zainryan commented 2 years ago

Btw, you can set your default shell at https://www.cloudlab.us/myaccount.php. This will be applied to all new cloudlab instances created by you.

wu1pu1 commented 2 years ago

Thanks! The problem is solved, just as you said.

zainryan commented 2 years ago

Great, just let me know if you run into any new questions. Btw, since your last script run failed partway through, some processes may not have been cleaned up properly, which can affect your next run. You can kill them manually. For example, in this case, just ssh into both nodes and run "sudo pkill -9 iokerneld; sudo pkill -9 main; sudo pkill -9 tcp_device". You can use the top or ps command to confirm that nothing is left running.
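The cleanup and the follow-up check can be sketched as one small script to run on each node. This is an illustration only: the helper names still_running and cleanup_node are hypothetical, the process names are the three from the command above, and pgrep/pkill are assumed to be installed (they ship with procps on most Linux distributions). Unlike the one-liner above, the sketch passes -x so that only exact process names match, which is a deliberate choice because "main" is a very generic pattern.

```shell
#!/bin/sh
# Process names taken from the cleanup command quoted above.
PROCS="iokerneld main tcp_device"

# Hypothetical helper: print each given process name that is still running.
still_running() {
    for name in "$@"; do
        # pgrep -x matches the exact process name; it exits 0 on a match.
        if pgrep -x "$name" >/dev/null 2>&1; then
            printf '%s\n' "$name"
        fi
    done
}

# Hypothetical helper: kill leftovers, then verify that none remain.
cleanup_node() {
    for name in $PROCS; do
        # pkill exits non-zero when nothing matched; that is not an error here.
        sudo pkill -9 -x "$name"
    done
    sleep 1
    left=$(still_running $PROCS)
    if [ -n "$left" ]; then
        echo "still running:" $left
        return 1
    fi
    echo "clean"
}
```

After sourcing the script on a node, calling cleanup_node performs the kill and the check in one step; still_running can also be called on its own as a read-only check that needs no sudo.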

wu1pu1 commented 2 years ago

OK, I really appreciate your help!