WukLab / LegoOS

Disseminated, Distributed OS for Hardware Resource Disaggregation. USENIX OSDI 2018 Best Paper.
http://LegoOS.io
GNU General Public License v2.0

Failed in rebooting machine with linux-kernel 3.11.1 #23

Open anoyiuhu opened 3 years ago

anoyiuhu commented 3 years ago

Hi, @lastweek

As LegoOS requires, I tried to install Linux kernel 3.11.1 on my server to use it as a storage node. However, after installing kernel 3.11.1, I couldn't boot the machine. I tried kernel 3.11.1 on both CentOS 7 and Ubuntu (14, 16, 18, 20).

On CentOS 7, after I rebooted the machine, the monitor showed a black screen with a cursor in the upper left corner, and the system seemed to hang.

On Ubuntu (14, 16, 18, 20), the monitor showed "loading initial Ramdisk...." with a cursor at the start of the next line, and the system also seemed to hang.

Even in a virtual machine running Ubuntu, the system still hung and couldn't boot successfully with kernel 3.11.1.

I suspect there may be a bug in this kernel, because it is not a long-term support version. The kernel source code was downloaded from the official Linux kernel site.

Could you please provide us some suggestions? Looking forward to your reply.

Thanks! Best regards

lastweek commented 3 years ago

Hi, this is most likely a configuration issue rather than a kernel bug. Did you first copy the .config from /boot and then run make oldconfig?

anoyiuhu commented 3 years ago

Yes, this is my kernel switch procedure on CentOS 7:

(1) copy /boot/config-xxxx to .config
(2) make menuconfig/oldconfig (I tried both)
(3) make -j
(4) make -j modules_install install
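For reference, the four steps above can be sketched as a small shell function. This is only an illustration, not a script from the LegoOS repository: `build_kernel`, `run`, `DRY_RUN`, and `JOBS` are hypothetical names, the source and config paths are whatever the caller passes in, and step (4) is split into two separate make invocations here. By default it only prints the commands instead of running them.

```shell
#!/bin/sh
# Sketch of the kernel switch procedure described in the comment above.
# Nothing is executed unless build_kernel is called, and with DRY_RUN=1
# (the default) the commands are only printed.

# run CMD...: print the command in dry-run mode, otherwise execute it.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# build_kernel SRC_DIR BOOT_CONFIG
#   SRC_DIR      unpacked kernel source tree (placeholder path)
#   BOOT_CONFIG  distribution config copied from /boot (placeholder path)
build_kernel() {
    src_dir=$1
    boot_config=$2
    jobs=${JOBS:-4}    # assumption: a bounded job count instead of bare -j

    run cp "$boot_config" "$src_dir/.config" || return 1        # step (1)
    run make -C "$src_dir" oldconfig || return 1                # step (2)
    run make -C "$src_dir" -j"$jobs" || return 1                # step (3)
    run make -C "$src_dir" modules_install || return 1          # step (4), first half
    run make -C "$src_dir" install || return 1                  # step (4), second half
}
```

Calling `build_kernel /path/to/linux-3.11.1 /boot/config-xxxx` prints the five commands for review; setting `DRY_RUN=0` would execute them, which normally requires root for the install steps.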

It's very weird to show nothing but a black screen after rebooting!

anoyiuhu commented 3 years ago

Hi, @lastweek. I have one question: why does the LegoOS storage node have to run on the 3.11.1 kernel? Can we use other stable kernel versions?

Boon-Jun commented 3 years ago

I'm having trouble with this too. I tried setting up 3.11.1 on both CentOS and Ubuntu on the r320 cloud instance, and the results are the same as those described by @anoyiuhu. It would be great to get some updates on this!

lastweek commented 3 years ago

Hi @anoyiuhu and @Boon-Jun,

We use 3.11.1 because of kernel API compatibility issues. The storage node has two key modules: an InfiniBand (IB) module and a Lego storage monitor. We built these modules against the 3.11.1 kernel and therefore against its APIs. Unfortunately, some IB and file-related kernel APIs changed in later kernel releases, so these modules are stuck with 3.11.1.

If you try a newer kernel, you are likely to get a lot of compiler errors due to API mismatches. You can try to modify the code, but it is a time-consuming process.
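Since the storage-node modules are tied to the 3.11.1 APIs, a guard like the following could be run before trying to build them. This is a minimal sketch, not part of LegoOS: `kernel_is_supported` and `REQUIRED_KERNEL` are hypothetical names, and it assumes that a locally built 3.11.1 kernel reports `3.11.1` in `uname -r`, optionally followed by a `-` or `+` suffix.

```shell
#!/bin/sh
# Sketch: check that the running kernel is the release the modules were
# written against before attempting to compile them.

REQUIRED_KERNEL="3.11.1"

# kernel_is_supported [RELEASE]
# Succeeds if RELEASE (default: output of `uname -r`) is exactly
# REQUIRED_KERNEL or REQUIRED_KERNEL followed by a "-..." or "+..." suffix.
kernel_is_supported() {
    release=${1:-$(uname -r)}
    case "$release" in
        "$REQUIRED_KERNEL" | "$REQUIRED_KERNEL"-* | "$REQUIRED_KERNEL"+*)
            return 0 ;;
        *)
            return 1 ;;
    esac
}
```

A build script could then start with `kernel_is_supported || { echo "expected kernel $REQUIRED_KERNEL" >&2; exit 1; }` so that an API mismatch is reported up front rather than as a wall of compiler errors.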

Can you tell me which filesystems your CentOS and Ubuntu installations use? I remember XFS having bugs.
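One way to answer the filesystem question is to read the mount table. The following is a minimal sketch that assumes a Linux system exposing `/proc/mounts`; `fs_type_of` is a hypothetical helper name, and the optional second argument exists only so the function can be pointed at a saved copy of the mount table.

```shell
#!/bin/sh
# Sketch: report the filesystem type mounted at a given mount point.

# fs_type_of MOUNTPOINT [MOUNTS_FILE]
# Each line of the mounts file has the form:
#   device mountpoint fstype options dump pass
# If the mount point appears more than once, the last entry wins, since
# later mounts shadow earlier ones.
fs_type_of() {
    mountpoint=$1
    mounts_file=${2:-/proc/mounts}
    awk -v mp="$mountpoint" '
        $2 == mp { fstype = $3 }
        END { if (fstype != "") print fstype }
    ' "$mounts_file"
}
```

Running `fs_type_of /` on the storage node prints the type of the root filesystem, and `fs_type_of /boot` does the same for a separate boot partition if one is mounted.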

anoyiuhu commented 3 years ago

Hi, @lastweek

I used XFS for CentOS 7 and ext4 for Ubuntu. I have three points about your latest comment:

  1. If XFS doesn't work for LegoOS, which filesystem do you recommend for a LegoOS storage node with 3.11.1?
  2. I tried CentOS 7.2 in VirtualBox, as described in your GitHub README, and it switched to Linux 3.11.1 successfully with XFS. So the hang during reboot is not necessarily due to XFS.
  3. The news section of the LegoOS README says that you have already deployed LegoOS on CloudLab. Could you please share your configuration? (According to this issue, they used Ubuntu 14.04 on r320 and rebooted the system successfully, but CloudLab no longer provides Ubuntu 14.04.)

Looking forward to your reply.

Thanks!

Boon-Jun commented 3 years ago

As for me, I used XFS on CentOS 7 and tried both ext3 and ext4 on Ubuntu 18.04. My setup is entirely on CloudLab's r320 instances, so it would definitely help to have details on how Linux v3.11.1 was installed there.