amzn / amzn-drivers

Official AWS drivers repository for Elastic Network Adapter (ENA) and Elastic Fabric Adapter (EFA)

Build failed with Oracle UEK4 kernel #65

Closed svpcom closed 6 years ago

svpcom commented 6 years ago
$ make BUILD_KERNEL=4.1.12-124.8.1.el7uek.x86_64
make -C /lib/modules/4.1.12-124.8.1.el7uek.x86_64/build M=/home/fg/amzn-drivers/kernel/linux/ena modules
make[1]: Entering directory `/usr/src/kernels/4.1.12-124.8.1.el7uek.x86_64'
/home/fg/amzn-drivers/kernel/linux/ena/Makefile:30: *** only UEK3 with kernel version 3.8.13 is suppported.  Stop.
make[1]: *** [_module_/home/fg/amzn-drivers/kernel/linux/ena] Error 2
make[1]: Leaving directory `/usr/src/kernels/4.1.12-124.8.1.el7uek.x86_64'
make: *** [all] Error 2

akiyano commented 6 years ago

Hi @svpcom,

Sorry about the delayed answer. As a quick fix you can use version 1.5.1 (git checkout ena_linux_1.5.1) which does compile in UEK4 and is functionally the same as 1.5.2. We intend to fix this in future releases.

Regards, Arthur

zorikm commented 6 years ago

@svpcom, please also note that the ENA driver is included in UEK4.

mbobak commented 6 years ago

Hi guys,

I'm not sure what's going on, exactly. I have Oracle Linux 7.5 with UEK4 kernel 4.1.12-124.16.4.el7uek.x86_64. It works fine, I have an AMI built from this, and I can launch new instances from this AMI, all day long, no problem.

But if I do 'yum update', I get upgraded to UEK4 kernel 4.1.12-124.18.1.el7uek, and if I try to build ena there, I get the error saying it's only compatible with UEK3. Yet it's clearly compatible with UEK4, as that's the kernel I'm currently running on.

Here's the ena version I have that's working with UEK4 kernel 4.1.12-124.16.4.el7uek.x86_64:

[oracle@ip-172-16-4-253 ~]$ sudo modinfo ena
filename: /lib/modules/4.1.12-124.16.4.el7uek.x86_64/kernel/drivers/net/ethernet/amazon/ena/ena.ko
version: 1.1.2
license: GPL
description: Elastic Network Adapter (ENA)
author: Amazon.com, Inc. or its affiliates
srcversion: 1CCD9807B601A1966B96ADD
alias: pci:v00001D0Fd0000EC21svsdbcsci
alias: pci:v00001D0Fd0000EC20svsdbcsci
alias: pci:v00001D0Fd00001EC2svsdbcsci
alias: pci:v00001D0Fd00000EC2svsdbcsci
depends:
retpoline: Y
intree: Y
vermagic: 4.1.12-124.16.4.el7uek.x86_64 SMP mod_unload modversions
signer: Oracle CA Server
sig_key: 5F:A0:92:2E:91:B7:47:C3:35:BA:18:5F:20:45:10:6C:5C:E8:6A:AC
sig_hashalgo: sha512
parm: debug:Debug level (0=none,...,16=all) (int)

So, does the later UEK4 kernel use a later version of the ena driver, one that has the UEK3-only regression?

Can someone help me understand what's going on here?

-Mark

mbobak commented 6 years ago

I should add, I'm only using the ena driver that came with UEK4. Thus far, I've not tried downloading and installing my own.

yastreb78 commented 6 years ago

@mbobak

I guess Oracle just picked up the ENA driver source (either v1.5.2 or v1.5.3) from this GitHub repository, which is why you see the problem mentioned above. The UEK3-only limitation in our v1.5.2 release was artificial (we had no way to check UEK4 at release time), so I believe you can remove this check https://github.com/amzn/amzn-drivers/blob/master/kernel/linux/ena/Makefile#L30 and see if it works for you.
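To see in advance whether a given kernel would trip this check, something like the following sketch might help. It is not taken from the driver's Makefile: `classify_uek` is a hypothetical helper, and it assumes the check keys off "uek" in the kernel release string and a base version of 3.8.13, as the error message suggests.

```shell
#!/bin/sh
# Hypothetical helper (not from the driver's Makefile): classify a kernel
# release string along the lines of the check discussed in this thread.
# Prints "not-uek", "uek-3.8.13", or "uek-other".
classify_uek() {
    release=$1
    case "$release" in
        *uek*) ;;                       # assumption: UEK kernels carry "uek" in the release
        *) echo "not-uek"; return 0 ;;
    esac
    base=${release%%-*}                 # drop everything from the first "-" onward
    if [ "$base" = "3.8.13" ]; then
        echo "uek-3.8.13"
    else
        echo "uek-other"                # the case that hits the "only UEK3" error
    fi
}
```

For example, `classify_uek "$(uname -r)"` on the kernel from the original report would print `uek-other`.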

mbobak commented 6 years ago

I'll give that a try, thanks!

mbobak commented 6 years ago

So, I already tried that. The code block in question:

ifneq ($(strip $(IS_3_8_13)),)
ccflags-y += -DUEK3_RELEASE
else
$(error only UEK3 with kernel version 3.8.13 is suppported)
endif
endif
endif

I tried commenting out only the 'else' and the '$(error only UEK3 with kernel version 3.8.13 is suppported)' lines. So, my file looked like this:

ifneq ($(strip $(IS_3_8_13)),)
ccflags-y += -DUEK3_RELEASE
else
$(error only UEK3 with kernel version 3.8.13 is suppported)
endif
endif
endif

Then, I ran 'sudo dracut --force --regenerate-all', and I got: Failed to install module ena

Broadcast message from systemd-journald@ip-172-16-4-253.ec2.internal (Fri 2018-08-10 15:02:05 EDT):

dracut[24811]: Failed to install module ena

Message from syslogd@ip-172-16-4-253 at Aug 10 15:02:05 ... dracut:Failed to install module ena

Help! Any other ideas or suggestions as to what to try?

-Mark

mbobak commented 6 years ago

Argh, I tried to indicate that I commented out those two lines with a '#', but it apparently got interpreted somehow in the post above. Sorry.

mbobak commented 6 years ago

Also, based on my 'modinfo' output, the ena driver that's in UEK4 appears to be version 1.1.2.
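For anyone who wants to pull a single field out of that output rather than read it by eye, here is a minimal sketch. `modinfo_field` is a hypothetical helper name; it only assumes the usual "name: value" layout of `modinfo` output.

```shell
#!/bin/sh
# Hypothetical helper: print the value of one field from modinfo-style
# "name: value" text read on stdin (first match only).
modinfo_field() {
    field=$1
    # Anchoring at line start keeps "version" from matching "srcversion".
    sed -n "s/^${field}:[[:space:]]*//p" | head -n 1
}
```

Usage would be `modinfo ena | modinfo_field version`. The built-in `modinfo -F version ena` should give the same answer; the helper is just convenient when the output has already been saved to a file.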

-Mark

akiyano commented 6 years ago

Hi Mark,

I will try to answer based on what I understood. You want to install the latest ENA driver on an OL UEK4 machine running kernel 4.1.12-124.18.1.el7uek, right? I was able to do that using AMI "O_Linux_UEK_ENA (ami-6df3a815)" in Oregon.

Here are the exact commands I ran, maybe they will help you sort it out:

  1. Ran the instance,
  2. uname -r showed: 4.1.12-124.17.2.el7uek.x86_64
  3. Waited a while (10 minutes) for the instance to update itself (running sudo yum update immediately fails because yum is already busy)
  4. sudo yum update (this didn't do anything, since the updates were already done automatically at system startup)
  5. Restarted the instance and connected to it again
  6. uname -r showed: 4.1.12-124.18.1.el7uek.x86_64
  7. git clone https://github.com/amzn/amzn-drivers.git
  8. cd amzn-drivers/kernel/linux/ena
  9. git checkout ena_linux_1.5.1
  10. make
  11. sudo cp ena.ko /lib/modules/4.1.12-124.18.1.el7uek.x86_64/kernel/drivers/net/ethernet/amazon/ena/
  12. sudo depmod -a
  13. sudo dracut -f
  14. sudo reboot
  15. Connect again to the instance
  16. ethtool -i eth0 shows that the driver version is 1.5.1g

(*) In step 9, I moved to version 1.5.1 of the driver since 1.5.3 wouldn't compile, as you said. We haven't committed the fix for this compilation error yet. You could also apply the fix you suggested (commenting out the 2 lines in the Makefile) and use version 1.5.3; this worked for me as well. So I'm not sure why your setup fails.
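The build-and-install part of the steps above (7 through 14) can be collected into a small dry-run helper. This is only a sketch: `ena_install_plan` is a hypothetical name, and it prints the commands rather than running them so they can be reviewed first.

```shell
#!/bin/sh
# Hypothetical helper: print (do not run) the build-and-install commands
# from the steps above for a given kernel release and git tag.
ena_install_plan() {
    kver=$1                      # kernel release, e.g. the output of `uname -r`
    tag=${2:-ena_linux_1.5.1}    # defaults to the tag used in this thread
    dest=/lib/modules/$kver/kernel/drivers/net/ethernet/amazon/ena/
    cat <<EOF
git clone https://github.com/amzn/amzn-drivers.git
cd amzn-drivers/kernel/linux/ena
git checkout $tag
make
sudo cp ena.ko $dest
sudo depmod -a
sudo dracut -f
sudo reboot
EOF
}
```

For example, `ena_install_plan "$(uname -r)"` prints the plan for the running kernel. The plain `make` assumes the build targets the running kernel; building for a different one would need `make BUILD_KERNEL=<release>`, as in the original report.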

Did any of this help you solve your problem?

Best regards, Arthur

mbobak commented 6 years ago

Hi Arthur,

Sorry for the delay in responding, I had some other stuff to deal with.

Sadly, no, it didn't work for me.

First off, you're launching from AMI, and it's automatically updating on launch (yum update), and then you reboot, and it lets you login. That right there tells me there's a difference. Perhaps I should try the AMI you mentioned above....

When I do it, I launch from my AMI, and it boots up, no problem, on 4.1.12-124.16.4.el7uek.x86_64. When I do a yum update, it upgrades kernel-uek to 4.1.12-124.18.6.el7uek.x86_64. (I think there's been another update to UEK; it was -124.18.1, now it's -124.18.6.) When it does that, on building the module, it encounters the UEK3-only error. So, after that, I did as you suggested and built the ena.ko module from the Git repo, version 1.5.1, and it built fine. I copied it to the modules directory for the new kernel (4.1.12-124.18.6.el7uek.x86_64). Then, I did 'dracut -f -v /boot/initramfs-4.1.12-124.18.6.el7uek.x86_64.ing 4.1.12-124.18.6.el7uek.x86_64'. That seemed to work. Then I checked the position of the new kernel in /etc/grub2.cfg; it was the first listed, so I did 'grub2-set-default 0' and then 'grub2-mkconfig -o /boot/grub2/grub.cfg', and that also seemed to work fine.
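One thing that might be worth checking before the reboot is whether the initramfs the boot loader will load actually contains the ena module. The sketch below uses two hypothetical helpers and assumes the default /boot/initramfs-<release>.img naming; dracut writes to whatever output path it is given, so the file name has to match what the boot entry references.

```shell
#!/bin/sh
# Conventional initramfs path for a kernel release
# (assumption: the default /boot/initramfs-<release>.img naming is in use).
initramfs_path() {
    printf '/boot/initramfs-%s.img\n' "$1"
}

# Succeeds if the file listing on stdin mentions the ena module.
# The module may be stored compressed (e.g. ena.ko.xz), so match the prefix.
listing_has_ena() {
    grep -q '/ena\.ko'
}
```

A possible check would be `sudo lsinitrd "$(initramfs_path 4.1.12-124.18.6.el7uek.x86_64)" | listing_has_ena && echo "ena present"`, using the `lsinitrd` tool that ships with dracut.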

Then, I reboot, and I get a long wait followed by '1/2 status checks'. The really weird thing, though, is that I'm getting no system log at all, so I have no idea how to debug. I think I'm going to give your AMI a try.....

Thanks again for all the help!

-Mark

mbobak commented 6 years ago

Hi again, Arthur,

Do you own that AMI? I tried copying it to US-EAST-1 region, but it said I didn't have permission to the storage. Can you copy it to US-EAST-1? I'm not set up to do work in US-WEST-2. Or, can you change permissions so I can copy it to US-EAST-1 myself?

Thanks,

-Mark

akiyano commented 6 years ago

Hi Mark,

Regarding your last question - you can save the ami in US-WEST-2 and then you will be able to copy your saved ami to US-EAST-1. I did that and it worked for me.

If you have further questions regarding this issue, please contact me directly at akiyano@amazon.com.

Regards, Arthur

akiyano commented 6 years ago

Hi Mark,

Were you able to solve your problems?

Thanks, Arthur

mbobak commented 6 years ago

Arthur,

I did finally resolve my issue, by switching to ol7_developer_UEKR5 repo, which has kernel 4.14.35-1833.el7uek.x86_64 with ena driver 1.5.0K.

I'm going to call that good enough. :-)

Thanks for all the help!

-Mark