Mellanox / k8s-rdma-sriov-dev-plugin

Kubernetes Rdma SRIOV device plugin
Apache License 2.0

Driver doesn't support SRIOV configuration via sysfs #19

Open LucaPrete opened 5 years ago

LucaPrete commented 5 years ago

We have a Mellanox ConnectX-3 dual-port NIC.

We're following this guide: https://community.mellanox.com/s/article/reference-deployment-guide-for-k8s-cluster-with-mellanox-rdma-device-plugin-and-multus-cni-plugin-with-two-network-interfaces--flannel-and-mellanox-sr-iov---draft-x

Everything runs smoothly until I activate the device plugin. The plugin installs fine, but when I look at the logs I see:

/sys/class/net/eth2/device/sriov_numvfs: Function not implemented

On each physical node running k8s, I manually did what I think the plugin does. For example:

echo 8 | sudo tee /sys/class/net/eth2/device/sriov_numvfs

I always get the same error:

/sys/class/net/eth2/device/sriov_numvfs: Function not implemented

Also, if I do

dmesg | grep -i mlx

I see this error:

mlx4_core 0000:03:00.0: Driver doesn't support SRIOV configuration via sysfs.
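A read-only check of the same sysfs directory can help narrow this down before writing to sriov_numvfs. The following is a minimal sketch, not something taken from the plugin: the function name sriov_report is hypothetical, and it assumes the standard PCI sysfs attributes sriov_totalvfs and sriov_numvfs plus the driver symlink are present under the device directory.

```shell
#!/bin/sh
# Hypothetical helper: print the bound driver and SR-IOV counters for a
# PCI device directory such as /sys/class/net/eth2/device.
# It only reads sysfs; it never writes sriov_numvfs.
sriov_report() {
    dev_dir="$1"

    # Without sriov_totalvfs the device exposes no SR-IOV capability at all.
    if [ ! -r "$dev_dir/sriov_totalvfs" ]; then
        echo "sriov_totalvfs missing: no SR-IOV capability exposed"
        return 1
    fi

    total=$(cat "$dev_dir/sriov_totalvfs")
    current=$(cat "$dev_dir/sriov_numvfs" 2>/dev/null || echo unknown)

    # The driver symlink points at the kernel driver bound to the device.
    driver=unknown
    if [ -L "$dev_dir/driver" ]; then
        driver=$(basename "$(readlink "$dev_dir/driver")")
    fi

    echo "driver=$driver total_vfs=$total current_vfs=$current"
}
```

Calling sriov_report /sys/class/net/eth2/device would show which driver is bound. If it reports mlx4_core, the dmesg line above suggests that the write to sriov_numvfs is being rejected by the driver rather than by the plugin.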

As an experiment, I've also tried to activate VFs through the mlx4_core driver configuration (as described, for example, here: https://community.mellanox.com/s/article/howto-configure-sr-iov-for-connectx-3-with-kvm--ethernet-x). In this case the VFs come up and everything works fine, but unfortunately this doesn't seem to be compatible with the SRIOV device plugin.
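For readers unfamiliar with the driver-level approach, here is a rough sketch of what it usually looks like. It assumes the mlx4_core module parameters num_vfs and probe_vf (worth confirming with modinfo mlx4_core on the target kernel); the function name and the file path in the usage note are illustrative, not taken from the linked guide.

```shell
#!/bin/sh
# Hypothetical helper: write a modprobe options line asking mlx4_core to
# create VFs when the module loads.
# Assumption: num_vfs and probe_vf are valid mlx4_core parameters on this
# kernel; probe_vf is set equal to num_vfs so the host also probes every VF.
write_mlx4_vf_conf() {
    conf_path="$1"   # e.g. a file under /etc/modprobe.d/ (illustrative)
    num_vfs="$2"     # number of VFs to request

    printf 'options mlx4_core num_vfs=%s probe_vf=%s\n' \
        "$num_vfs" "$num_vfs" > "$conf_path"
}
```

Something like write_mlx4_vf_conf /etc/modprobe.d/mlx4_core.conf 8, run as root and followed by a driver reload or reboot, would request 8 VFs. The options only take effect when the module is loaded, so this does not go through sriov_numvfs at all.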

This is really blocking us. Any suggestion would be highly appreciated!

Thanks.

paravmellanox commented 5 years ago

Hi @LucaPrete ,

ConnectX3 is not supported by the plugin. I recommend upgrading to ConnectX4 or 5. They bring a lot of features that will be useful for RDMA and NIC use.

LucaPrete commented 5 years ago

Thank you @paravmellanox! Unfortunately all our servers run the ConnectX3. I'm wondering if the plugin could be modified so that it doesn't configure the VFs at the system level, but just uses the ones already configured through the driver. What do you think? If we achieve this, what other limitations do you see?
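To make the idea concrete: VFs that already exist show up as virtfnN symlinks under the PF's device directory, so a modified plugin could in principle discover them instead of creating them. This is a minimal sketch of that discovery step only, with a hypothetical function name; it says nothing about how the plugin itself is structured.

```shell
#!/bin/sh
# Hypothetical helper: list the PCI addresses of VFs that already exist
# under a PF device directory (e.g. /sys/class/net/eth2/device).
# Each existing VF appears as a virtfnN symlink pointing at its PCI device.
list_existing_vfs() {
    pf_dir="$1"
    for link in "$pf_dir"/virtfn*; do
        # If no VF exists the glob stays literal and is skipped here.
        [ -L "$link" ] || continue
        basename "$(readlink "$link")"
    done
}
```

Running list_existing_vfs /sys/class/net/eth2/device after enabling VFs through the driver would print one PCI address per VF, or nothing if none were created.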

paravmellanox commented 5 years ago

Hi @LucaPrete, what functionality do you plan to run on ConnectX3: RDMA, Ethernet, DPDK, or some subset of these?

LucaPrete commented 5 years ago

@paravmellanox I've seen the term RDMA in other places as well, but I'm not very familiar with it. I have to do some homework here :) For now, what we need is a PoC where k8s containers can have a second, SR-IOV NIC. That NIC is then connected to a custom fabric. DPDK support would be nice as a next step, but it is not mandatory for the first one.

paravmellanox commented 5 years ago

@LucaPrete, ConnectX3 is pretty old now. Since you need DPDK support as a next step, can you please talk to customer support? We don't have a strong plan to support DPDK mode. Most users have upgraded to ConnectX4/5, so...

LucaPrete commented 5 years ago

I understand. We'll definitely plan to move to another NIC. In the meantime, I was wondering whether we could still use the ConnectX-3 without DPDK support (for the PoC).

paravmellanox commented 5 years ago

@LucaPrete for a PoC it is fine to use ConnectX3. It requires some work, so some hacks will be needed for the PoC.