Xilinx / FPGA_as_a_Service

https://docs.xilinx.com/r/en-US/Xilinx_Kubernetes_Device_Plugin/Xilinx_Kubernetes_Device_Plugin
Apache License 2.0

After modifying fpga.go, is my plugin log normal? #8

Closed Vae1997 closed 4 years ago

Vae1997 commented 4 years ago

Hello. Following your earlier suggestion, I modified fpga.go, re-ran the build script on the zcu102, and rebuilt the arm64 plugin image from the Dockerfile. Finally, I ran the command to deploy the plugin. The log is as follows:

root@zcu102:~# kubectl logs -n kube-system fpga-device-plugin-daemonset-pcvhq
time="2019-12-17T08:05:18Z" level=info msg="Starting FS watcher."
time="2019-12-17T08:05:18Z" level=info msg="Starting OS watcher."
time="2019-12-17T08:05:19Z" level=info msg="Starting to serve on /var/lib/kubelet/device-plugins/drm_minor-20191217-fpga.sock"
2019/12/17 08:05:19 grpc: Server.Serve failed to create ServerTransport: connection error: desc = "transport: write unix /var/lib/kubelet/device-plugins/drm_minor-20191217-fpga.sock->@: write: broken pipe"
time="2019-12-17T08:05:19Z" level=info msg="Registered device plugin with Kubelet xilinx.com/fpga-drm_minor-20191217"
time="2019-12-17T08:05:19Z" level=info msg="Sending 1 device(s) [&Device{ID:a0000000.zyxclmm_drm,Health:Healthy,}] to kubelet"
root@zcu102:~#

This looks consistent with the output in the README, but I noticed that one final line is missing: msg="Receiving request 1". The Allocate method in server.go should print this message, which indicates the device information that kubelet returns to the plugin, but so far nothing seems to have been returned...

In addition, describe node shows that both the Capacity and Allocatable fields already list: xilinx.com/fpga-drm_minor-20191217:1

What should I do next to make sure the plugin is deployed correctly?
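The check described above (which startup messages appeared, and whether "Receiving request" has shown up yet) can be sketched as a small log parser. This is only an illustration: the function names `parse_line` and `deployment_milestones` are made up for this example, and the message prefixes are taken from the log quoted in this thread, not from the plugin source.

```python
import re

# Matches key="quoted value" and key=bare pairs, as seen in the plugin log.
_FIELD = re.compile(r'(\w+)=(?:"((?:[^"\\]|\\.)*)"|(\S+))')

# Log excerpt copied from this thread.
SAMPLE_LOG = '''
time="2019-12-17T08:05:19Z" level=info msg="Starting to serve on /var/lib/kubelet/device-plugins/drm_minor-20191217-fpga.sock"
2019/12/17 08:05:19 grpc: Server.Serve failed to create ServerTransport: connection error: desc = "transport: write unix /var/lib/kubelet/device-plugins/drm_minor-20191217-fpga.sock->@: write: broken pipe"
time="2019-12-17T08:05:19Z" level=info msg="Registered device plugin with Kubelet xilinx.com/fpga-drm_minor-20191217"
time="2019-12-17T08:05:19Z" level=info msg="Sending 1 device(s) [&Device{ID:a0000000.zyxclmm_drm,Health:Healthy,}] to kubelet"
'''


def parse_line(line):
    """Return the key/value fields found on one log line as a dict."""
    fields = {}
    for key, quoted, bare in _FIELD.findall(line):
        fields[key] = quoted if quoted else bare
    return fields


def deployment_milestones(log_text):
    """Report which of the messages discussed in this thread are present."""
    msgs = [parse_line(line).get("msg", "") for line in log_text.splitlines()]
    return {
        "serving": any(m.startswith("Starting to serve on") for m in msgs),
        "registered": any(
            m.startswith("Registered device plugin with Kubelet") for m in msgs
        ),
        "devices_sent": any(m.startswith("Sending ") for m in msgs),
        # Only expected once a pod actually requests the resource.
        "allocate_seen": any(m.startswith("Receiving request") for m in msgs),
    }


if __name__ == "__main__":
    print(deployment_milestones(SAMPLE_LOG))
```

On the excerpt above, the first three entries come back true and `allocate_seen` comes back false, which matches the situation described: the plugin has registered, but no pod has requested a device yet.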

xuhz commented 4 years ago

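The plugin is already deployed correctly. Next, deploy an ordinary pod that requests the resource type the plugin registered, xilinx.com/fpga-drm_minor-20191217:1. The log should then show "Receiving request 1".

Then log in to that pod. If you can see /dev/dri/renderD128, the FPGA device is accessible from inside the container.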
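A minimal sketch of such a test pod, written as a Python script that prints a JSON manifest (kubectl accepts JSON as well as YAML). The pod name `fpga-test`, the `ubuntu` image, and the `sleep` command are assumptions chosen for illustration; only the resource name comes from the plugin log above. Extended resources are requested under `limits`.

```python
import json

# Resource name as registered by the plugin in the log quoted in this thread.
RESOURCE = "xilinx.com/fpga-drm_minor-20191217"


def fpga_test_pod(name="fpga-test", image="ubuntu", count=1):
    """Build a pod manifest that requests `count` devices of RESOURCE.

    The name, image, and command are placeholders for illustration.
    """
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name},
        "spec": {
            "restartPolicy": "Never",
            "containers": [
                {
                    "name": name,
                    "image": image,
                    # Keep the container alive long enough to exec into it.
                    "command": ["sleep", "3600"],
                    "resources": {"limits": {RESOURCE: str(count)}},
                }
            ],
        },
    }


if __name__ == "__main__":
    print(json.dumps(fpga_test_pod(), indent=2))
```

Saved under a name of your choosing, for example `make_pod.py`, the output could be piped into `kubectl apply -f -`, after which `kubectl exec -it fpga-test -- ls /dev/dri` lists the device nodes visible in the container.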

Vae1997 commented 4 years ago

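Yes. After I changed the resource in dp-pod.yaml and deployed it, the plugin log did show that the request was received.

However, the pod status is Error. describe shows "Back-off restarting failed container", and logs shows: standard_init_linux.go:207: exec user process caused "exec format error"

I ran into this problem before when deploying k8s. The cause is that the image used by the pod only works on x86. I checked Docker Hub, and xilinxatg/fpga-verify:latest is indeed published only for amd64.

So my next step should be to build an arm64 version of the image by hand on the zcu102, using that image's Dockerfile as a reference. (I also have to add the files generated when I configured XRT to the Dockerfile with COPY commands, so that the XRT test scripts can run correctly inside the pod.) Only after changing the image in dp-pod.yaml to the newly built arm64 image will the pod be able to reach Running.

However, I noticed that Docker Hub does not provide the complete Dockerfile for this image. Would you be able to publish the Dockerfile for the xilinxatg/fpga-verify image?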
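"exec format error" usually means the container's binaries were built for a different CPU architecture than the node. As a rough sketch, the comparison can be made explicit before deploying. The helper names below are made up for this example, and the list of architectures an image supports has to be supplied by hand (for instance from its Docker Hub page); the code does not query any registry.

```python
import platform

# Map common `uname -m` style names to the names Docker uses for architectures.
_DOCKER_ARCH = {
    "x86_64": "amd64",
    "amd64": "amd64",
    "aarch64": "arm64",
    "arm64": "arm64",
}


def docker_arch(machine=None):
    """Return the Docker-style architecture name for a machine string.

    Defaults to the machine this script runs on. Unknown names are
    returned unchanged (lowercased).
    """
    machine = (machine or platform.machine()).lower()
    return _DOCKER_ARCH.get(machine, machine)


def can_run(image_archs, machine=None):
    """True if the node architecture is among the image's architectures."""
    return docker_arch(machine) in set(image_archs)


if __name__ == "__main__":
    # An amd64-only image, checked against the architecture of this machine.
    print(docker_arch(), can_run(["amd64"]))
```

For the case in this thread, `can_run(["amd64"], "aarch64")` is false, which is consistent with the pod failing on the zcu102.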

xuhz commented 4 years ago

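The fpga-verify Dockerfile is of no use for embedded platforms. It only contains a helloworld kernel that runs on the shell.

What you can do now is find an application that runs directly on the bare zcu102, build it into an image, and see whether it can be scheduled and executed through k8s. Alternatively, just try any ubuntu image and log in to that pod. If you can see /dev/dri/renderD128, the plugin is working.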
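The check inside the pod amounts to looking for a render node under /dev/dri. A minimal sketch, assuming Python is available in the container (a plain `ls /dev/dri` does the same job); the function names are made up for this example.

```python
import glob
import os


def render_nodes(dri_dir="/dev/dri"):
    """Return the DRM render nodes (renderD*) found under dri_dir, sorted."""
    return sorted(glob.glob(os.path.join(dri_dir, "renderD*")))


def fpga_visible(dri_dir="/dev/dri"):
    """True if at least one render node is visible in this environment."""
    return len(render_nodes(dri_dir)) > 0


if __name__ == "__main__":
    nodes = render_nodes()
    print("render nodes:", nodes if nodes else "none found")
```

If the plugin has mounted the device as described in this thread, the list should include /dev/dri/renderD128. An empty list suggests the pod did not request the resource or the device was not passed through.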

Vae1997 commented 4 years ago

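Great! After changing the image to ubuntu and entering the container, I can see /dev/dri/renderD128. So far the plugin part appears to be working. Next I will follow the first of your suggestions. Thanks again!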