Xilinx / inference-server

https://xilinx.github.io/inference-server/
Apache License 2.0

Upgrade to ROCm 5.4.1 #98

Closed: bpickrel closed this issue 1 year ago

bpickrel commented 1 year ago

Reminder to raise the ROCm version (ROCm is the package that includes MIGraphX) in the docker generator to 5.4.1. The testing release of 5.4.1 was made public in the "hidden" repository on Dec 8. I've verified it with the three MIGraphX Python examples, but the test script amdinfer test failed with the message: ERROR tests/workers/test_migraphx.py::TestMigraphx::test_migraphx[1] - amdinfer._amdinfer.BadStatus. I also noted one bug in all three examples: they cannot parse the batch-size argument from the command line.

The 5.4.1 release fixes a bug found in the 5.4.0 release that seriously hampers MIGraphX.

The changes for the version bump and the batch-size bug fix are as follows:

+migraphx_apt_repo = 'echo "deb [arch=amd64 trusted=yes] https://repo.radeon.com/rocm/apt/.apt_5.4.1/ ubuntu main" > /etc/apt/sources.list.d/rocm.list'
+migraphx_yum_repo = '"[ROCm]\\nname=ROCm\\nbaseurl=https://repo.radeon.com/rocm/yum/.yum_5.4.1/main/\\nenabled=1\\ngpgcheck=1\\ngpgkey=https://repo.radeon.com/rocm/rocm.gpg.key" > /etc/yum.repos.d/rocm.repo'

Add the following in all of examples/bert/bert.py, examples/resnet50/resnet.py and examples/yolo/yolo.py:

parser.add_argument(
    "--batch-size",
    default=10,
    type=int,  # <== add
    help="Batch size to use for the MIGraphX worker on the server",
)
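A minimal sketch of why the added type=int matters: argparse hands command-line values over as strings unless a type is given, so without it a value such as 4 arrives as "4" while the default stays the integer 10. The build_parser helper is hypothetical and only wraps the same add_argument call for illustration.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical helper: a parser with the same --batch-size option."""
    parser = argparse.ArgumentParser()
    parser.add_argument(
        "--batch-size",
        default=10,
        type=int,  # convert the command-line string to an int
        help="Batch size to use for the MIGraphX worker on the server",
    )
    return parser


# An explicit argument list stands in for the real command line.
args = build_parser().parse_args(["--batch-size", "4"])
print(type(args.batch_size).__name__, args.batch_size)
```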

Also, the production release of version 5.4.1 is expected within a few days, which will supersede this version/URL. Note that the repository URL for the production release will not contain the .apt_ prefix to the version number.
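The difference between the two apt URLs can be sketched as a small helper. This is an illustration only: rocm_apt_repo_line is a hypothetical function, and it assumes the production entry keeps the same apt options as the testing entry and differs only in dropping the .apt_ prefix, as described above.

```python
def rocm_apt_repo_line(version: str, testing: bool = False) -> str:
    """Return an apt source line for a ROCm release.

    Assumption: testing releases live under ".apt_<version>" and production
    releases under "<version>"; everything else is kept identical.
    """
    directory = f".apt_{version}" if testing else version
    return (
        "deb [arch=amd64 trusted=yes] "
        f"https://repo.radeon.com/rocm/apt/{directory}/ ubuntu main"
    )


print(rocm_apt_repo_line("5.4.1", testing=True))
print(rocm_apt_repo_line("5.4.1"))
```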

varunsh-xilinx commented 1 year ago

The 5.4.1 link is already public: https://repo.radeon.com/rocm/apt/5.4.1/. I've made these changes in the bump-to-rocm54 branch. Can you please confirm this branch works on your end?

bpickrel commented 1 year ago


Did you push it? GitHub says my fork's branch is up to date with the other URL.

(In reply to varunsh's message sent Tuesday, December 13, 2022, 9:50 AM.)


varunsh-xilinx commented 1 year ago

Yes, the branch was updated 17 min ago: https://github.com/xilinx/inference-server/tree/bump-to-rocm54

bpickrel commented 1 year ago

Do we have any further instructions for installing git-lfs? I get an error when I run git lfs install on my fresh system, for reasons that are not clear to me. I tried it both inside and outside the Docker container. The GitHub LFS web page is not very user-friendly.

bpickrel commented 1 year ago

Looks like sudo apt-get install git-lfs still has to be run independently of the amdinfer setup for this to work. Can it be automated?

varunsh-xilinx commented 1 year ago

git-lfs is installed in the dev container, and you can use that if you'd like. If you want it on your host machine, then it's up to you to install it, whether from apt-get or from the latest release on GitHub. We're not installing anything on the host machine right now.
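For anyone working on the host, a setup script could at least detect a missing git-lfs and print a hint rather than installing anything. The sketch below is not part of amdinfer; git_lfs_hint is a hypothetical helper that only checks whether a git-lfs executable is on PATH.

```python
import shutil
from typing import Callable, Optional


def git_lfs_hint(
    which: Callable[[str], Optional[str]] = shutil.which,
) -> Optional[str]:
    """Return None if git-lfs is on PATH, else a hint for installing it."""
    if which("git-lfs") is not None:
        return None
    return "git-lfs not found; install it first, e.g. sudo apt-get install git-lfs"


# Report the result for the machine this runs on.
print(git_lfs_hint() or "git-lfs found")
```

The lookup function is a parameter only so the check can be exercised without touching the real PATH.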

bpickrel commented 1 year ago

It may be that it did install, but printed a warning message containing the word "Error", which is confusing.

bpickrel commented 1 year ago

I also found that I had to run sudo chown amdinfer-user /home/amdinfer-user inside the Docker container before I could build the project. This occurred on my local tower but not on my team's MI200 server.
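One way to confirm this kind of ownership mismatch before building is to compare the directory's owner with the current user. This is a rough, POSIX-only sketch; ownership_warning is a hypothetical helper, not something the project provides.

```python
import os
from pathlib import Path
from typing import Optional


def ownership_warning(path: Path) -> Optional[str]:
    """Return a warning if `path` is not owned by the current user (POSIX only)."""
    if path.stat().st_uid == os.getuid():
        return None
    return f"{path} is not owned by the current user; a chown may be needed"
```

Inside the container it might be called as ownership_warning(Path("/home/amdinfer-user")); a return value of None means the ownership already matches.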