TabbyML / tabby

Self-hosted AI coding assistant
https://tabby.tabbyml.com/

rocm docker github action build failed #2408

Open KagurazakaNyaa opened 1 week ago

KagurazakaNyaa commented 1 week ago

Describe the bug: The action "Create and publish docker image" failed to run.

https://github.com/TabbyML/tabby/actions/runs/9506018585

release-docker (rocm): The hosted runner: GitHub Actions 15 lost communication with the server. Anything in your workflow that terminates the runner process, starves it for CPU/Memory, or blocks its network access can cause this error.

Additional context: In PR #2043, I attempted to update the action versions. In my fork the image builds normally; after the merge, however, this repository still cannot build the ROCm Docker image. I recommend checking whether a self-hosted Actions runner has been configured incorrectly.

wsxiaoys commented 1 week ago

adding revert pr to help isolate the problem:

https://github.com/TabbyML/tabby/pull/2409

KagurazakaNyaa commented 1 week ago

Quoting wsxiaoys: "adding revert pr to help isolate the problem: #2409"

Reverting PR #2043 does not isolate this issue, because the issue existed before PR #2043. PR #2043 confirms that the issue is in the configuration outside the code repository.

wsxiaoys commented 1 week ago

Right - just created a branch without #2403 to check the latest successful ROCm image build version and to compare.

KagurazakaNyaa commented 1 week ago

The same workflow, minus the image push, runs normally at https://github.com/KagurazakaNyaa/tabby/actions/runs/9496954836. This fork uses GitHub's default runner instead of a self-hosted runner. Judging from the error message in this issue, the problem seems to lie with the action runner rather than the workflow. Is this repository using a GitHub-hosted runner or a self-hosted runner?

rudiservo commented 1 week ago

Also, the ROCm version in use, 5.7.1, is rather outdated. It remains compatible with older cards, but version 6.1.2 is out and brings massive improvements for newer cards. I don't know how much this affects model performance.

I tried compiling the 0.12.0 tag and got the error below with my registry. I also tried locally with this command:

command: serve --model /data/models/rudiservo/StarCoder2-15b-Instruct-v0.1-Q8 --device rocm --no-webserver

tabby_1  | The application panicked (crashed).
tabby_1  | Message:  Invalid model_id <TabbyML/Nomic-Embed-Text>
tabby_1  | Location: crates/tabby-common/src/registry.rs:108
tabby_1  | 
tabby_1  |   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ BACKTRACE ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
tabby_1  |                                 ⋮ 7 frames hidden ⋮                               
tabby_1  |    8: tabby_common::registry::ModelRegistry::get_model_info::h4cf4522936634953
tabby_1  |       at <unknown source file>:<unknown line>
tabby_1  |    9: tabby_download::download_model::{{closure}}::h8da4574c84d31459
tabby_1  |       at <unknown source file>:<unknown line>
tabby_1  |   10: tabby::services::model::download_model_if_needed::{{closure}}::h88e90df5ccbc9220
tabby_1  |       at <unknown source file>:<unknown line>
tabby_1  |   11: tabby::serve::main::{{closure}}::h895907983720205f
tabby_1  |       at <unknown source file>:<unknown line>
tabby_1  |   12: tokio::runtime::park::CachedParkThread::block_on::h69f0496402a974e5
tabby_1  |       at <unknown source file>:<unknown line>
tabby_1  |   13: tabby::main::h244e2d137a039971
tabby_1  |       at <unknown source file>:<unknown line>
tabby_1  |   14: std::sys_common::backtrace::__rust_begin_short_backtrace::h37fe2660d85af9e6
tabby_1  |       at <unknown source file>:<unknown line>
tabby_1  |   15: std::rt::lang_start::{{closure}}::hfc465164803e6038
tabby_1  |       at <unknown source file>:<unknown line>
tabby_1  |   16: std::rt::lang_start_internal::h3ed4fe7b2f419135
tabby_1  |       at <unknown source file>:<unknown line>
tabby_1  |   17: main<unknown>
tabby_1  |       at <unknown source file>:<unknown line>
tabby_1  |   18: __libc_start_call_main<unknown>
tabby_1  |       at ./csu/../sysdeps/nptl/libc_start_call_main.h:58
tabby_1  |   19: __libc_start_main_impl<unknown>
tabby_1  |       at ./csu/../csu/libc-start.c:392
tabby_1  |   20: _start<unknown>
tabby_1  |       at <unknown source file>:<unknown line>
tabby_1  |
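
For readers following the trace: the panic message has the shape "Invalid model_id <registry/name>" and is raised inside `ModelRegistry::get_model_info`. One plausible reading, not verified against the Tabby source, is that the lookup found no entry named `Nomic-Embed-Text` in the registry data it loaded. The sketch below illustrates that failure path only. The `Registry` struct, its fields, and `parse_model_id` are hypothetical stand-ins, and it returns an error where the real code panics.

```rust
use std::collections::HashMap;

// Hypothetical stand-in for a model registry; NOT Tabby's actual type.
struct Registry {
    name: String,
    // model name -> download URL (illustrative data only)
    models: HashMap<String, String>,
}

// Split an id of the form "registry/name" into its two parts.
fn parse_model_id(model_id: &str) -> Option<(&str, &str)> {
    let (registry, name) = model_id.split_once('/')?;
    if registry.is_empty() || name.is_empty() || name.contains('/') {
        return None;
    }
    Some((registry, name))
}

impl Registry {
    // Return an error instead of panicking when the name is unknown.
    fn get_model_info(&self, name: &str) -> Result<&str, String> {
        self.models
            .get(name)
            .map(String::as_str)
            .ok_or_else(|| format!("Invalid model_id <{}/{}>", self.name, name))
    }
}

fn main() {
    let registry = Registry {
        name: "TabbyML".to_string(),
        models: HashMap::new(), // empty on purpose: the entry is missing
    };
    let (_, name) = parse_model_id("TabbyML/Nomic-Embed-Text").expect("malformed id");
    match registry.get_model_info(name) {
        Ok(url) => println!("found: {url}"),
        Err(msg) => eprintln!("{msg}"),
    }
}
```

If this reading is right, the panic would depend on which models the loaded registry data lists, not on the `--model` path passed on the command line.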