Open KagurazakaNyaa opened 1 week ago
Adding a revert PR to help isolate the problem: #2409
Reverting PR #2043 does not isolate this issue, because the issue existed before that PR. PR #2043 confirms that the problem lies in configuration outside the code repository.
Right. I just created a branch without #2403 to find the latest version that built the ROCm image successfully and to compare against it.
The same action workflow, but without the image push step, runs normally at https://github.com/KagurazakaNyaa/tabby/actions/runs/9496954836. This fork uses GitHub's default runner instead of a self-hosted runner. From the error message in this issue, it seems to be a problem with the action runner rather than the workflow. Is this repository using a GitHub-hosted runner or a self-hosted runner?
Also, the ROCm version in use, 5.7.1, is rather outdated. It is compatible with older cards, but version 6.1.2 is out and has massive improvements for newer cards. I don't know how much this affects model performance.
I tried compiling the 0.12.0 tag and I get the error below with my registry. I also tried locally, with this command:
command: serve --model /data/models/rudiservo/StarCoder2-15b-Instruct-v0.1-Q8 --device rocm --no-webserver
tabby_1 | The application panicked (crashed).
tabby_1 | Message: Invalid model_id <TabbyML/Nomic-Embed-Text>
tabby_1 | Location: crates/tabby-common/src/registry.rs:108
tabby_1 |
tabby_1 | ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ BACKTRACE ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
tabby_1 | ⋮ 7 frames hidden ⋮
tabby_1 | 8: tabby_common::registry::ModelRegistry::get_model_info::h4cf4522936634953
tabby_1 | at <unknown source file>:<unknown line>
tabby_1 | 9: tabby_download::download_model::{{closure}}::h8da4574c84d31459
tabby_1 | at <unknown source file>:<unknown line>
tabby_1 | 10: tabby::services::model::download_model_if_needed::{{closure}}::h88e90df5ccbc9220
tabby_1 | at <unknown source file>:<unknown line>
tabby_1 | 11: tabby::serve::main::{{closure}}::h895907983720205f
tabby_1 | at <unknown source file>:<unknown line>
tabby_1 | 12: tokio::runtime::park::CachedParkThread::block_on::h69f0496402a974e5
tabby_1 | at <unknown source file>:<unknown line>
tabby_1 | 13: tabby::main::h244e2d137a039971
tabby_1 | at <unknown source file>:<unknown line>
tabby_1 | 14: std::sys_common::backtrace::__rust_begin_short_backtrace::h37fe2660d85af9e6
tabby_1 | at <unknown source file>:<unknown line>
tabby_1 | 15: std::rt::lang_start::{{closure}}::hfc465164803e6038
tabby_1 | at <unknown source file>:<unknown line>
tabby_1 | 16: std::rt::lang_start_internal::h3ed4fe7b2f419135
tabby_1 | at <unknown source file>:<unknown line>
tabby_1 | 17: main<unknown>
tabby_1 | at <unknown source file>:<unknown line>
tabby_1 | 18: __libc_start_call_main<unknown>
tabby_1 | at ./csu/../sysdeps/nptl/libc_start_call_main.h:58
tabby_1 | 19: __libc_start_main_impl<unknown>
tabby_1 | at ./csu/../csu/libc-start.c:392
tabby_1 | 20: _start<unknown>
tabby_1 | at <unknown source file>:<unknown line>
tabby_1 |
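For readers unfamiliar with this kind of failure, here is a minimal sketch of how an "Invalid model_id" error can arise. It assumes that model ids have the form `registry/name` and are resolved against a list of models known to that registry. This is not Tabby's actual code: `Registry`, `parse_model_id`, `lookup`, `sample_registry`, and the model name `Example-Model` are all hypothetical, and the real logic in `crates/tabby-common/src/registry.rs` may differ.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for a model registry: registry name -> known model names.
struct Registry {
    models: HashMap<String, Vec<String>>,
}

/// Split "registry/name" into its two parts; returns None if the id is malformed.
fn parse_model_id(model_id: &str) -> Option<(&str, &str)> {
    let (registry, name) = model_id.split_once('/')?;
    if registry.is_empty() || name.is_empty() || name.contains('/') {
        return None;
    }
    Some((registry, name))
}

impl Registry {
    /// Return Ok(()) if the model is known, otherwise an error string
    /// shaped like the panic message in the log above.
    fn lookup(&self, model_id: &str) -> Result<(), String> {
        let invalid = || format!("Invalid model_id <{model_id}>");
        let (registry, name) = parse_model_id(model_id).ok_or_else(invalid)?;
        let known = self.models.get(registry).ok_or_else(invalid)?;
        if known.iter().any(|m| m == name) {
            Ok(())
        } else {
            Err(invalid())
        }
    }
}

/// Build a registry that knows a single, made-up model.
fn sample_registry() -> Registry {
    let mut models = HashMap::new();
    models.insert("TabbyML".to_string(), vec!["Example-Model".to_string()]);
    Registry { models }
}

fn main() {
    let registry = sample_registry();
    // The id is well-formed, but the registry data does not list it.
    match registry.lookup("TabbyML/Nomic-Embed-Text") {
        Ok(()) => println!("model found"),
        Err(e) => println!("{e}"),
    }
}
```

Under these assumptions, a well-formed id still fails when the registry data being read does not list the requested model, so the registry contents are one thing to check.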
Describe the bug
The action run "Create and publish docker image" failed.
https://github.com/TabbyML/tabby/actions/runs/9506018585
release-docker (rocm): The hosted runner: GitHub Actions 15 lost communication with the server. Anything in your workflow that terminates the runner process, starves it for CPU/Memory, or blocks its network access can cause this error.
Additional context
In PR #2043, I attempted to update the version of the action. In my fork, the image builds normally; however, after merging, this repository is still unable to build the ROCm docker images. I recommend checking whether a self-hosted Action Runner has been configured incorrectly.