Closed. Judtoff closed this pull request 5 months ago.
[!IMPORTANT]
Review skipped

Review was skipped as selected files did not have any reviewable changes.

Files selected but had no reviewable changes (1):
* glados_config.yml

You can disable this status message by setting `reviews.review_status` to `false` in the CodeRabbit configuration file.
The `LlamaServerConfig` class in `llama.py` has been enhanced with two new boolean attributes, `enable_split_mode` and `enable_flash_attn`, allowing for more customizable server configurations. Additionally, the `glados_config.yml` file has been updated to reflect these changes and include new settings for `LlamaServer`. This update also changes the `interruptible` setting for `Glados` and updates the model path.
| File | Change Summary |
|---|---|
| glados/llama.py | Added `enable_split_mode` and `enable_flash_attn` attributes to the `LlamaServerConfig` class. |
| glados_config.yml | Updated the `interruptible` setting for `Glados`, changed `model_path`, and added new server settings. |
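For orientation, here is a minimal sketch of what the two new attributes could look like on the `LlamaServerConfig` dataclass. The attribute names come from this PR; the other fields, the placeholder values, and the use of `@dataclass` are assumptions for illustration, and the `False` defaults follow the author's follow-up comment below rather than the code as originally submitted.

```python
from dataclasses import dataclass


@dataclass
class LlamaServerConfig:
    """Illustrative sketch; only fields relevant to this PR are shown."""

    # Placeholders standing in for the existing fields (not the real defaults).
    model_path: str = "./models/model.gguf"
    context_length: int = 8192

    # New in this PR: row split mode for multi-GPU setups and flash attention.
    enable_split_mode: bool = False
    enable_flash_attn: bool = False
```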
🐰✨
In code's realm, new features bloom,
Split modes and flash attention loom.
Configs updated, paths aligned,
A smoother server, finely designed.
With every change, a brighter tune,
The Llama dances to the moon.
🌕🎶
If the flags for split-mode row and flash attention aren't included in glados_config.yml, we should probably set them to False by default in llama.py. I ran into some issues when recompiling llama.cpp this afternoon; it looks like at some point the llama.cpp server binary got renamed to llama-server. There is a good chance that `enable_flash_attn: bool = True` in llama.py will cause issues for people. Changing it to `enable_flash_attn: bool = False` solves the issue (or, if they are on a branch of llama.cpp whose server supports flash attention, they won't have an issue).

Alternatively, include this in glados_config.yml:

`enable_flash_attn: false`

Sorry for the confusion this has caused; I should not have set those booleans to True in llama.py. Thanks, -Jud
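As a sketch of the fallback behaviour described above, the config loader could simply default both flags to `False` whenever they are absent from `glados_config.yml`. PyYAML and the `LlamaServer` section name are assumptions here; the project may load its configuration differently.

```python
import yaml  # PyYAML; assumed here, the project may use a different loader

with open("glados_config.yml") as f:
    cfg = yaml.safe_load(f)

# "LlamaServer" as the section name is an assumption for illustration.
server_cfg = cfg.get("LlamaServer", {})

# If the flags are missing from glados_config.yml, fall back to False so the
# resulting server command also works on llama.cpp builds without these features.
enable_split_mode = server_cfg.get("enable_split_mode", False)
enable_flash_attn = server_cfg.get("enable_flash_attn", False)
```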
I apologize if this causes any issues; I have never used git before, so there are likely a dozen best practices I've missed. I've added a pair of flags to glados_config.yml, one for Split Mode Row and one for Flash Attention, and tested them on 3x NVIDIA P40s to confirm they work as expected. I updated llama.py to take these two flags and pass them to the server command, using the same format as the context length that gets pushed to the server command. Using these flags results in a speedup with multiple GPUs; llama.cpp has documentation on them.
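A rough sketch of how the two flags might be appended to the server command, mirroring the context-length pattern mentioned above. The `--split-mode row` and `--flash-attn` spellings are llama.cpp server options; the function name, the binary path, and the rest of the command construction are assumptions for illustration, not the actual code in llama.py.

```python
def build_server_command(config: "LlamaServerConfig") -> list[str]:
    """Assemble the llama-server command, appending optional flags in the same
    way the context-length argument is appended."""
    cmd = [
        "./llama-server",  # formerly named "server" in older llama.cpp builds
        "--model", config.model_path,
        "--ctx-size", str(config.context_length),
    ]
    if config.enable_split_mode:
        # Split model tensors by rows across GPUs (llama.cpp --split-mode row).
        cmd += ["--split-mode", "row"]
    if config.enable_flash_attn:
        # Enable flash attention on llama.cpp builds that support it.
        cmd.append("--flash-attn")
    return cmd
```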
Summary by CodeRabbit
New Features
* Added `enable_split_mode` and `enable_flash_attn` settings to the configuration, allowing more customization of the server behavior.

Configuration Updates
* Changed the `interruptible` setting for `Glados` to `false`.
* Updated the `model_path` for `LlamaServer` to `"./models/Meta-Llama-3-70B-Instruct-Q5_K_M_.gguf"`.