Codium-ai / pr-agent

🚀CodiumAI PR-Agent: An AI-Powered 🤖 Tool for Automated Pull Request Analysis, Feedback, Suggestions and More! 💻🔍
Apache License 2.0

[Feature Request] Allow users to connect pr_agent to an existing SageMaker inference endpoint #609

Open mattiaciollaro opened 10 months ago

mattiaciollaro commented 10 months ago

Context: suppose a user already has a pre-configured LLM inference endpoint in SageMaker (for example, a self-hosted Llama model as described here). It would be nice to let the user configure pr-agent to use that endpoint, e.g. by means of a dedicated AI handler.

Discord chat: https://discord.com/channels/1057273017547378788/1057273018084237344/1197261978591309884

cc: @krrishdholakia

mrT23 commented 10 months ago

@krrishdholakia do you think this request is feasible?

krrishdholakia commented 10 months ago

Hey @mattiaciollaro @mrT23 we already support sagemaker - https://docs.litellm.ai/docs/providers/aws_sagemaker

What am I missing?

mrT23 commented 9 months ago

@mattiaciollaro is this PR still relevant?

mattiaciollaro commented 9 months ago

Sorry for the delay guys.

> Hey @mattiaciollaro @mrT23 we already support sagemaker - https://docs.litellm.ai/docs/providers/aws_sagemaker

https://docs.litellm.ai/docs/providers/aws_sagemaker seems to support SageMaker JumpStart models specifically.

I am thinking of a different situation where a model is already deployed via SageMaker and a reference to the inference endpoint name is available (as in here). In that case, how can we instruct pr-agent to leverage the LLM behind that pre-existing endpoint? I am not sure I see a way of doing this via https://docs.litellm.ai/docs/providers/aws_sagemaker

In the context of a POC with my team, the way we accomplished this was to hack pr-agent's default AI handler (which is the LiteLLM AI handler) and use the SageMaker SDK (specifically, the HF predictor) to make requests to the pre-existing SageMaker endpoint.

I imagine a cleaner solution would be to implement a dedicated AI handler for this use case?
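For illustration, here is a minimal sketch of the kind of call such a handler might wrap. It is not pr-agent's or LiteLLM's actual code, and it swaps the HF predictor for the lower-level boto3 `sagemaker-runtime` client, whose `invoke_endpoint` takes `EndpointName`, `ContentType`, `Accept`, and `Body`. The function names (`build_payload`, `parse_generated_text`, `query_endpoint`) are hypothetical, and the request/response shapes (`{"inputs": ..., "parameters": ...}` in, `[{"generated_text": ...}]` out) are assumptions that match common Hugging Face text-generation containers; a given endpoint may expect something different.

```python
import json
from typing import Any


def build_payload(prompt: str, max_new_tokens: int = 256, temperature: float = 0.2) -> bytes:
    """Serialize a prompt into the (assumed) JSON request body."""
    body = {
        "inputs": prompt,
        "parameters": {"max_new_tokens": max_new_tokens, "temperature": temperature},
    }
    return json.dumps(body).encode("utf-8")


def parse_generated_text(raw: bytes) -> str:
    """Extract the generated text from the (assumed) JSON response body."""
    data = json.loads(raw)
    # Assumed shape: either [{"generated_text": "..."}] or {"generated_text": "..."}.
    if isinstance(data, list):
        data = data[0]
    return data["generated_text"]


def query_endpoint(runtime_client: Any, endpoint_name: str, prompt: str) -> str:
    """Send a prompt to an already-deployed endpoint, identified only by its name."""
    response = runtime_client.invoke_endpoint(
        EndpointName=endpoint_name,
        ContentType="application/json",
        Accept="application/json",
        Body=build_payload(prompt),
    )
    # With boto3, response["Body"] is a streaming object exposing read().
    return parse_generated_text(response["Body"].read())


# Usage (requires boto3 and AWS credentials; the endpoint name is a placeholder):
#   import boto3
#   client = boto3.client("sagemaker-runtime")
#   print(query_endpoint(client, "my-existing-endpoint", "Summarize this diff: ..."))
```

Passing the client in as an argument keeps the sketch testable without AWS access; a dedicated AI handler would presumably create the client once and read the endpoint name from pr-agent's configuration.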

> @mattiaciollaro is this PR still relevant?

I don't have a PR out for this, but yes: I think the feature request is still relevant :) My apologies again for the delay!