sariola commented 1 month ago

Flow Judge Baseten Model

Summary

This PR introduces FlowJudge Model instantiation using Baseten. It also deploys the model on Baseten on first use.

Key Changes

Created aBaseten class in the models folder:
- Encapsulates the 'engine' functionality to initialize and run the model
- Attaches to the Baseten API adapters
- Refactored according to the changes in main branch
Created Baseten Adapters:
- Handling of Webhook requests using /async_predict route from Baseten
- Sync requests using the openai standard
Baseten Deployment:
- Authentication with Baseten and setting of API key
- Model deployment using Truss for the defined model config in model/adapters/baseten/config
- Production deployment set as the default.

Testing

Manually tested:

On MacOS Apple Silicon
Verified functionality of Baseten model classes with BasetenAPIAdapter, and AsyncBasetenAPIAdapter
Verified functionality of deploy.py -> ensure_model_deployment()
Tested with local examples for single and batched request
Tested integration in example notebook, ensuring correct behaviour

Breaking changes

Pyproject.toml
- Introduced optional dependency for baseten: baseten = ["truss>=0.9.42"]

sariola commented 1 month ago

What is the reason behind these errors?

@alexwegrzyn Failed to send async predict results for request 35643965633151487de2db4ef135dd15 to webhook endpoint https://proxy.flowrite.com//webhook. Status code: 301, response:

Does the double slash play a role? It's error behavior from baseten logs from last night and from today.

@minaamshahid vLLM has gone into an unhealthy state due to error: , restarting service now... To get more information on this edit the model.py engine flowaicom/baseten/blob/main/model/helper.py and re-deploy.

ghost commented 1 month ago

What is the reason behind these errors?

@alexwegrzyn Failed to send async predict results for request 35643965633151487de2db4ef135dd15 to webhook endpoint https://proxy.flowrite.com//webhook. Status code: 301, response:

Does the double slash play a role? It's error behavior from baseten logs from last night and from today.

Yes, its because of the double slash:

# curl https://proxy.flowrite.com//webhook
<a href="/webhook">Moved Permanently</a>.

The double slash was probably introduced in the client in webhook_url parameter, see here. The self.webhook_proxy_url probably already has trailing slash and another one is added with +"/webhook" and not normalized before being shipped to Baseten.

sariola commented 1 month ago

The double slash was probably introduced in the client in webhook_url parameter, see here. The self.webhook_proxy_url probably already has trailing slash and another one is added with +"/webhook" and not normalized before being shipped to Baseten.

Got it, looks like might be due to manually inserting the inputs, is it? @minaamshahid

ghost commented 1 month ago

Got it, looks like might be due to manually inserting the inputs, is it? @minaamshahid

I was able to confirm earlier with Minaam that this is indeed the case. The tests were using the url with trailing slash and they are already fixed.

minaamshahid commented 1 month ago

The double slash was probably introduced in the client in webhook_url parameter, see here. The self.webhook_proxy_url probably already has trailing slash and another one is added with +"/webhook" and not normalized before being shipped to Baseten.

Got it, looks like might be due to manually inserting the inputs, is it? @minaamshahid

Yes!

minaamshahid commented 1 month ago

@sariola I've pushed an update with the suggested changes (aiohttp, openai client). There is a change to the pyproject.toml for the 'dev' deps: pytest-asyncio>=0.23.6, <0.24.0 from >0.24.0 This was a conflicting one with truss

@sariola There is a conflicting dependency with pytest-asyncio when downloading optional dependencies for 'dev' and 'baseten' flow-judge[baseten,dev,hf,llamafile,vllm] 0.1.0 depends on pytest-asyncio>=0.24.0; extra == "dev" truss 0.9.43 depends on pytest-asyncio<0.24.0 and >=0.23.6 Anything against locking version to <0.24.0 and >=0.23.6 in the repo? From the changelog at least, I don't see changes that would affect our usage in the repo.

sariola commented 1 month ago

@sariola I've pushed an update with the suggested changes (aiohttp, openai client). There is a change to the pyproject.toml for the 'dev' deps: pytest-asyncio>=0.23.6, <0.24.0 from >0.24.0 This was a conflicting one with truss

@sariola There is a conflicting dependency with pytest-asyncio when downloading optional dependencies for 'dev' and 'baseten' flow-judge[baseten,dev,hf,llamafile,vllm] 0.1.0 depends on pytest-asyncio>=0.24.0; extra == "dev" truss 0.9.43 depends on pytest-asyncio<0.24.0 and >=0.23.6 Anything against locking version to <0.24.0 and >=0.23.6 in the repo? From the changelog at least, I don't see changes that would affect our usage in the repo.

Great! Thank you M.

Looks neat! I'll test it e2e tonight with a new account :muscle:

PS Yeah the clash doesn't seem significant, good call.

codecov[bot] commented 1 month ago

Codecov Report

Attention: Patch coverage is 56.21118% with 141 lines in your changes missing coverage. Please review.

:white_check_mark: All tests successful. No failed tests found.

Files with missing lines	Patch %	Lines
flow_judge/models/baseten.py	0.00%	76 Missing :warning:
tests/unit/models/test_baseten.py	90.04%	20 Missing :warning:
flow_judge/models/common.py	0.00%	19 Missing :warning:
flow_judge/flow_judge.py	0.00%	16 Missing :warning:
flow_judge/__init__.py	0.00%	7 Missing :warning:
flow_judge/metrics/presets.py	0.00%	2 Missing :warning:
flow_judge/metrics/__init__.py	0.00%	1 Missing :warning:

Files with missing lines	Coverage Δ
flow_judge/metrics/metric.py	`0.00% <ø> (ø)`
...sts/e2e-local/integrations/test_llama_index_e2e.py	`91.66% <ø> (ø)`
tests/e2e-local/models/test_llamafile_e2e.py	`86.86% <ø> (ø)`
tests/unit/models/test_llamafile_unit.py	`100.00% <ø> (ø)`
tests/unit/test_flow_judge.py	`98.09% <ø> (ø)`
tests/unit/test_metrics.py	`100.00% <ø> (ø)`
tests/unit/test_utils.py	`100.00% <ø> (ø)`
flow_judge/metrics/__init__.py	`0.00% <0.00%> (ø)`
flow_judge/metrics/presets.py	`0.00% <0.00%> (ø)`
flow_judge/__init__.py	`0.00% <0.00%> (ø)`
... and 4 more

... and 1 file with indirect coverage changes

flowaicom / flow-judge

Feat/baseten integration #18

Flow Judge Baseten Model

Summary

Key Changes

Testing

Breaking changes

Codecov Report