ErikBjare / gptme

Your agent in your terminal, equipped with local tools: writes code, uses the terminal, browses the web, vision.
https://gptme.org/docs/
MIT License
2.64k stars 178 forks source link

PDF support #242

Open ErikBjare opened 3 weeks ago

ErikBjare commented 3 weeks ago

I've been working on reading them with the browser tool here: https://github.com/ErikBjare/gptme/pull/196

Anthropic just announced PDF support: https://x.com/alexalbert__/status/1852394000101323193

@simonw added it to llm here: https://x.com/simonw/status/1852423593084498409

The implementation I had in mind is model-agnostic, but doesn't include the vision component (probably good enough in 80% of cases), but Anthropics solution seems to be fully multimodal.