GPT cost estimation

davidpomerenke commented 4 months ago

https://github.com/SocialChangeLab/media-impact-monitor/issues/51 requires us to run GPT over every protest-related article

https://github.com/SocialChangeLab/media-impact-monitor/issues/47 and https://github.com/SocialChangeLab/media-impact-monitor/issues/105 require us to run GPT over every climate-related article

We should get some cost estimates for this.

Scopes to consider:

Climate protests in Germany
Climate in Germany
All protests in Germany
Climate protests worldwide
Climate worldwide
All protests worldwide
Also other topics like animal rights

The single most important source for this is MediaCloud.

We can retrieve some sample articles and run them through tiktoken to calculate the cost, and get the overall average number of articles per day via MediaCloud as well.
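A minimal sketch of that estimate, assuming the sample articles are already available as plain strings. The function names, the encoding name, and the price parameter are illustrative assumptions, not taken from the repo. `cl100k_base` is one tiktoken encoding (newer models such as gpt-4o use `o200k_base`). If tiktoken or its encoding files are unavailable, the code falls back to a crude ~4 characters per token heuristic. Only input tokens are counted here.

```python
from statistics import mean


def count_tokens(text: str, encoding_name: str = "cl100k_base") -> int:
    """Count tokens with tiktoken if possible, else use a rough heuristic."""
    try:
        import tiktoken  # third-party; may not be installed

        encoding = tiktoken.get_encoding(encoding_name)
        return len(encoding.encode(text))
    except Exception:
        # Assumption: roughly 4 characters per token (rounded up).
        return -(-len(text) // 4)


def yearly_input_cost(sample_articles, articles_per_day, price_per_million_tokens):
    """Extrapolate the yearly input-token cost from a sample of articles.

    price_per_million_tokens is whatever the current price list says;
    it is a parameter because prices change and none is given here.
    """
    avg_tokens = mean(count_tokens(article) for article in sample_articles)
    tokens_per_year = avg_tokens * articles_per_day * 365
    return tokens_per_year * price_per_million_tokens / 1_000_000
```

Usage would look like `yearly_input_cost(samples, articles_per_day=n, price_per_million_tokens=p)`, with `n` taken from the MediaCloud counts and `p` from the provider's price list. Output tokens and any prompt overhead would need to be added on top.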

linear[bot] commented 4 months ago

MIM-37 GPT cost estimation

davidpomerenke commented 4 months ago

From the OpenAI function-calling docs (https://platform.openai.com/docs/guides/function-calling/tokens):

"Under the hood, functions are injected into the system message in a syntax the model has been trained on. This means functions count against the model's context limit and are billed as input tokens. If running into context limits, we suggest limiting the number of functions or the length of documentation you provide for function parameters.

It is also possible to use fine-tuning to reduce the number of tokens used if you have many functions defined."
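Since function definitions are billed as input tokens on every request, it may be worth checking how much the schema adds per article. The sketch below is only a rough approximation: the exact syntax in which functions are injected is not documented, so it measures the JSON-serialized schema instead, using a ~4 characters per token heuristic. The `CLASSIFY_ARTICLE` schema and all function names are hypothetical.

```python
import json


def approx_tokens(text: str) -> int:
    # Assumption: roughly 4 characters per token (rounded up).
    return -(-len(text) // 4)


def schema_overhead(functions: list) -> int:
    """Approximate tokens added to every request by the function schemas."""
    return sum(approx_tokens(json.dumps(f)) for f in functions)


def strip_descriptions(node):
    """Return a copy of the schema with all "description" keys removed."""
    if isinstance(node, dict):
        return {k: strip_descriptions(v) for k, v in node.items() if k != "description"}
    if isinstance(node, list):
        return [strip_descriptions(v) for v in node]
    return node


# Hypothetical function schema for sentiment and topic extraction.
CLASSIFY_ARTICLE = {
    "name": "classify_article",
    "description": "Classify a news article by its sentiment and main topics.",
    "parameters": {
        "type": "object",
        "properties": {
            "sentiment": {
                "type": "string",
                "enum": ["negative", "neutral", "positive"],
                "description": "Overall sentiment of the article, judged from tone and wording.",
            },
            "topics": {
                "type": "array",
                "items": {"type": "string"},
                "description": "Main topics of the article as short lowercase keywords.",
            },
        },
        "required": ["sentiment", "topics"],
    },
}

full = schema_overhead([CLASSIFY_ARTICLE])
trimmed = schema_overhead([strip_descriptions(CLASSIFY_ARTICLE)])
print(f"approx. schema tokens per request: {full} (full) vs {trimmed} (no descriptions)")
```

Dropping descriptions saves tokens but may hurt output quality, so this is a trade-off to test rather than a default.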

davidpomerenke commented 4 months ago

53425b063d9afb93805ae00ae089ee3e06232ea3: There are ~50 articles/day on climate change in Germany on MediaCloud, and ~5 articles/day on activism. We can run them all through GPT for sentiments and topics for < 50 euro/year.
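To make the budget concrete, here is the arithmetic implied by those figures. It assumes the two article streams are disjoint and simply adds them (an upper bound if the activism articles are a subset of the climate ones). The price passed to `max_tokens_per_article` is a placeholder argument, not an actual quote.

```python
ARTICLES_PER_DAY = 50 + 5      # climate change + activism, per the figures above
BUDGET_EUR_PER_YEAR = 50       # upper bound stated above

articles_per_year = ARTICLES_PER_DAY * 365
budget_per_article = BUDGET_EUR_PER_YEAR / articles_per_year


def max_tokens_per_article(price_eur_per_million_tokens: float) -> float:
    """Tokens each article may consume while staying within the yearly budget."""
    return budget_per_article / price_eur_per_million_tokens * 1_000_000


print(f"{articles_per_year} articles/year")
print(f"{budget_per_article:.5f} euro per article")
```

So staying under 50 euro/year means spending at most about 0.0025 euro per article, for roughly 20,000 articles a year.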

davidpomerenke commented 2 months ago

With gpt-4o-mini, everything will become cheaper and better. Don't worry, be happy.

SocialChangeLab / media-impact-monitor

GPT cost estimation #107