SocialChangeLab / media-impact-monitor

The Media Impact Monitor will be a novel tool for protest groups and NGOs to measure and visualize their impact on public discourse.
https://mediaimpactmonitor.app

GPT cost estimation #107

Closed: davidpomerenke closed this issue 2 months ago.

davidpomerenke commented 4 months ago

https://github.com/SocialChangeLab/media-impact-monitor/issues/51 requires us to run GPT over every protest-related article.

https://github.com/SocialChangeLab/media-impact-monitor/issues/47 and https://github.com/SocialChangeLab/media-impact-monitor/issues/105 require us to run GPT over every climate-related article.

We should get some cost estimates for this.

Scopes to consider:

The single most important source for this is MediaCloud.

We can retrieve some sample articles and run them through tiktoken to count their tokens and estimate the cost, and we can get the overall average number of articles per day from MediaCloud as well.
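A minimal sketch of that estimate, assuming one GPT call per article. The function name `estimate_daily_cost` and all of its parameters are hypothetical; token counting is passed in as a callable so the sketch does not depend on any particular tokenizer, and every price is a caller-supplied assumption rather than real model pricing.

```python
from statistics import mean
from typing import Callable, Sequence


def estimate_daily_cost(
    sample_articles: Sequence[str],
    count_tokens: Callable[[str], int],
    articles_per_day: float,
    price_per_1m_input_tokens: float,
    prompt_tokens: int = 0,
    output_tokens_per_article: int = 0,
    price_per_1m_output_tokens: float = 0.0,
) -> float:
    """Estimate the daily cost of one GPT call per article (hypothetical helper).

    Prices and token budgets are assumptions supplied by the caller; the
    defaults do not reflect any real model's pricing.
    """
    if not sample_articles:
        raise ValueError("need at least one sample article")
    # Average input per call: mean article length plus a fixed instruction prompt.
    avg_input_tokens = mean(count_tokens(a) for a in sample_articles) + prompt_tokens
    input_cost = avg_input_tokens * price_per_1m_input_tokens / 1_000_000
    # Output cost is a flat per-article budget (e.g. a short label or JSON answer).
    output_cost = output_tokens_per_article * price_per_1m_output_tokens / 1_000_000
    return articles_per_day * (input_cost + output_cost)
```

With tiktoken, `count_tokens` could be `lambda text: len(enc.encode(text))` where `enc = tiktoken.get_encoding("o200k_base")` (the encoding tiktoken uses for the gpt-4o model family). `articles_per_day` would come from MediaCloud, and multiplying the result by 365 gives a rough yearly figure.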

linear[bot] commented 4 months ago

MIM-37 GPT cost estimation

davidpomerenke commented 4 months ago

Under the hood, functions are injected into the system message in a syntax the model has been trained on. This means functions count against the model's context limit and are billed as input tokens. If running into context limits, we suggest limiting the number of functions or the length of documentation you provide for function parameters.

It is also possible to use fine-tuning to reduce the number of tokens used if you have many functions defined.

Source: https://platform.openai.com/docs/guides/function-calling/tokens
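Because function definitions are billed as input tokens, they add a fixed overhead to every request, which matters when running one request per article. A rough sketch of that overhead, assuming the compact JSON form of a schema is a reasonable proxy for its token count (the quoted docs say the model sees a different, trained-on syntax, so this is only a ballpark). The `classify_article` schema and both helper functions are hypothetical, not taken from the project.

```python
import json
from typing import Any, Callable, Mapping, Sequence

# Hypothetical function definition for per-article classification.
CLASSIFY_ARTICLE = {
    "name": "classify_article",
    "description": "Label the article's sentiment towards the protest.",
    "parameters": {
        "type": "object",
        "properties": {
            "sentiment": {"type": "string", "enum": ["positive", "neutral", "negative"]},
        },
        "required": ["sentiment"],
    },
}


def approx_function_tokens(
    functions: Sequence[Mapping[str, Any]],
    count_tokens: Callable[[str], int],
) -> int:
    """Approximate tokens for function definitions, using compact JSON as a proxy."""
    return sum(count_tokens(json.dumps(f, separators=(",", ":"))) for f in functions)


def function_overhead_cost(
    functions: Sequence[Mapping[str, Any]],
    count_tokens: Callable[[str], int],
    calls: int,
    price_per_1m_input_tokens: float,
) -> float:
    """Extra input cost of re-sending the definitions with every call."""
    tokens = approx_function_tokens(functions, count_tokens)
    return tokens * calls * price_per_1m_input_tokens / 1_000_000
```

Shorter descriptions and fewer parameters shrink `approx_function_tokens` directly, which is the lever the quoted docs suggest when hitting context limits.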

davidpomerenke commented 4 months ago

53425b063d9afb93805ae00ae089ee3e06232ea3: There are ~50 articles/day on climate change in Germany on MediaCloud, and ~5 articles/day on activism. We can run them all through GPT for sentiment and topic classification for under €50/year.
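The €50/year figure can be sanity-checked with back-of-envelope arithmetic. A minimal sketch using the ~50 and ~5 articles/day figures above; the helper names are hypothetical, and any per-token price passed to `max_tokens_per_article` is an assumption. Summing the two streams is treated as an upper bound, assuming some activism articles are also about climate.

```python
# Rough article volumes for Germany on MediaCloud, as quoted above.
CLIMATE_ARTICLES_PER_DAY = 50
ACTIVISM_ARTICLES_PER_DAY = 5
BUDGET_EUR_PER_YEAR = 50.0


def articles_per_year(per_day: float, days: int = 365) -> float:
    return per_day * days


def max_cost_per_article(budget_per_year: float, n_articles: float) -> float:
    """Highest average cost per article that still fits the yearly budget."""
    return budget_per_year / n_articles


def max_tokens_per_article(cost_per_article: float, price_per_1m_tokens: float) -> float:
    """Token budget per article at an assumed price per million tokens."""
    return cost_per_article * 1_000_000 / price_per_1m_tokens


# Upper bound: the two streams may overlap. 55 * 365 = 20075 articles/year.
yearly = articles_per_year(CLIMATE_ARTICLES_PER_DAY + ACTIVISM_ARTICLES_PER_DAY)
per_article = max_cost_per_article(BUDGET_EUR_PER_YEAR, yearly)
```

With these volumes the budget allows just under €0.0025 per article on average; dividing that by a model's per-token price gives the largest prompt-plus-article size that still fits.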

davidpomerenke commented 2 months ago

gpt-4o-mini is out: everything will become cheaper and better. Don't worry, be happy.