Closed — davidpomerenke closed this issue 2 months ago
Under the hood, functions are injected into the system message in a syntax the model has been trained on. This means functions count against the model's context limit and are billed as input tokens. If running into context limits, we suggest limiting the number of functions or the length of documentation you provide for function parameters.
It is also possible to use fine-tuning to reduce the number of tokens used if you have many functions defined. https://platform.openai.com/docs/guides/function-calling/tokens
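Since function definitions are billed as input tokens, it can be worth measuring how large a schema actually is before shipping it. A minimal sketch, using a rough 4-characters-per-token heuristic (the exact count depends on the tokenizer; the `get_weather` schema below is a hypothetical example, not from this project):

```python
import json

# Hypothetical function schema, as it would be passed to the API.
# The model sees it serialized into the system message, so its full
# JSON length counts toward input tokens on every request.
weather_fn = {
    "name": "get_weather",
    "description": "Get the current weather for a city.",
    "parameters": {
        "type": "object",
        "properties": {
            "city": {"type": "string", "description": "City name, e.g. Berlin"},
            "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]},
        },
        "required": ["city"],
    },
}


def estimate_schema_tokens(schema: dict) -> int:
    """Rough estimate: ~4 characters per token for English/JSON text."""
    return len(json.dumps(schema)) // 4


print(estimate_schema_tokens(weather_fn), "tokens (rough estimate)")
```

Trimming parameter descriptions, as the docs suggest, directly shrinks this number.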
53425b063d9afb93805ae00ae089ee3e06232ea3: There are ~50 articles/day on climate change in Germany on MediaCloud, and ~5 articles/day on activism. We can run them all through GPT for sentiment and topic analysis for less than €50/year.
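A back-of-the-envelope check of that figure. The per-article token counts and the gpt-4o-mini prices below are assumptions for illustration (roughly $0.15 / 1M input tokens and $0.60 / 1M output tokens at the time of writing), not numbers from the thread:

```python
# Assumed inputs: ~1,500 input tokens per article, ~100 output tokens
# per classification, gpt-4o-mini pricing (assumed, check current rates).
ARTICLES_PER_DAY = 50 + 5          # climate + activism articles on MediaCloud
INPUT_TOKENS_PER_ARTICLE = 1_500
OUTPUT_TOKENS_PER_ARTICLE = 100
PRICE_INPUT_PER_M = 0.15           # USD per 1M input tokens (assumed)
PRICE_OUTPUT_PER_M = 0.60          # USD per 1M output tokens (assumed)

articles_per_year = ARTICLES_PER_DAY * 365
cost_per_year = (
    articles_per_year * INPUT_TOKENS_PER_ARTICLE / 1e6 * PRICE_INPUT_PER_M
    + articles_per_year * OUTPUT_TOKENS_PER_ARTICLE / 1e6 * PRICE_OUTPUT_PER_M
)
print(f"~${cost_per_year:.2f} per year")
```

Under these assumptions the yearly cost lands well below the €50 estimate, leaving headroom for longer articles or retries.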
gpt-4o-mini, everything will become cheaper and better, don't worry, be happy
https://github.com/SocialChangeLab/media-impact-monitor/issues/51 requires us to run GPT over every protest-related article
https://github.com/SocialChangeLab/media-impact-monitor/issues/47 and https://github.com/SocialChangeLab/media-impact-monitor/issues/105 require us to run GPT over every climate-related article
We should get some cost estimates for this.
Scopes to consider:
- The single most important source for this is MediaCloud.
- We can retrieve some sample articles and run them through tiktoken to calculate the cost, and get the overall average number of articles per day via MediaCloud as well.