Closed · cehrett closed 3 months ago
To support this, we need a script that takes a prompt and a model and outputs the number of tokens the prompt occupies for that model. (Using this script, we can then throw a warning or exception if the prompt is sufficiently large.)
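A minimal sketch of what such a script could look like. Note the assumptions: the ~4-characters-per-token heuristic is a rough stand-in for a real tokenizer (e.g. tiktoken), and the 63,000-token limit and 75% threshold are taken from the discussion below; the function names are hypothetical.

```python
MAX_TOKENS = 63000    # assumed context budget from this thread
WARN_FRACTION = 0.75  # error threshold (75% of the budget)

def count_tokens(prompt: str) -> int:
    """Rough token estimate: ~4 characters per token.

    A real implementation would use the model's tokenizer instead.
    """
    return max(1, len(prompt) // 4)

def check_prompt_size(prompt: str, max_tokens: int = MAX_TOKENS) -> int:
    """Return the token count, raising if the prompt is too large."""
    n = count_tokens(prompt)
    if n >= WARN_FRACTION * max_tokens:
        raise RuntimeError(
            f"Prompt is {n} tokens, >= 75% of the {max_tokens}-token limit"
        )
    return n
```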
Added functions to count tokens and partition messages (assuming the input is a markdown table). Subprocessing kicks in at 63,000 tokens, but the threshold can be changed if necessary. A runtime error is thrown if a prompt that is not being partitioned reaches 75% or more of the max token count, to avoid unnecessary API calls.
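The partitioning step could look something like this sketch: split the markdown-table rows into batches that each stay under the token threshold, repeating the header in every batch. The row handling, token estimate, and function names are assumptions, not the actual implementation.

```python
TOKEN_THRESHOLD = 63000  # assumed trigger point for subprocessing

def estimate_tokens(text: str) -> int:
    """Rough heuristic (~4 chars/token); swap in a real tokenizer."""
    return max(1, len(text) // 4)

def partition_rows(header: str, rows: list[str],
                   threshold: int = TOKEN_THRESHOLD) -> list[str]:
    """Group markdown-table rows into prompts, each under the threshold."""
    chunks, current = [], [header]
    budget = estimate_tokens(header)
    for row in rows:
        cost = estimate_tokens(row)
        # Start a new chunk if adding this row would exceed the budget
        # (but never emit a chunk containing only the header).
        if budget + cost > threshold and len(current) > 1:
            chunks.append("\n".join(current))
            current, budget = [header], estimate_tokens(header)
        current.append(row)
        budget += cost
    chunks.append("\n".join(current))
    return chunks
```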
Sometimes there are so many frame-clusters present in a day that the collapsing and description scripts cannot fit them all in a single prompt. These scripts should therefore detect when the prompt exceeds the context window and, if it does, perform the collapsing/description in multiple stages on sub-portions of the data.
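The multi-stage flow described above could be sketched roughly as below: run each sub-portion through the model separately, then combine the partial results for the day. `call_model` is a hypothetical stand-in for the real API call, and joining with newlines is an assumed merge strategy.

```python
from typing import Callable

def collapse_day(prompt_chunks: list[str],
                 call_model: Callable[[str], str]) -> str:
    """Run each sub-portion through the model, then merge the outputs."""
    partial_results = [call_model(chunk) for chunk in prompt_chunks]
    return "\n".join(partial_results)
```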