TeaSitta / AI-Horde-Worker

Turn your local KoboldAI-compatible API into an AI Horde worker
GNU Affero General Public License v3.0

Add total context window token limit in addition to thread limit #33

Open TeaSitta opened 6 months ago

TeaSitta commented 6 months ago

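Workers should keep a running tally of the estimated token size of the jobs currently submitted to the backend (roughly 3-4 characters per token). A newly popped job that would push the total over the configured limit should be queued, and the worker should stop popping new jobs until there is enough room to submit the queued job to the backend.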
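A minimal sketch of the tally-and-hold logic described above, assuming the caller already has an estimated token cost for each job. The `ContextBudget` class and its methods are hypothetical and not part of the existing worker code. The worker would check `can_pop()` before asking the Horde for a job, call `add()` after popping one, and call `release()` when the backend finishes a job; both calls return the job IDs that can now be sent to the backend.

```python
from collections import deque


class ContextBudget:
    """Hypothetical running tally of estimated tokens held by the backend."""

    def __init__(self, token_limit):
        self.token_limit = token_limit
        self.in_flight = {}     # job_id -> estimated tokens submitted to the backend
        self.waiting = deque()  # (job_id, tokens) popped from the Horde but held back

    @property
    def used(self):
        return sum(self.in_flight.values())

    def can_pop(self):
        # Stop popping new jobs while one is waiting for room.
        return not self.waiting

    def add(self, job_id, tokens):
        """Queue a freshly popped job; return job_ids that may be submitted now."""
        self.waiting.append((job_id, tokens))
        return self._drain()

    def release(self, job_id):
        """Call when the backend finishes a job; return newly submittable job_ids."""
        self.in_flight.pop(job_id, None)
        return self._drain()

    def _drain(self):
        ready = []
        while self.waiting:
            job_id, tokens = self.waiting[0]
            # A job larger than the whole limit is still let through when nothing
            # else is in flight; otherwise it would block the worker forever.
            if self.used + tokens > self.token_limit and self.in_flight:
                break
            self.waiting.popleft()
            self.in_flight[job_id] = tokens
            ready.append(job_id)
        return ready


budget = ContextBudget(token_limit=8192)
print(budget.add("a", 5000))  # ['a']
print(budget.add("b", 4000))  # [] because 5000 + 4000 > 8192, so "b" waits
print(budget.can_pop())       # False: no more popping while "b" is held
print(budget.release("a"))    # ['b']
```

Holding only one waiting job at a time (because `can_pop()` is false while anything is queued) keeps the number of jobs taken from the Horde but not yet running to a minimum, which is the goal stated in the issue.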

This allows the user to specify the approximate total KV-cache size of the Aphrodite instance, so that as few jobs as possible are popped from the Horde only to sit queued inside Aphrodite.
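If the user knows the KV-cache size in bytes rather than in tokens, a rough token limit can be derived from the model shape: each token stores one key and one value vector per layer per KV head. The sketch below assumes an unquantized cache (2 bytes per element for fp16) and ignores paging overhead; the function names are hypothetical, and Aphrodite may report its own cache capacity, which would be the better source if available.

```python
def kv_bytes_per_token(num_layers, num_kv_heads, head_dim, bytes_per_elem=2):
    # Factor of 2: one key vector and one value vector per layer per KV head.
    return 2 * num_layers * num_kv_heads * head_dim * bytes_per_elem


def kv_cache_token_capacity(kv_cache_bytes, num_layers, num_kv_heads, head_dim,
                            bytes_per_elem=2):
    # Approximate total tokens (summed over all concurrent jobs) the cache can hold.
    per_token = kv_bytes_per_token(num_layers, num_kv_heads, head_dim, bytes_per_elem)
    return kv_cache_bytes // per_token


# Llama-2-7B shape: 32 layers, 32 KV heads, head_dim 128, fp16 cache.
print(kv_bytes_per_token(32, 32, 128))                   # 524288 (0.5 MiB per token)
print(kv_cache_token_capacity(4 * 1024**3, 32, 32, 128))  # 8192 tokens for a 4 GiB cache
```

Models using grouped-query attention have fewer KV heads than attention heads, so they fit proportionally more tokens in the same cache.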

- 3 or 4 characters per token?
- Special character handling?
- UI display(s) / log messages?
- Does the requested generation size need to be taken into account for the total window?
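One way to make the estimation questions concrete is a per-job cost function with the open choices exposed as parameters. This is a hypothetical helper, not existing worker code. Using 3 characters per token overestimates relative to 4, which errs toward popping fewer jobs rather than overfilling the backend. Counting UTF-8 bytes instead of characters is one assumed option for special characters, since non-ASCII text often tokenizes into more tokens per character. Including `max_length` reflects that generated tokens also occupy the KV cache until the job finishes.

```python
import math


def estimate_job_tokens(prompt, max_length, chars_per_token=3,
                        include_generation=True, use_utf8_bytes=False):
    """Rough token cost of one job for the context budget (hypothetical helper)."""
    # Optionally count UTF-8 bytes so non-ASCII text is weighted more heavily.
    size = len(prompt.encode("utf-8")) if use_utf8_bytes else len(prompt)
    prompt_tokens = math.ceil(size / chars_per_token)
    # Generated tokens also sit in the KV cache until the job completes.
    return prompt_tokens + (max_length if include_generation else 0)


prompt = "x" * 1200
print(estimate_job_tokens(prompt, 200))                            # 600
print(estimate_job_tokens(prompt, 200, chars_per_token=4))         # 500
print(estimate_job_tokens(prompt, 200, include_generation=False))  # 400
```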