Check how the API response time varies now between business and personal accounts

vrodriguezf commented 10 months ago

Do it for different models

OhhTuRnz commented 9 months ago

I'll be adding some context given our Slack convo:

First of all, few-shot prompting takes a long time: response times reached 60 to 90 seconds. That happened because of the number of tokens we send every time we include the history; it could be improved if we only gave the history once.
Fine-tuned models seem quite good. Responses were as good as with few-shot prompting but arrived in a more reasonable time: you can get a response in under 1 second. One of the biggest problems I could think of is that when the target is pretty close, the delay between prompts needs to be shortened, because more dynamic responses are needed.
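
To put rough numbers on the token point above, here is a minimal sketch of how the total prompt size grows when the history is resent with every request versus sent once. The function name and the token counts in the usage note are hypothetical, chosen only for illustration; they are not measurements from this project, and whether the API in use allows sending the history only once is not settled here.

```python
def total_prompt_tokens(history_tokens: int, step_tokens: int,
                        n_steps: int, resend_history: bool) -> int:
    """Total prompt tokens sent over n_steps requests.

    history_tokens: size of the few-shot history (hypothetical value)
    step_tokens:    size of the new observation sent at each step
    resend_history: True if the history is included in every request
    """
    if resend_history:
        # Every request carries the full history plus the new observation.
        return n_steps * (history_tokens + step_tokens)
    # The history is paid for once; later requests only carry the observation.
    return history_tokens + n_steps * step_tokens
```

For example, with an assumed 3000-token history, 100 tokens per step, and 50 steps, resending the history costs 155000 prompt tokens in total, against 8000 if the history is given once.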

vrodriguezf commented 9 months ago

My thought right now is that, for the approach of replicating human behaviour, RAG (custom GPTs) or prompt engineering on their own, without fine-tuning, are not going to get us anywhere because of the latency, especially RAG.

On paths other than human behaviour cloning, such as the one @DumplingLife is pursuing (forecasting the future state of each object and making a plan based on that), these approaches could be more useful, despite being slower.

Anyway, as they say in this [fantastic talk](https://www.youtube.com/watch?v=ahnGLM-RC1Y&t=1429s) by OpenAI on optimizing LLMs, the best way to get the most out of LLMs is not exclusively one path (RAG, prompt engineering, fine-tuning); it can combine several of them.

vrodriguezf commented 9 months ago

So the question related to this issue is: are fine-tuned models with the ARCLab business account slower than fine-tuned models in a personal account? This is extremely relevant.
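
One way to answer this could be a small timing harness run against both accounts. The sketch below is only an assumption about how to set it up: `measure_latency`, `summarize`, and `compare_accounts` are hypothetical helper names, and each account is represented by a zero-argument callable that would wrap the real request to the fine-tuned model (here replaced by `time.sleep` stand-ins so the sketch runs without credentials).

```python
import statistics
import time
from typing import Callable, Dict, List


def measure_latency(call: Callable[[], object], n_trials: int = 10) -> List[float]:
    """Run `call` n_trials times and return the wall-clock latency of each run in seconds."""
    latencies = []
    for _ in range(n_trials):
        start = time.perf_counter()
        call()
        latencies.append(time.perf_counter() - start)
    return latencies


def summarize(latencies: List[float]) -> Dict[str, float]:
    """Reduce a list of latencies to a few summary statistics."""
    return {
        "mean": statistics.mean(latencies),
        "median": statistics.median(latencies),
        "max": max(latencies),
    }


def compare_accounts(calls: Dict[str, Callable[[], object]],
                     n_trials: int = 10) -> Dict[str, Dict[str, float]]:
    """Time each named callable and return its summary statistics."""
    return {name: summarize(measure_latency(call, n_trials))
            for name, call in calls.items()}


if __name__ == "__main__":
    # Stand-ins only: replace each lambda with a function that sends the same
    # prompt to the same fine-tuned model using that account's credentials.
    results = compare_accounts(
        {
            "business": lambda: time.sleep(0.02),
            "personal": lambda: time.sleep(0.01),
        },
        n_trials=5,
    )
    for name, stats in results.items():
        print(name, {key: round(value, 3) for key, value in stats.items()})
```

Using the same prompt, the same model, and interleaved runs for both accounts would help keep network conditions and server load comparable; repeating the comparison per model would also cover the "different models" request above.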

ARCLab-MIT / kspdg, issue #23