lirilkumar opened 4 months ago
same
Would be a cool feature, but I haven't put much thought into how multi-modal APIs handle images in general. From my narrow understanding, OpenAI does things differently than other providers. Any suggestions on how to approach this?
Maybe this can help: https://www.oranlooney.com/post/gpt-cnn/
We need to understand how gpt-4o tokenizes images, i.e. reverse-engineer its image tokenizer. I was searching for literature but sadly haven't found much.
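For reference, OpenAI's vision documentation for gpt-4o describes a tile-based rule rather than a true tokenizer, and these are the same 85 and 170 numbers the linked post investigates. In `detail: "low"` mode an image costs a flat 85 tokens. In `detail: "high"` mode the image is first fit within a 2048 × 2048 square, then scaled so its shortest side is 768px, and each 512px tile covering the result costs 170 tokens on top of the 85 base. Below is a minimal sketch of that rule, assuming those published constants still apply. The function name is hypothetical, other models may use different constants, and the choice to never upscale small images is also an assumption.

```python
import math


def gpt4o_image_tokens(width: int, height: int, detail: str = "high") -> int:
    """Hypothetical helper: estimate prompt tokens for one image input.

    Assumes OpenAI's documented gpt-4o rule (85 base + 170 per 512px tile).
    The constants are an assumption and may differ for other models.
    """
    base_tokens = 85
    tile_tokens = 170
    if detail == "low":
        # Low-detail images are billed a flat base cost.
        return base_tokens

    # Step 1: scale down (never up) to fit within a 2048 x 2048 square.
    scale = min(1.0, 2048 / max(width, height))
    w, h = width * scale, height * scale

    # Step 2: scale down so the shortest side is 768px.
    # Assumption: images whose shortest side is already <= 768px are not upscaled.
    scale = min(1.0, 768 / min(w, h))
    w, h = w * scale, h * scale

    # Step 3: count the 512px tiles needed to cover the resized image.
    tiles = math.ceil(w / 512) * math.ceil(h / 512)
    return base_tokens + tile_tokens * tiles
```

For example, a 1024 × 1024 image at high detail is resized to 768 × 768, which needs 2 × 2 = 4 tiles, giving 85 + 4 × 170 = 765 tokens. This matches the worked example in OpenAI's docs.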
Currently, we cannot calculate the cost of attached media (images) in the context. This could be a great feature. Let me know if it is possible to calculate this somehow in 0.1.11.
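If the provider bills image inputs as ordinary input tokens, as OpenAI describes for its vision models, then once an image's token count is known, its cost simply folds into the existing prompt-cost arithmetic. The sketch below is hypothetical. The helper name and its parameters are illustrative, and the per-million-token price must come from the provider's current pricing rather than from this example.

```python
def prompt_cost_usd(
    text_tokens: int,
    image_tokens: list[int],
    usd_per_million_input_tokens: float,
) -> float:
    """Hypothetical helper: cost of a prompt containing text and images.

    Assumes image tokens are billed at the same rate as text input tokens.
    """
    # Image tokens are summed with text tokens into one input-token total.
    total_input_tokens = text_tokens + sum(image_tokens)
    return total_input_tokens * usd_per_million_input_tokens / 1_000_000
```

Usage with a placeholder price: `prompt_cost_usd(1000, [765], usd_per_million_input_tokens=5.0)` bills 1,765 input tokens and returns roughly 0.008825.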