Closed EachSheep closed 2 days ago
Additionally, I would like to inquire why more cost-effective models like GPT-3.5 or GPT-4o-mini were not used for data synthesis. What challenges did you encounter during this process?
Hi,
> The actual cost of using the GPT-4 API to run EntiGraph on the data. The actual cost of using GPT-4 and GPT-3.5 for evaluation. The model used for Rephrase and its actual cost.

An exact cost is a bit difficult to estimate because our API is shared with many users across different projects. A rough estimate of the cost for everything is about $60K. At the time the majority of the synthetic data generation was done, GPT-4o and the batch API did not exist yet; with the latest infra, we expect those models to reduce cost considerably.
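For intuition, a back-of-the-envelope estimate is consistent with that order of magnitude. The prices below are assumptions based on GPT-4's original list pricing (roughly $30 per 1M input tokens and $60 per 1M output tokens), and the prompt-token count is a hypothetical figure, not one reported in the paper:

```python
# Rough cost estimate for generating ~455M synthetic tokens with GPT-4.
# Prices are assumed from GPT-4's original list pricing; actual rates vary.

def estimate_cost(output_tokens, input_tokens,
                  price_in_per_m=30.0, price_out_per_m=60.0):
    """Return API cost in dollars for the given token counts."""
    return (input_tokens / 1e6) * price_in_per_m + \
           (output_tokens / 1e6) * price_out_per_m

# 455M generated tokens, plus an assumed ~100M prompt tokens for the
# source passages and prompts (hypothetical figure).
cost = estimate_cost(output_tokens=455e6, input_tokens=100e6)
print(f"~${cost / 1e3:.1f}K")  # -> ~$30.3K
```

Generation alone lands in the tens of thousands of dollars; evaluation runs and other shared usage plausibly account for the rest of the ~$60K.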
With the latest infra, I believe GPT-4o-mini with batch API is the best setup.
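For reference, a minimal sketch of preparing such a job for the OpenAI Batch API, which expects a JSONL file with one request per line. The prompt, custom IDs, and model default here are illustrative placeholders, not the paper's actual EntiGraph prompts:

```python
import json

# Build the JSONL payload the OpenAI Batch API expects: one request per
# line, each with a custom_id, the target endpoint, and a chat body.
# The prompt below is a placeholder, not EntiGraph's actual prompt.
def build_batch_lines(passages, model="gpt-4o-mini"):
    lines = []
    for i, passage in enumerate(passages):
        request = {
            "custom_id": f"synth-{i}",
            "method": "POST",
            "url": "/v1/chat/completions",
            "body": {
                "model": model,
                "messages": [
                    {"role": "user",
                     "content": "List the entities in this passage and "
                                f"describe their relations:\n\n{passage}"},
                ],
            },
        }
        lines.append(json.dumps(request))
    return lines

# Write this out as a .jsonl file, upload it, then submit it with
# client.batches.create(..., completion_window="24h") -- the upload and
# submission steps need an API key and are not shown here.
batch_jsonl = "\n".join(build_batch_lines(["Example source passage."]))
```

The batch endpoint trades latency (up to 24h) for a substantial per-token discount, which is why it pairs well with large offline synthesis jobs like this one.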
Thank you for your response! I have indeed noticed the recent introduction of GPT-4o, the batch API, and prompt caching, which together can reduce costs several-fold.
However, I'm curious why you didn't choose models like GPT-3.5-turbo, which could have reduced costs by tens of times (though it has since been deprecated by OpenAI). Was there specific reasoning behind that decision?
I believe that deploying smaller, continuously pre-trained language models in broader applications is crucial, and cost plays a key role in this. Only when the cost is sufficiently reduced can this technology benefit a larger audience, which is why I’m particularly focused on this aspect.
Thanks for your reply again!
Hi,
In the early days of our exploration, I ran a GPT-3.5 vs. GPT-4 ablation with the rephrase baseline (blue curve in Figure 2 of https://arxiv.org/pdf/2409.07431), and GPT-4 performed better even when GPT-3.5 was given 5X the tokens. We decided to iterate with GPT-4 following that observation. That said, I do think that with the EntiGraph prompts, a weaker model could also do the job. In short, we unfortunately don't have a 3.5 vs. 4 ablation with the EntiGraph prompts.
Thank you for your detailed and patient response!
Thank you for your excellent work. This research involves extensive entity-relationship extraction on the QuALITY dataset using GPT-4, and the token count required seems to far exceed the 455M tokens mentioned in the paper. I would like to understand the following:
I hope to learn more about the above so I can better follow your excellent work in the future.