ZitongYang / Synthetic_Continued_Pretraining

Code implementation of synthetic continued pretraining
https://arxiv.org/abs/2409.07431
Apache License 2.0

Inquiry about the API costs to reproduce this work: #4

Closed: EachSheep closed this issue 2 days ago

EachSheep commented 2 weeks ago

Thank you for your excellent work. This research involves extensive entity relationship extraction on the QuALITY dataset using GPT-4, and the required token count far exceeds the 455M tokens mentioned in the paper. I would like to understand the following:

  1. The actual cost of running EntiGraph synthesis on the data with the GPT-4 API.
  2. The actual cost of using GPT-4 and GPT-3.5 for evaluation.
  3. The model used for the Rephrase baseline and its actual cost.

I hope to learn more about the above so I can better follow your excellent work in the future.

EachSheep commented 2 weeks ago

Additionally, I would like to inquire why more cost-effective models like GPT-3.5 or GPT-4o-mini were not used for data synthesis. What challenges did you encounter during this process?

ZitongYang commented 1 week ago

Hi,

> 1. The actual cost of running EntiGraph synthesis on the data with the GPT-4 API.
> 2. The actual cost of using GPT-4 and GPT-3.5 for evaluation.
> 3. The model used for the Rephrase baseline and its actual cost.

An exact cost is a bit difficult to estimate because our API account is shared with many users across different projects. A rough estimate of the total cost for everything is about $60K. At the time the majority of the synthetic data generation was done, GPT-4o and the Batch API did not exist yet; with the latest infrastructure, we expect those to reduce the cost considerably.

> Additionally, I would like to inquire why more cost-effective models like GPT-3.5 or GPT-4o-mini were not used for data synthesis. What challenges did you encounter during this process?

With the latest infrastructure, I believe GPT-4o-mini with the Batch API is the best setup.
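
For reference, a minimal sketch of what that setup could look like with the current OpenAI Python SDK is below. The `prompts` list, the output filename, and the `max_tokens` setting are placeholders rather than values from this repo; only the Batch API calls themselves follow OpenAI's documented interface.

```python
import json
from openai import OpenAI

client = OpenAI()

# Placeholder prompts; in practice these would be the EntiGraph-style
# entity-extraction and relation-analysis prompts built from each document.
prompts = [
    "List the salient entities in the following article: ...",
    "Analyze the relationship between entity A and entity B in: ...",
]

# One chat-completion request per line of the batch input file.
with open("entigraph_batch.jsonl", "w") as f:
    for i, prompt in enumerate(prompts):
        f.write(json.dumps({
            "custom_id": f"entigraph-{i}",
            "method": "POST",
            "url": "/v1/chat/completions",
            "body": {
                "model": "gpt-4o-mini",
                "messages": [{"role": "user", "content": prompt}],
                "max_tokens": 2048,  # assumed cap, tune per prompt type
            },
        }) + "\n")

# Upload the file and start the batch job; results arrive within 24 hours
# at roughly half the synchronous per-token price.
batch_file = client.files.create(
    file=open("entigraph_batch.jsonl", "rb"), purpose="batch"
)
batch = client.batches.create(
    input_file_id=batch_file.id,
    endpoint="/v1/chat/completions",
    completion_window="24h",
)
print(batch.id, batch.status)
```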

EachSheep commented 1 week ago

> An exact cost is a bit difficult to estimate because our API account is shared with many users across different projects. A rough estimate of the total cost for everything is about $60K. At the time the majority of the synthetic data generation was done, GPT-4o and the Batch API did not exist yet; with the latest infrastructure, we expect those to reduce the cost considerably.
>
> With the latest infrastructure, I believe GPT-4o-mini with the Batch API is the best setup.

Thank you for your response! I have indeed noticed the recent introduction of GPT-4o, the Batch API, and prompt caching, which can reduce costs severalfold.

However, I'm curious why you didn't choose models like GPT-3.5-turbo (though it has since been deprecated by OpenAI), which could have reduced costs by tens of times. Was there any specific reasoning behind your decision?

I believe that deploying smaller, continuously pre-trained language models in broader applications is crucial, and cost plays a key role in this. Only when the cost is sufficiently reduced can this technology benefit a larger audience, which is why I’m particularly focused on this aspect.

Thanks again for your reply!
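
As a rough, hypothetical illustration of how the numbers being discussed compare, a back-of-envelope sketch is below. The per-1M-token prices and the 50% batch discount are illustrative assumptions, not figures from the paper or from OpenAI's current price list; only the 455M-token corpus size comes from the paper, and input-token and prompt-caching savings are ignored for simplicity.

```python
# Hypothetical back-of-envelope cost comparison for generating a 455M-token
# synthetic corpus. Prices are assumed USD per 1M output tokens; check the
# current OpenAI pricing page before relying on them.
SYNTHETIC_TOKENS = 455e6  # corpus size reported in the paper

assumed_price_per_m = {
    "gpt-4": 60.0,
    "gpt-3.5-turbo": 1.5,
    "gpt-4o-mini": 0.60,
}
BATCH_DISCOUNT = 0.5  # Batch API assumed to halve the synchronous price

for model, price in assumed_price_per_m.items():
    sync_cost = SYNTHETIC_TOKENS / 1e6 * price
    print(f"{model:>15}: ~${sync_cost:>8,.0f} sync, "
          f"~${sync_cost * BATCH_DISCOUNT:>8,.0f} batched")
```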

ZitongYang commented 3 days ago

Hi,

In the early days of our exploration, I ran a GPT-3.5 vs. GPT-4 ablation with the Rephrase baseline (the blue curve in Figure 2 of https://arxiv.org/pdf/2409.07431), and GPT-4 performed better even when GPT-3.5 was given 5X the tokens. We decided to iterate with GPT-4 following that observation. That said, I do think that with the EntiGraph prompts a weaker model could also do the work. In short, we unfortunately don't have a 3.5 vs. 4 ablation with the EntiGraph prompts.

EachSheep commented 2 days ago

> In the early days of our exploration, I ran a GPT-3.5 vs. GPT-4 ablation with the Rephrase baseline (the blue curve in Figure 2 of https://arxiv.org/pdf/2409.07431), and GPT-4 performed better even when GPT-3.5 was given 5X the tokens. We decided to iterate with GPT-4 following that observation. That said, I do think that with the EntiGraph prompts a weaker model could also do the work. In short, we unfortunately don't have a 3.5 vs. 4 ablation with the EntiGraph prompts.

Thank you for your detailed and patient response!