The dataset compiles articles from seven prominent Indonesian news platforms: Tempo, CNN Indonesia, CNBC Indonesia, Okezone, Suara, Kumparan, and JawaPos. Each source contributes a diverse range of articles, which together form a broad collection of Indonesian news content. The dataset includes two special columns: 'embedding', which holds the text embeddings computed with the OpenAI text-embedding-ada-002 model, and 'summary', which holds a concise summary of each article generated via the ChatGPT API.
NusaCatalogue: https://indonlp.github.io/nusa-catalogue/card.html?id_news_dataset
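
As an illustration of how the 'embedding' and 'summary' columns might be used together, the following is a minimal sketch of a cosine-similarity search over article embeddings. It assumes each row is available as a dictionary with 'embedding' (a list of floats) and 'summary' keys; the helper names `cosine_similarity` and `most_similar` are hypothetical, and the three-dimensional vectors are toy values chosen for readability (real text-embedding-ada-002 vectors have 1536 dimensions).

```python
import math


def cosine_similarity(a, b):
    """Return the cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        # A zero vector has no direction; treat it as dissimilar to everything.
        return 0.0
    return dot / (norm_a * norm_b)


def most_similar(rows, query_embedding, top_k=3):
    """Rank rows by similarity of their 'embedding' to the query vector."""
    scored = [
        (cosine_similarity(row["embedding"], query_embedding), row)
        for row in rows
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]


# Toy rows standing in for dataset records (assumed layout, not real data).
rows = [
    {"summary": "toy summary A", "embedding": [1.0, 0.0, 0.0]},
    {"summary": "toy summary B", "embedding": [0.0, 1.0, 0.0]},
    {"summary": "toy summary C", "embedding": [0.7, 0.7, 0.0]},
]

results = most_similar(rows, [1.0, 0.1, 0.0], top_k=2)
for score, row in results:
    print(f"{score:.3f}  {row['summary']}")
```

In practice the query vector would need to come from the same embedding model as the stored vectors, since similarities between vectors from different models are not meaningful.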