NusaAlinea is a human-written, paragraph-level dataset that covers 10 local languages in Indonesia. The dataset consists of around 50,000 paragraphs, each with around 100 tokens, resulting in a total of 6M tokens. The dataset is labelled with topic, emotion, and paragraph type.
NusaCatalogue: https://indonlp.github.io/nusa-catalogue/card.html?nusa_alinea