NusaParagraph is a human-written paragraph dataset covering 10 local languages of Indonesia. It consists of around 50,000 paragraphs, each around 100 tokens long, for a total of roughly 6M tokens. Each paragraph is labelled with its topic, emotion, and paragraph type.
NusaCatalogue: https://indonlp.github.io/nusa-catalogue/card.html?nusa_paragraph_topic