KopI (Korpus Perayapan Indonesia)-NLLB contains only the Indonesian-family languages (Acehnese, Balinese, Banjar, Indonesian, Javanese, Minangkabau, Sundanese) extracted from the NLLB dataset. Each language set is also filtered with deduplication techniques: exact-hash (MD5) deduplication and MinHash LSH near-duplicate removal.
NusaCatalogue: https://indonlp.github.io/nusa-catalogue/card.html?kopi_nllb
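
The two deduplication passes mentioned in the description can be illustrated with a minimal sketch. This is not the pipeline used to build the dataset: the card does not state the shingle size, the number of hash permutations, the band layout, or the similarity threshold, so every value below is an assumption, and all function names are hypothetical. The sketch uses only the Python standard library; a real pipeline would more likely rely on a dedicated library such as `datasketch`.

```python
import hashlib
import re


def exact_dedup(docs):
    """Keep the first occurrence of each byte-identical document (MD5 key)."""
    seen, kept = set(), []
    for doc in docs:
        key = hashlib.md5(doc.encode("utf-8")).hexdigest()
        if key not in seen:
            seen.add(key)
            kept.append(doc)
    return kept


def shingles(text, n=3):
    """Lowercased word n-grams; n=3 is an assumed value."""
    tokens = re.findall(r"\w+", text.lower())
    if len(tokens) < n:
        return {" ".join(tokens)}
    return {" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def minhash_signature(text, num_perm=64, n=3):
    """One minimum per seeded hash function; seeding is done by prefixing the seed."""
    grams = shingles(text, n)
    signature = []
    for seed in range(num_perm):
        signature.append(min(
            int.from_bytes(
                hashlib.md5(f"{seed}:{g}".encode("utf-8")).digest()[:8], "big"
            )
            for g in grams
        ))
    return tuple(signature)


def near_dedup(docs, num_perm=64, bands=16, threshold=0.8):
    """Drop documents whose estimated Jaccard similarity to a kept one >= threshold.

    LSH banding: the signature is split into `bands` slices, and two documents
    become candidates if at least one slice is identical. Candidates are then
    verified by comparing full signatures. All parameters are assumptions.
    """
    rows = num_perm // bands
    buckets = {}               # (band index, band slice) -> indices of kept docs
    kept, signatures = [], []
    for doc in docs:
        sig = minhash_signature(doc, num_perm)
        keys = [(b, sig[b * rows:(b + 1) * rows]) for b in range(bands)]
        candidates = {i for k in keys for i in buckets.get(k, [])}
        is_duplicate = any(
            sum(x == y for x, y in zip(sig, signatures[i])) / num_perm >= threshold
            for i in candidates
        )
        if is_duplicate:
            continue
        index = len(kept)
        kept.append(doc)
        signatures.append(sig)
        for k in keys:
            buckets.setdefault(k, []).append(index)
    return kept


if __name__ == "__main__":
    sample = [
        "Saya suka makan nasi goreng di pagi hari.",
        "Saya suka makan nasi goreng di pagi hari.",   # exact duplicate
        "saya suka makan nasi goreng di pagi hari",    # differs in case and punctuation
        "Cuaca di Bandung hari ini sangat cerah sekali.",
    ]
    print(near_dedup(exact_dedup(sample)))
```

Running the exact pass first keeps the more expensive MinHash pass from computing signatures for documents that are already byte-identical.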