Create dataset loader for IJELID (Indonesian-Javanese-English Code-Mixed Language Identification)

NusaCatalogue: https://indonlp.github.io/nusa-catalogue/card.html?ijelid

Dataset	ijelid
Description	This is a clean version of code-mixed Indonesian-Javanese-English data for token-level language identification. We name this dataset IJELID (Indonesian-Javanese-English Language Identification). It contains tokenized tweets in which each token is paired with its language label. There are seven language labels: ID (Indonesian), JV (Javanese), EN (English), MIX_ID_EN (mixed Indonesian-English), MIX_ID_JV (mixed Indonesian-Javanese), MIX_JV_EN (mixed Javanese-English), and OTH (Other).
License	CC-BY 4.0
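
As a starting point for the loader, here is a minimal sketch of the label set and a parser for token-label pairs. The seven labels come from the description above. The file layout is an assumption, not something this issue confirms: one `token<TAB>label` pair per line, with a blank line between tweets. `parse_tweets` is a hypothetical helper name, not part of any existing codebase.

```python
from typing import Iterable, List, Tuple

# The seven language labels listed in the dataset description.
LABELS = ["ID", "JV", "EN", "MIX_ID_EN", "MIX_ID_JV", "MIX_JV_EN", "OTH"]


def parse_tweets(lines: Iterable[str]) -> List[List[Tuple[str, str]]]:
    """Group (token, label) pairs into tweets.

    ASSUMED layout (hypothetical): one "token<TAB>label" pair per line,
    with a blank line separating consecutive tweets.
    """
    tweets: List[List[Tuple[str, str]]] = []
    current: List[Tuple[str, str]] = []
    for raw in lines:
        line = raw.rstrip("\n")
        if not line.strip():
            # A blank line closes the current tweet, if there is one.
            if current:
                tweets.append(current)
                current = []
            continue
        # Split on the last tab so the label is always the final field.
        token, label = line.rsplit("\t", 1)
        if label not in LABELS:
            raise ValueError(f"unknown label: {label!r}")
        current.append((token, label))
    # Flush the last tweet when the input does not end with a blank line.
    if current:
        tweets.append(current)
    return tweets
```

Rejecting unknown labels early keeps a malformed row from silently entering the data. If the released files use a different layout, only the splitting logic should need to change.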

IndoNLP / nusa-crowd

Create dataset loader for IJELID (Indonesian-Javanese-English Code-Mixed Language Identification) #345