Create dataset loader for HPLTDatasets v1.2

Dataset	hplt
Description	The dataset is part of the High Performance Language Technologies (HPLT) project, a three-year EU-funded project that started in September 2022. HPLT derives monolingual and bilingual datasets from the Internet Archive and CommonCrawl and uses them to build efficient and robust machine translation (MT) models as well as large language models (LLMs). HPLT aims to provide free, sustainable, and reusable datasets, models, and workflows at scale using high-performance computing (HPC).
Subsets	-
Languages	ind, zlm, tha, mya, fil, vie
Tasks	Language Modeling
License	Creative Commons Zero v1.0 Universal (cc0-1.0)
Homepage	https://hplt-project.org/datasets/v1.2
HF URL	https://huggingface.co/datasets/BramVanroy/hplt_monolingual_v1_2
Paper URL	https://aclanthology.org/2023.eamt-1.61/

SEACrowd / seacrowd-datahub