SamuelCahyawijaya commented 8 months ago

Dataset	visobert
Description	The ViSoBERT dataset is textual data crawled from the three best-known Vietnamese public social networks (Facebook, TikTok, and YouTube) via the research APIs of these platforms. The dataset contains Facebook posts, TikTok comments, and YouTube comments of Vietnamese-verified users, from Jan 2016 (Jan 2020 for TikTok) to Dec 2022. A post-processing mechanism is applied to handle hashtags, emojis, misspellings, hyperlinks, and other noncanonical texts.
Subsets	-
Languages	vie
Tasks	Language Modeling
License	Creative Commons Attribution Non Commercial 4.0 (cc-by-nc-4.0)
Homepage	https://drive.google.com/drive/folders/1C144LOlkbH78m0-JoMckpRXubV7XT7Kb
HF URL	https://huggingface.co/uitnlp/visobert
Paper URL	https://aclanthology.org/2023.emnlp-main.315.pdf

revaldianggara commented 8 months ago

self-assign

github-actions[bot] commented 8 months ago

Hi, may I know if you are still working on this issue? Please let @holylovenia @SamuelCahyawijaya @sabilmakbar know if you need any help.

raileymontalan commented 7 months ago

SEACrowd / seacrowd-datahub