Create dataset loader for Word frequency distribution

IndoNLP / nusa-crowd

A collaborative project to collect datasets in Indonesian languages.

Apache License 2.0


Open SamuelCahyawijaya opened 1 year ago

SamuelCahyawijaya commented 1 year ago

Dataset	freq_dist_id
Description	Word frequency lists compiled from four sources: Kompas, Wikipedia, Twitter, and Kaskus. Each list covers the 10,000 most frequent words in its source and is accompanied by the statistical distribution (Zipf graph).
License	Unknown
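
For anyone picking this up, here is a rough sketch of what a top-N frequency list and the data behind a Zipf graph look like. It is only an illustration: the helper names `top_k_frequencies` and `zipf_points` and the toy token list are made up here, and nothing below reflects the actual file format of the dataset or the nusa-crowd loader API.

```python
from collections import Counter
import math


def top_k_frequencies(tokens, k=10_000):
    """Return the k most frequent tokens as (word, count) pairs, most frequent first."""
    return Counter(tokens).most_common(k)


def zipf_points(freq_list):
    """Return (log10 rank, log10 count) pairs for a Zipf plot; ranks start at 1."""
    return [
        (math.log10(rank), math.log10(count))
        for rank, (_, count) in enumerate(freq_list, start=1)
    ]


# Toy corpus for illustration only; not taken from Kompas, Wikipedia, Twitter, or Kaskus.
tokens = "yang dan di yang dan yang".split()

freq = top_k_frequencies(tokens, k=2)  # keep only the 2 most frequent words
points = zipf_points(freq)             # rank/frequency pairs on a log-log scale
```

Under Zipf's law the points fall roughly on a straight line with negative slope when plotted on log-log axes, which is presumably what the Zipf graph mentioned in the description shows for each source.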