issues
EleutherAI
/
pilev2
MIT License
13
stars
9
forks
Product listing and review datasets
#26
upintheairsheep
opened
1 year ago
0
#youtubearchive
#25
upintheairsheep
opened
1 year ago
0
WikiTeam Wikidumps
#24
upintheairsheep
opened
1 year ago
0
Discord dumps
#23
upintheairsheep
opened
1 year ago
10
More IRC logs
#22
upintheairsheep
opened
1 year ago
1
Filmot (low quality automatic subtitles)
#21
upintheairsheep
opened
1 year ago
0
Mobile App Store Pages
#20
upintheairsheep
opened
1 year ago
1
YouTube Statistics
#19
upintheairsheep
opened
1 year ago
0
TikTok
#18
upintheairsheep
opened
1 year ago
0
Instagram
#17
upintheairsheep
opened
1 year ago
0
Multiple Twitter datasets
#16
upintheairsheep
opened
1 year ago
1
Hector's datasets
#15
upintheairsheep
opened
1 year ago
0
ConvoKit Datasets
#14
upintheairsheep
opened
1 year ago
0
Internet Archive
#13
upintheairsheep
opened
1 year ago
0
Fandom
#12
upintheairsheep
opened
1 year ago
3
Documents scraped from Open Directories
#11
upintheairsheep
opened
1 year ago
0
ResearchGate
#10
upintheairsheep
opened
1 year ago
0
ScienceDirect
#9
upintheairsheep
opened
1 year ago
0
RoyalRoad
#8
upintheairsheep
opened
1 year ago
0
Expand Common Crawl dataset to include more timespans, probably even the entire history of crawls
#7
upintheairsheep
opened
1 year ago
0
Origin/wip dedup
#6
taisazero
closed
1 year ago
0
Add datasheets for all datasources
#5
ncoop57
closed
1 year ago
0
Pile V2?
#4
StellaAthena
closed
1 year ago
2
Dataset Proposals?
#3
fattorib
closed
1 year ago
0
arXiv
#2
StellaAthena
closed
1 year ago
0
Stack Exchange
#1
StellaAthena
closed
1 year ago
0