EleutherAI/the-pile
MIT License · 1.51k stars · 128 forks
Issues
#120 Could you possibly share the 825GB pile data temporarily and unofficially? · dsdanielpark · opened 9 months ago · 2 comments
#119 Question regarding Shuffling · LeoXinhaoLee · opened 10 months ago · 1 comment
#118 Issue reproducing the GitHub partition · osainz59 · opened 1 year ago · 3 comments
#117 Link in Readme produces 404 · gladwig2 · opened 1 year ago · 15 comments
#116 When accessing https://the-eye.eu/public/AI/pile_preliminary_components/, a 404 error occurs · s1ghhh · opened 1 year ago · 0 comments
#115 book3 metadata · DachengLi1 · opened 1 year ago · 0 comments
#114 Mismatched data size Problem · jaywaer · closed 1 year ago · 0 comments
#113 link for book3 · wang99711123 · opened 1 year ago · 2 comments
#111 Any search tools? · MM-IR · opened 1 year ago · 0 comments
#110 pass2_shuffle_holdout.py - ModuleNotFoundError: No module named 'parse' · dboggs95 · closed 1 year ago · 1 comment
#109 URL Links · akul-goyal · opened 1 year ago · 2 comments
#108 Cannot download data , error · infokng · opened 1 year ago · 0 comments
#107 Suggested corpus: Adult stories · johnflux · opened 1 year ago · 1 comment
#106 Reducing download size · marionbartl · opened 1 year ago · 0 comments
#105 Pile-CC Size · KeremTurgutlu · opened 1 year ago · 0 comments
#104 ConvoKit datasets · upintheairsheep · closed 1 year ago · 2 comments
#103 Accepting submissions to the Pile · upintheairsheep · closed 1 year ago · 1 comment
#102 Ubuntu IRC broken encoding, impacting generative models downstream · briansemrau · opened 1 year ago · 6 comments
#101 "Github" code data download only · HangXue-lab · opened 1 year ago · 2 comments
#100 tfds_pile · everks · opened 2 years ago · 0 comments
#99 Appending data to the Pile. · shankerabhigyan · opened 2 years ago · 1 comment
#98 (Natural) Languages in The PILE · suzyahyah · opened 2 years ago · 1 comment
#97 failed to download stackexchange · sangmichaelxie · opened 2 years ago · 1 comment
#96 Scripts for dedup and filter Common Crawl? · shangw-nvidia · opened 2 years ago · 1 comment
#95 import fasttext_pybind as fasttext fails with undefined symbol · HughPH · opened 2 years ago · 0 comments
#94 download website is not accessible · portia1026 · opened 2 years ago · 1 comment
#93 Fix CommonCrawlDataset · researcher2 · opened 2 years ago · 1 comment
#92 Public website to explore dataset · tuan3w · opened 3 years ago · 1 comment
#91 Royalroad · KeinNiemand · closed 3 years ago · 1 comment
#90 Meta data `file_name` in the GitHub part of The Pile a bit off · thomwolf · opened 3 years ago · 2 comments
#89 SHA256 Sums · ghost · closed 3 years ago · 1 comment
#88 Updated with the times · StellaAthena · closed 3 years ago · 0 comments
#87 Code generation · 6r1d · closed 3 years ago · 1 comment
#86 Update README.md · leogao2 · closed 3 years ago · 0 comments
#85 Caucasian Languages · Plkmoi · closed 3 years ago · 0 comments
#84 Version2 · Plkmoi · closed 3 years ago · 0 comments
#83 Api · QazQazaq · closed 3 years ago · 0 comments
#82 Caucasian Languages Dataset · QazQazaq · closed 3 years ago · 0 comments
#81 Early Buddhism · Blue7771 · closed 3 years ago · 4 comments
#80 Exploiting bitexts · eritain · closed 3 years ago · 2 comments
#79 WIP - update readme for release · leogao2 · closed 3 years ago · 2 comments
#78 Link to code · leogao2 · closed 3 years ago · 0 comments
#77 Create CODEOWNERS · StellaAthena · closed 3 years ago · 0 comments
#76 Russian dialogs and stories from the Pickabu website · maloyan · closed 3 years ago · 0 comments
#75 Legal Contracts · hendrycks · closed 3 years ago · 1 comment
#74 Adding TFDS Implementation for the_pile · trisongz · closed 3 years ago · 1 comment
#73 Updating with recent changes · StellaAthena · closed 3 years ago · 0 comments
#72 Paper checklist · leogao2 · closed 3 years ago · 0 comments
#71 PDF parsing · leogao2 · closed 3 years ago · 11 comments
#70 Make treemaps · leogao2 · closed 3 years ago · 1 comment