facebookresearch/cc_net
Tools to download and cleanup Common Crawl data
MIT License
972 stars, 142 forks
issues
how to only compute the perplexity of each paragraph using your language model with local data?
#54
rongjingyue423
opened
1 year ago
1
503 Server Error: Service Unavailable for url
#53
yangyang0202
opened
1 year ago
1
Whether CC_Net provides an existing monolingual corpus
#52
yangyang0202
opened
1 year ago
0
Can reproduce still run normally?
#51
newbietuan
closed
1 year ago
0
win10 use cc_net
#50
z-x-x136
opened
1 year ago
0
requests.exceptions.HTTPError: 503 Server Error: Service Unavailable for url
#49
Hieunohair
opened
1 year ago
1
CC-100 in statmt version is different from paper
#48
nbqu
opened
1 year ago
0
Annotation statistics
#47
mauriceweber
closed
1 year ago
0
Extracting text from the WET format
#46
wwfcnu
opened
1 year ago
2
Numerous Errors
#45
conceptofmind
opened
1 year ago
2
The final json files are not as expected
#44
nengyinyibeiwu
opened
1 year ago
0
Update CC_net code to make it can be run in Spark cluster
#43
junwan-db
opened
1 year ago
1
The questions about the stats json configuration file
#42
QHPHBias
opened
1 year ago
0
Changes for AI2-LLM
#41
rodneykinney
closed
1 year ago
2
Inquiries about utilizing 2022 collected Common Crawl snapshots
#40
hyunmokky
opened
1 year ago
0
Inquiries about korean datasets utilized in the CCNet pipeline
#39
hyunmokky
opened
1 year ago
1
Fixes
#38
rodneykinney
closed
1 year ago
1
when use odoo 16.0 in pycharm show this Error
#37
mohamedGaber93
opened
1 year ago
0
Error: Mining phase failure
#36
AssisRaphael
closed
2 years ago
1
403 forbidden while downloading
#35
Raven-Ren
opened
2 years ago
2
update wet url root
#34
suamin
closed
1 year ago
3
Update execution.py
#33
styxjedi
opened
2 years ago
2
sbatch: error: Batch job submission failed: Invalid job array specification
#32
swgu98
closed
2 years ago
0
Batch job submission failed: Invalid job array specification
#31
swgu98
opened
2 years ago
3
I want to copy the output data of CC_net directly, what should I do?
#30
mome1024
closed
2 years ago
1
Question about the size of Roberta-small
#28
MatthewCYM
closed
3 years ago
0
Variance of hash files sizes in newer crawls
#27
var926
opened
3 years ago
1
"Reproducing our work" does not specify set of languages and snapshots
#26
leezu
opened
3 years ago
2
cc_net/tools/dl_cc_100.py fails to extract complete dataset
#25
leezu
opened
3 years ago
6
getpy version specified in setup.py no longer available
#24
leezu
closed
3 years ago
1
Fix typo in README (dl_all_lm -> dl_all_lms)
#23
chloamme
closed
3 years ago
3
Running on local files
#22
sashavor
closed
3 years ago
4
Model finding
#21
sashavor
opened
3 years ago
0
make dl_all_lm failing
#20
sashavor
opened
3 years ago
2
Error: Job not requeued because: timed-out and not checkpointable.
#19
hadifar
opened
3 years ago
12
Are not all languages in the paper supported?
#18
feddybear
closed
4 years ago
1
add CC-100 download script
#17
gwenzek
closed
4 years ago
0
Error when Running 2020-34 dumps
#16
Phil1108
opened
4 years ago
4
Doing hashing, mining and regroup from each bin order
#15
aswin-giridhar
closed
4 years ago
1
requests.exceptions.HTTPError: 429 Client Error: Too Many Requests for url
#14
bioothod
opened
4 years ago
5
Change the release format for smaller disk usage
#13
gwenzek
closed
4 years ago
0
ERROR: Package u'cc-net' requires a different Python: 2.7.12 not in '>=3.7'
#12
Nanamumuhan
closed
4 years ago
2
EOFError: Compressed file ended before the end-of-stream marker was reached
#11
zl827154659
closed
4 years ago
4
Add info about prerequisites on Ubuntu
#10
leogao2
opened
4 years ago
2
support of Hausa
#9
donglixp
closed
4 years ago
4
Dedup all paragraphs if it appear more than once?
#8
xingenju
closed
4 years ago
2
Cannot download the precomputed files
#7
yinfeiy-g
closed
4 years ago
7
Decrease RAM usage, investigate miss documents
#6
gwenzek
closed
4 years ago
3
Early exit when desired number of documents is reached?
#5
JohnGiorgi
closed
4 years ago
3
Failing to use mp execution
#4
alexandremuzio
opened
4 years ago
4