commoncrawl/cc-mrjob
Demonstration of using Python to process the Common Crawl dataset with the mrjob framework
MIT License
166 stars, 65 forks
issues
#32 Specify bash in get-data.sh | brianpeiris | closed | 3 years ago | 1
#31 Upgrade to use Python 3, fixes #11 | sebastian-nagel | opened | 3 years ago | 0
#30 AWS EMR issues | DallanQ | opened | 3 years ago | 2
#29 Can not run examples locally | brand17 | closed | 3 years ago | 0
#28 Error Launching job : Output directory s3://mapreducecommoncrawl/output1 already exists. Streaming Command Failed! Command exiting with ret '5' | PhuongDelrosario | closed | 3 years ago | 4
#27 subprocess failed with code 143 | CryptoKR | closed | 3 years ago | 1
#26 AWS EMR: ImportError: cannot import name ReadTimeoutError | unkrich | closed | 5 years ago | 1
#25 bootstrapping issues | andresriancho | closed | 5 years ago | 3
#24 Support WARC/1.1 | sebastian-nagel | opened | 6 years ago | 0
#23 Unable to run examples on aws emr cluster | bruceadowns | closed | 6 years ago | 3
#22 Not working anymore on EMR? "subprocess failed with code 1" | joergrech | closed | 6 years ago | 17
#21 Use boto3, fixes #18 | sebastian-nagel | closed | 6 years ago | 0
#20 Moving a message away from the `print` to `logging` | adamb0mb | closed | 7 years ago | 10
#19 Gitignore Visual Studio Code config files | adamb0mb | closed | 7 years ago | 1
#18 Upgrade to use boto3 | sebastian-nagel | closed | 6 years ago | 1
#17 Cannot run mrjob on EMR | rubenmarias | closed | 7 years ago | 3
#16 Log which WARC/WAT/WET file is processed | sebastian-nagel | closed | 6 years ago | 1
#15 Source and documentation cleanup | bruceadowns | closed | 7 years ago | 0
#14 request-canceled-and-instance-running | 2803media | closed | 7 years ago | 0
#13 Can't fetch history log; missing job ID | 2803media | closed | 7 years ago | 4
#12 ImportError: No module named mrcc | 2803media | closed | 7 years ago | 0
#11 Python 3 compatibility | sebastian-nagel | opened | 7 years ago | 1
#10 Update EMR conf | sebastian-nagel | closed | 6 years ago | 0
#9 Job fails when running local job | mitcheccles | closed | 7 years ago | 2
#8 Fixed EMR issues and implemented automatic handling of local paths | beeker1121 | closed | 7 years ago | 0
#7 Job fails on Hadoop by assuming local WARC path | sebastian-nagel | closed | 7 years ago | 3
#6 Hw1_marjorie | presquepartout | closed | 9 years ago | 0
#5 Mids baek hw1 | rockb1017 | closed | 9 years ago | 1
#4 Fix bootstrap script for older AMI versions | c-w | closed | 9 years ago | 0
#3 Homework 1 submission from Lucas Dan | lucasdan3 | closed | 9 years ago | 0
#2 Assignment 1 | gunnarklee | closed | 9 years ago | 0
#1 Local mode | rdhyee | closed | 10 years ago | 2