eecs485staff / madoop

A lightweight MapReduce framework for education
MIT License
9 stars, 4 forks

Hadoop Streaming Tutorial #38

Closed: jaredzh closed this 2 years ago

jaredzh commented 2 years ago

Move the Hadoop Streaming Tutorial from the P5 repo to the Madoop repo.

Closes #38

TODO

codecov[bot] commented 2 years ago

Codecov Report

Merging #38 (bf4a874) into develop (eb3764a) will increase coverage by 0.34%. The diff coverage is 100.00%.


@@             Coverage Diff             @@
##           develop      #38      +/-   ##
===========================================
+ Coverage    95.78%   96.13%   +0.34%     
===========================================
  Files            4        4              
  Lines          190      207      +17     
===========================================
+ Hits           182      199      +17     
  Misses           8        8              
Impacted Files        Coverage Δ
madoop/__main__.py    93.87% <100.00%> (+3.25%) ↑


Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data. Last update eb3764a...bf4a874.

jaredzh commented 2 years ago

What would the link for the example be? Do we need to upload it as a tar.gz? @awdeorio

awdeorio commented 2 years ago

> What would be the link for example? Do we need to upload it as a tar.gz? @awdeorio

Ohh, good point. Yes.

awdeorio commented 2 years ago

I'm a little behind on this one, still working on some tweaks. Thanks for correcting those FIXMEs that I had left for myself :) @jaredzh

jaredzh commented 2 years ago

No worries, it's all good! Yeah, I wasn't sure whether you left them for yourself or for me, so I resolved them just in case they were for me lol. @awdeorio

awdeorio commented 2 years ago

I tried something new to make it easy for someone to run the example program. I added a madoop --example flag that copies the example files to the user's PWD. WDYT?

$ madoop --example
Created example, try:

madoop \
  -input example/input \
  -output output \
  -mapper example/map.py \
  -reducer example/reduce.py        
$ madoop \
  -input example/input \
  -output output \
  -mapper example/map.py \
  -reducer example/reduce.py  
INFO: Starting map stage
INFO: Finished map executions: 2
INFO: Starting group stage
INFO: Starting reduce stage
INFO: Finished reduce executions: 3
INFO: Output directory: output

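The contents of example/map.py and example/reduce.py are not shown in this thread. As a sketch, assuming they follow the usual Hadoop Streaming conventions (read lines on stdin, write tab-separated key/value lines on stdout, reducer input sorted by key) and implement a word count, the logic might look like the following. The function names and the single-process run_locally helper are hypothetical, not part of Madoop.

```python
"""Word-count mapper and reducer in the Hadoop Streaming style (a sketch).

Assumptions: the mapper emits one "key<TAB>value" line per word, and the
reducer receives those lines sorted by key, as after a group/sort stage.
"""
import itertools


def mapper(lines):
    """Emit "<word>\t1" for every whitespace-separated word."""
    for line in lines:
        for word in line.split():
            yield f"{word}\t1"


def reducer(lines):
    """Sum the values of consecutive lines that share the same key."""
    # Split each "key\tvalue" line once; tolerate trailing newlines from stdin.
    pairs = (line.rstrip("\n").split("\t", 1) for line in lines)
    for key, group in itertools.groupby(pairs, key=lambda kv: kv[0]):
        yield f"{key}\t{sum(int(value) for _, value in group)}"


def run_locally(lines):
    """Hypothetical helper: simulate map -> group (sort) -> reduce in one process."""
    return list(reducer(sorted(mapper(lines))))
```

In an actual streaming job, map.py would pass sys.stdin to mapper and print each yielded line, and reduce.py would do the same with reducer. run_locally uses a single reducer; the log above shows Madoop running three reduce executions, presumably each handling a subset of the keys.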
jaredzh commented 2 years ago


I like this! This way they don't have to manually download the example files using wget.
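The --example flag discussed here copies the example files into the user's working directory. A minimal sketch of such a helper follows, assuming the example files ship as a directory inside the package; copy_example and its refuse-to-overwrite behavior are hypothetical and not taken from Madoop's source.

```python
"""Sketch of an --example style helper: copy bundled example files to PWD.

Hypothetical: copy_example and its arguments are illustrative names, not
Madoop's actual implementation.
"""
import pathlib
import shutil


def copy_example(src: pathlib.Path, dest_parent: pathlib.Path) -> pathlib.Path:
    """Copy the directory src to dest_parent/example and return the new path.

    Refuses to overwrite an existing example directory so that files a
    user has already edited are never clobbered.
    """
    dest = dest_parent / "example"
    if dest.exists():
        raise FileExistsError(f"{dest} already exists")
    shutil.copytree(src, dest)
    return dest
```

A CLI would typically call this with the packaged example directory and pathlib.Path.cwd(), then print the suggested madoop command, as in the session above.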

awdeorio commented 2 years ago

OK! I'll polish this up and let you know when it's ready for a read through.

jaredzh commented 2 years ago

> I made some changes and I think this is almost ready to go! Would you mind taking a detailed look at my changes and then creating PRs in the P5 and eecs485.org repos? @jaredzh

Yeah sounds good.

jaredzh commented 2 years ago

LGTM after some nitpicks. Go ahead and approve and merge if it looks good to you. @awdeorio