Closed: noah-weingarden closed this 3 months ago
Attention: Patch coverage is 97.67442%, with 1 line in your changes missing coverage. Please review.
Project coverage is 96.34%. Comparing base (44326de) to head (9fdb01f).
Files | Patch % | Lines |
---|---|---|
madoop/mapreduce.py | 97.61% | 1 Missing :warning: |
I'll add documentation to the README and the streaming tutorial if this gets approval to move forward.
Yes, I think this is the way to go to support EECS 485 Project 5.
WDYT about including an additional provided partitioner that ships with Madoop and is compatible with Hadoop, like -partitioner org.apache.hadoop.mapred.lib.KeyFieldBasedPartitioner? Would that help with our P5 problem?
We could do that, although it looks like https://github.com/eecs485staff/p5-search-engine/pull/710 already uses this to solve the P5 problem as-is. Let me know if you still want this feature anyway.
Moving this discussion to https://github.com/eecs485staff/p5-search-engine/issues/714
I'm merging to get this going for W24 P5
This PR is a proposal for adding support for a custom partitioner, which emulates Hadoop's Partitioner class. This would allow the project 5 inverted index to be segmented into precisely `num_reducers` partitions.

Example usage:

where `example/partition.py` contains:

All lines with a word whose first letter is alphabetically at or before "G" will end up in `part-00000`, while all lines with a word whose first letter is alphabetically after "G" will end up in `part-00001`.
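A minimal sketch of a partitioner with the behavior described above, for illustration only; it is not the actual `example/partition.py`. The helper names `choose_partition` and `partition_lines` are hypothetical, as is the assumption that each input line is a tab-separated key-value pair and that the partitioner answers with one partition number per line. Comparing the first letter case-insensitively is also a choice made here, not something the PR states.

```python
from typing import Iterable, Iterator


def choose_partition(key: str, num_reducers: int) -> int:
    """Map a key to a partition number in range(num_reducers).

    Keys whose first letter is at or before "G" go to bucket 0,
    all others to bucket 1. The modulo keeps the result valid
    even when fewer than two reducers are configured.
    """
    first_letter = key[:1].upper()  # assumption: case-insensitive comparison
    bucket = 0 if first_letter <= "G" else 1
    return bucket % num_reducers


def partition_lines(lines: Iterable[str], num_reducers: int) -> Iterator[int]:
    """Yield one partition number per tab-separated key-value line."""
    for line in lines:
        key = line.split("\t", 1)[0]  # assumption: the key precedes the first tab
        yield choose_partition(key, num_reducers)
```

With two reducers, `list(partition_lines(["Bye\t1\n", "World\t1\n"], 2))` assigns the first line to partition 0 and the second to partition 1, matching the `part-00000` / `part-00001` split described above.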