experiments done at NEC Labs
Original comment by kelp...@gmail.com
on 22 Oct 2013 at 5:45
Are you sure the outputs are not accumulated over many runs? Can you try to remove all the
output files and rerun your job? Let me know if you see the same issue once you
clear the files and rerun.
Original comment by vinay...@gmail.com
on 22 Oct 2013 at 6:26
Yes, I am sure. I remove all the output files before each run. I have run it several
times, and the result was the same.
Original comment by kelp...@gmail.com
on 22 Oct 2013 at 6:35
Can you please attach the Main.java file that constructs the job to this issue?
Thanks.
Original comment by vinay...@gmail.com
on 23 Oct 2013 at 2:27
The Main.java file is already attached in the first post. Thanks.
Original comment by kelp...@gmail.com
on 23 Oct 2013 at 4:00
I am wondering whether the skew happens at the join or at the group-by.
@kelphet Could we try one more time, but remove the group-by, to see whether the
result of the join is unbalanced?
Original comment by jarod...@gmail.com
on 23 Oct 2013 at 12:08
I see the problem in the Java code that constructs the job.
In the attached file, lines 269 and 279 still perform their operations
(hashing for the group-by and hashing for the partitioning, respectively) on
column 6. They need to be changed to column 3 as well. Can you try that and let me
know the outcome?
Thanks.
Original comment by vinay...@gmail.com
on 23 Oct 2013 at 4:33
Problem solved. I added a variable called groupByFieldNum, as below, to keep the
field indices consistent. Thanks for all your help.
int groupByFieldNum = 3; // previously 6
if (hasGroupBy) {
    RecordDescriptor groupResultDesc = new RecordDescriptor(new ISerializerDeserializer[] {
            UTF8StringSerializerDeserializer.INSTANCE, IntegerSerializerDeserializer.INSTANCE });
    HashGroupOperatorDescriptor gby = new HashGroupOperatorDescriptor(
            spec,
            new int[] { groupByFieldNum },
            new FieldHashPartitionComputerFactory(new int[] { groupByFieldNum },
                    new IBinaryHashFunctionFactory[] { PointableBinaryHashFunctionFactory
                            .of(UTF8StringPointable.FACTORY) }),
            new IBinaryComparatorFactory[] { PointableBinaryComparatorFactory.of(UTF8StringPointable.FACTORY) },
            new MultiFieldsAggregatorFactory(
                    new IFieldAggregateDescriptorFactory[] { new CountFieldAggregatorFactory(true) }),
            groupResultDesc, 16);
    createPartitionConstraint(spec, gby, resultSplits);
    IConnectorDescriptor joinGroupConn = new MToNPartitioningConnectorDescriptor(spec,
            new FieldHashPartitionComputerFactory(new int[] { groupByFieldNum },
                    new IBinaryHashFunctionFactory[] { PointableBinaryHashFunctionFactory
                            .of(UTF8StringPointable.FACTORY) }));
    spec.connect(joinGroupConn, join, 0, gby, 0);
    endingOp = gby;
}
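For illustration, the skew behavior discussed above can be reproduced with a minimal standalone sketch in plain Java (not Hyracks; the data and class name here are hypothetical). It shows why hash-partitioning on a column with few distinct values, like the wrong column 6 above, crowds most records into a couple of partitions, while a high-cardinality column spreads them out:

```java
import java.util.Arrays;

public class SkewDemo {
    // Count how many records each hash partition receives for a given key column.
    static int[] partitionCounts(String[] keys, int numPartitions) {
        int[] counts = new int[numPartitions];
        for (String k : keys) {
            int p = (k.hashCode() & Integer.MAX_VALUE) % numPartitions;
            counts[p]++;
        }
        return counts;
    }

    public static void main(String[] args) {
        int n = 10000, parts = 4;
        String[] lowCardinality = new String[n];  // like hashing on the wrong, low-cardinality column
        String[] highCardinality = new String[n]; // like hashing on the intended join/group-by column
        for (int i = 0; i < n; i++) {
            lowCardinality[i] = "label-" + (i % 2); // only 2 distinct keys -> at most 2 non-empty partitions
            highCardinality[i] = "key-" + i;        // n distinct keys -> roughly even partitions
        }
        System.out.println("skewed:   " + Arrays.toString(partitionCounts(lowCardinality, parts)));
        System.out.println("balanced: " + Arrays.toString(partitionCounts(highCardinality, parts)));
    }
}
```

The same effect applies whether the hash feeds a group-by operator or an M-to-N partitioning connector, which is why both call sites had to switch to the same column.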
Original comment by kelp...@gmail.com
on 23 Oct 2013 at 5:18
The reporter's last comment shows this issue has been fixed.
Original comment by ecarm...@ucr.edu
on 18 Nov 2014 at 6:22
Original issue reported on code.google.com by
kelp...@gmail.com
on 22 Oct 2013 at 5:44
Attachments: