selenechida closed 11 years ago
The same data set with the same default model settings produces dramatically different RF results. The data set is from the hacking airline data Meetup.
1 node cluster: 25% of the data is skipped and the error is 0.0%
16 node cluster: 25% of the data is skipped and the error is 0.0%
3 node cluster: 2.8% of the data is skipped and the error is ~37%
3 node cluster, same data set uploaded again: 25% of the data is skipped and the error is 0.0%
3 node cluster, another computer that is part of the same 3 node cluster: 25% of the data is skipped and the error is 0.0%
Screenshot from a single machine with a 3 node cluster and a 1 node cluster: http://i.imgur.com/kQrSV3x.png
Column-by-column validation of the data within the console found no differences between the data sets.
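For anyone who wants to repeat the column-by-column check outside the console, here is a minimal sketch in plain Python. It is not how the validation above was done; the function name `differing_columns` and the sample CSV text are made up for illustration, and it assumes both copies of the data can be exported as CSV text.

```python
import csv
import io
from itertools import zip_longest


def differing_columns(text_a, text_b):
    """Return the 0-based indexes of columns that differ between two CSV texts.

    Hypothetical helper: the header row is compared along with the data rows.
    """
    rows_a = list(csv.reader(io.StringIO(text_a)))
    rows_b = list(csv.reader(io.StringIO(text_b)))

    # Transpose rows into columns; zip_longest pads short rows with None.
    cols_a = list(zip_longest(*rows_a))
    cols_b = list(zip_longest(*rows_b))

    diffs = []
    # A column missing on one side shows up as None and counts as different.
    for index, (col_a, col_b) in enumerate(zip_longest(cols_a, cols_b)):
        if col_a != col_b:
            diffs.append(index)
    return diffs


# Made-up sample data, not taken from the airline data set.
first_upload = "Year,Delay\n1987,5\n1988,7\n"
second_upload = "Year,Delay\n1987,5\n1988,9\n"

print(differing_columns(first_upload, first_upload))
print(differing_columns(first_upload, second_upload))
```

An empty list means the two copies match in every column, which is the outcome reported above for the uploads on the different clusters.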
Oops, wrong thread. Sorry about that!