480Oswego2013 / CSC-HCI-480-2013-repo


RF results differ even though data set is identical #44

selenechida closed this issue 11 years ago

selenechida commented 11 years ago

The same data set with the same default model settings produces dramatically different RF results depending on the cluster it is loaded into. The data set used was from the hacking airline data Meetup.

1 node cluster: 25% of the data is skipped and the error is 0.0%

16 node cluster: 25% of the data is skipped and the error is 0.0%

3 node cluster: 2.8% of the data is skipped and the error is ~37%

3 node cluster - uploaded the same data set again: 25% of the data is skipped and the error is 0.0%

3 node cluster - another computer that is part of the same 3 node cluster: 25% of the data is skipped and the error is 0.0%

Screenshot from a single machine with a 3 node cluster and a 1 node cluster: http://i.imgur.com/kQrSV3x.png

Column-by-column validation of the data within the console showed no differences between the data sets.
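For anyone who wants to repeat that check outside the console, here is a minimal sketch of a column-by-column comparison of two CSV exports in plain Python. It is not what the console does internally. The function name `column_differences` and the sample column names are made up for illustration, and it assumes both files have a header row and the same row order.

```python
import csv
import io
from itertools import zip_longest


def column_differences(file_a, file_b):
    """Count mismatching rows per column between two CSV sources.

    file_a and file_b are open text files (or any iterable of lines)
    with a header row. Returns {column_name: mismatch_count}, listing
    only the columns that differ.
    """
    reader_a = csv.DictReader(file_a)
    reader_b = csv.DictReader(file_b)
    diffs = {}
    # zip_longest pads the shorter file with empty rows, so a missing
    # row counts as a mismatch in every column of the longer file.
    for row_a, row_b in zip_longest(reader_a, reader_b, fillvalue={}):
        for column in set(row_a) | set(row_b):
            if row_a.get(column) != row_b.get(column):
                diffs[column] = diffs.get(column, 0) + 1
    return diffs


# Hypothetical sample data; the column names are illustrative only.
upload_1 = io.StringIO("Year,DepDelay\n2008,5\n2008,12\n")
upload_2 = io.StringIO("Year,DepDelay\n2008,5\n2008,13\n")
print(column_differences(upload_1, upload_2))  # {'DepDelay': 1}
```

An empty result means every column matched row for row, which is the outcome described above for the uploaded data sets.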

selenechida commented 11 years ago

Oops, wrong thread. Sorry about that!