azmfaridee / mothur

This is a GSoC 2012 fork of 'Mothur'. We are trying to implement a number of 'Feature Selection' algorithms for microbial ecology data and incorporate them into mothur's main codebase.
https://github.com/mothur/mothur
GNU General Public License v3.0

Design the Mechanism of Writing the Output of Random Forest Algorithm into an Output File #29

Closed azmfaridee closed 11 years ago

azmfaridee commented 11 years ago

So far we have been printing the output to stdout. We need to be able to write the output of the algorithm to an output file, just like other mothur commands.

@kdiverson @mothur-westcott Can you give me an idea about this?

Our current debugging output, which we print to stdout, looks something like this:


numCorrect 150
forrestErrorRate: 0.197860962567
globalVariableRanks: [[9, 5.7], [2, 0.68], [44, 0.19], [158, 0.17], [14, 0.15], [27, 0.15], [31, 0.15], [144, 0.15], [33, 0.14], [16, 0.13], [86, 0.13], [141, 0.13], [182, 0.13], [264, 0.12], [41, 0.11], [302, 0.11], [36, 0.1], [58, 0.09], [161, 0.09], [22, 0.08], [89, 0.08], [145, 0.08], [360, 0.08], [11, 0.07], [37, 0.06], [40, 0.06], [56, 0.06], [60, 0.06], [71, 0.06], [105, 0.06], [133, 0.06], [157, 0.06], [243, 0.06], [285, 0.06], [1, 0.05], [21, 0.05], [23, 0.05], [24, 0.05], [47, 0.05], [68, 0.05], [82, 0.05], [98, 0.05], [117, 0.05], [195, 0.05], [286, 0.05], [568, 0.05], [15, 0.04], [30, 0.04], [51, 0.04], [72, 0.04], [91, 0.04], [114, 0.04], [127, 0.04], [129, 0.04], [184, 0.04], [230, 0.04], [335, 0.04], [7, 0.03], [26, 0.03], [64, 0.03], [69, 0.03], [70, 0.03], [84, 0.03], [151, 0.03], [154, 0.03], [175, 0.03], [176, 0.03], [202, 0.03], [258, 0.03], [293, 0.03], [296, 0.03], [427, 0.03], [3, 0.02], [35, 0.02], [74, 0.02], [81, 0.02], [87, 0.02], [92, 0.02], [93, 0.02], [112, 0.02], [116, 0.02], [140, 0.02], [164, 0.02], [167, 0.02], [180, 0.02], [207, 0.02], [215, 0.02], [219, 0.02], [231, 0.02], [266, 0.02], [267, 0.02], [338, 0.02], [346, 0.02], [399, 0.02], [400, 0.02], [407, 0.02], [562, 0.02], [643, 0.02], [12, 0.01], [17, 0.01], [29, 0.01], [32, 0.01], [54, 0.01], [61, 0.01], [65, 0.01], [67, 0.01], [73, 0.01], [88, 0.01], [100, 0.01], [102, 0.01], [113, 0.01], [115, 0.01], [119, 0.01], [142, 0.01], [143, 0.01], [159, 0.01], [183, 0.01], [218, 0.01], [224, 0.01], [252, 0.01], [256, 0.01], [265, 0.01], [273, 0.01], [276, 0.01], [277, 0.01], [280, 0.01], [287, 0.01], [294, 0.01], [308, 0.01], [334, 0.01], [339, 0.01], [371, 0.01], [382, 0.01], [406, 0.01], [409, 0.01], [421, 0.01], [444, 0.01], [468, 0.01], [532, 0.01], [673, 0.01], [737, 0.01]]

We need to design the format of the output file for this case. I'm thinking of a summary message followed by two-column data, e.g.

Error Rate:    19.79%
--------------------------------------
Summary of Variable Ranks
--------------------------------------
OTU                 RANK
--------------------------------------
OTU10               5.7
OTU3                0.66
...                 ....
...                 ....

@kdiverson @mothur-westcott What do you think? Is there anything else we need to add to the output file? Also, I think each run with different parameters would produce a different result, so re-using the output of a previous run would be unnecessary (e.g. the forest created with the log-based criterion and the forest created with the square-root-based criterion would be different, and there would be no way to use one run's data for another run).

mothur-westcott commented 11 years ago

If you are going to output the OTU labels be sure to use the labels from m->currentBinLabels. They are set when the sharedRabundVector class reads the shared file. Other than that, I will defer to Kathryn's expertise as to what kind of output would be most helpful to users.
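
For illustration, here is a minimal sketch of how the two-column layout proposed above could be written with labels looked up by index. It is not mothur code: `writeRankSummary` is a hypothetical helper, `binLabels` is only a stand-in for the labels held in m->currentBinLabels, and the ranks are assumed to arrive as (zero-based feature index, importance) pairs that are already sorted, as the debugging output suggests.

```cpp
#include <cassert>
#include <cstddef>
#include <iomanip>
#include <ostream>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper (not part of mothur): writes the error-rate summary and a
// two-column OTU/RANK table to any output stream.
//   errorRate : forest error rate as a fraction, e.g. 0.1979
//   ranks     : (zero-based feature index, importance) pairs, already sorted
//   binLabels : stand-in for the OTU labels kept in m->currentBinLabels
void writeRankSummary(std::ostream& out,
                      double errorRate,
                      const std::vector<std::pair<std::size_t, double> >& ranks,
                      const std::vector<std::string>& binLabels) {
    // Remember the caller's formatting so the stream is left as we found it.
    const std::ios_base::fmtflags oldFlags = out.flags();
    const std::streamsize oldPrecision = out.precision();

    const std::string rule(38, '-');
    out << std::fixed << std::setprecision(2);
    out << "Error Rate:    " << errorRate * 100.0 << "%\n";
    out << rule << "\nSummary of Variable Ranks\n" << rule << '\n';
    out << std::left << std::setw(20) << "OTU" << "RANK\n" << rule << '\n';

    for (std::size_t i = 0; i < ranks.size(); ++i) {
        const std::size_t index = ranks[i].first;
        // Every ranked feature must have a label; this is a precondition.
        assert(index < binLabels.size());
        out << std::left << std::setw(20) << binLabels[index]
            << ranks[i].second << '\n';
    }

    out.flags(oldFlags);
    out.precision(oldPrecision);
}
```

In a real command, `out` would presumably be a `std::ofstream` opened on the command's output file name. Taking a `std::ostream&` keeps the sketch usable with stdout or a string stream while debugging.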

kdiverson commented 11 years ago

That output looks good.