azmfaridee / mothur

This is a GSoC 2012 fork of 'Mothur'. We are trying to implement a number of 'Feature Selection' algorithms for microbial ecology data and incorporate them into Mothur's main codebase.
https://github.com/mothur/mothur
GNU General Public License v3.0

Finalize "CommandParameters" in "setParameters" method of "ClassifySharedCommand" #24

azmfaridee closed this issue 11 years ago

azmfaridee commented 12 years ago

Here are some of the parameters that would be needed to run the ClassifySharedCommand class.

With these in mind, and following up on what we had in Issue #4, the setParameters() function looks something like this:

vector<string> ClassifySharedCommand::setParameters(){    
     try {
          // we'll input a shared file and a design file name
          CommandParameter pshared("shared", "InputTypes", "", "", "none", "none", "none", false, true); parameters.push_back(pshared);
          CommandParameter pdesign("design", "InputTypes", "", "", "none", "none", "none", false, true); parameters.push_back(pdesign);

          // user will specify number of trees
          CommandParameter pnumtrees("numtrees", "Number", "", "100", "", "", "", false, false); parameters.push_back(pnumtrees);

          // user will specify tree splitting criteria
          CommandParameter psplitcriteria("splitcriteria", "Multiple", "gainratio-infogain", "infogain", "", "", "", false, false); parameters.push_back(psplitcriteria);

          // user will specify how many OTUs to consider at each split, relative to the total number of OTUs
          CommandParameter potupersplitcriteria("otupersplit", "Multiple", "squareroot-log2", "log2", "", "", "", false, false); parameters.push_back(potupersplitcriteria);

          // set input and output folder
          CommandParameter pinputdir("inputdir", "String", "", "", "", "", "", false, false); parameters.push_back(pinputdir);
          CommandParameter poutputdir("outputdir", "String", "", "", "", "", "", false, false); parameters.push_back(poutputdir);

          // user can specify to run the algo on only the labels specified from the shared file
          CommandParameter plabel("label", "String", "", "", "", "", "", false, false); parameters.push_back(plabel);

          // I've kept this here as Sarah put it here the last time
          CommandParameter pgroup("group", "InputTypes", "", "", "none", "none", "none", false, false); parameters.push_back(pgroup);

          vector<string> myArray;
          for (int i = 0; i < parameters.size(); i++) {     myArray.push_back(parameters[i].name);          }
          return myArray;
     }
     catch(exception& e) {
          m->errorOut(e, "ClassifySharedCommand", "setParameters");
          exit(1);
     }
}
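For the two "Multiple" parameters, the third constructor argument appears to hold the allowed values as a dash-separated string ("gainratio-infogain", "squareroot-log2") and the fourth the default ("infogain", "log2"). The standalone sketch below shows one way a chosen value could be checked against that list, and how otupersplit might be turned into a number of OTUs to sample per split, assuming the rule is applied to the total number of OTUs as the code comment says. The helpers splitAtDash, resolveMultiple and otusPerSplit are hypothetical and are not part of mothur's API.

```cpp
#include <cmath>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: split a dash-separated string such as "gainratio-infogain".
std::vector<std::string> splitAtDash(const std::string& s) {
    std::vector<std::string> parts;
    std::stringstream ss(s);
    std::string item;
    while (std::getline(ss, item, '-')) {
        if (!item.empty()) parts.push_back(item);
    }
    return parts;
}

// Hypothetical helper: returns the default if the user gave no value, the value
// itself if it is one of the allowed options, or "" if it is not allowed
// (the caller should then report an error).
std::string resolveMultiple(const std::string& value, const std::string& options,
                            const std::string& defaultValue) {
    if (value.empty()) return defaultValue;
    for (const std::string& opt : splitAtDash(options)) {
        if (opt == value) return value;
    }
    return "";
}

// Hypothetical helper: number of OTUs to sample at each split.
// "squareroot" uses sqrt(numOtus); any other rule (e.g. the default "log2") uses log2(numOtus).
// The result is rounded down but never below 1, so every split has an OTU to test.
int otusPerSplit(int numOtus, const std::string& rule) {
    if (numOtus <= 0) return 0;  // nothing to sample
    double k = (rule == "squareroot") ? std::sqrt(static_cast<double>(numOtus))
                                      : std::log2(static_cast<double>(numOtus));
    int n = static_cast<int>(std::floor(k));
    return n < 1 ? 1 : n;
}
```

For example, with 100 OTUs, squareroot gives 10 OTUs per split and log2 gives 6.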

@mothur-westcott @kdiverson Let me know if this makes sense. I also have a question: is the line

CommandParameter pgroup("group", "InputTypes", "", "", "none", "none", "none", false, false);
parameters.push_back(pgroup);

really necessary? I've seen it in other commands; what does it normally do?

mothur-westcott commented 12 years ago

I think you want:

CommandParameter pgroups("groups", "String", "", "", "", "", "",false,false); parameters.push_back(pgroups);

The groups parameter is used to select the samples you would like to include from the shared file. The group parameter is used to provide a group file.

If you include this in the constructor:

//code to parse group names users want to select and save these groups for sharedRabundVector to use later in read
groups = validParameter.validFile(parameters, "groups", false);         
if (groups == "not found") { groups = ""; }
else { m->splitAtDash(groups, Groups); }
m->setGroups(Groups);

Then read the shared file using the InputData class. Mothur will check the groups string to make sure the names are valid, store only the samples the user wants, and eliminate the zeroed OTUs that result from removing groups. If no groups are selected, all samples are stored.
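To make that behaviour concrete, here is a standalone sketch of the filtering step: keep only the requested samples (or all of them when groups is empty), then drop OTU columns that are zero in every remaining sample. It uses a simplified in-memory SampleRow type and hypothetical parseGroups/selectGroups helpers rather than mothur's InputData and SharedRAbundVector classes, and unlike mothur it silently ignores unknown group names instead of validating them.

```cpp
#include <set>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical simplified row of a shared file: one sample and its count for every OTU.
struct SampleRow {
    std::string group;
    std::vector<int> counts;
};

// Hypothetical helper: split "A-B-C" into the set {"A", "B", "C"}.
std::set<std::string> parseGroups(const std::string& groups) {
    std::set<std::string> names;
    std::stringstream ss(groups);
    std::string name;
    while (std::getline(ss, name, '-')) {
        if (!name.empty()) names.insert(name);
    }
    return names;
}

// Keep the selected samples (all samples if `groups` is empty), then remove OTU
// columns that are zero in every kept sample. Assumes all rows have the same length.
std::vector<SampleRow> selectGroups(const std::vector<SampleRow>& rows, const std::string& groups) {
    std::set<std::string> wanted = parseGroups(groups);
    std::vector<SampleRow> kept;
    for (const SampleRow& r : rows) {
        if (wanted.empty() || wanted.count(r.group) > 0) kept.push_back(r);
    }
    if (kept.empty()) return kept;

    // Mark OTUs that still have at least one non-zero count.
    std::size_t numOtus = kept[0].counts.size();
    std::vector<bool> nonZero(numOtus, false);
    for (const SampleRow& r : kept) {
        for (std::size_t j = 0; j < numOtus; ++j) {
            if (r.counts[j] != 0) nonZero[j] = true;
        }
    }

    // Rebuild each row without the all-zero OTU columns.
    for (SampleRow& r : kept) {
        std::vector<int> filtered;
        for (std::size_t j = 0; j < numOtus; ++j) {
            if (nonZero[j]) filtered.push_back(r.counts[j]);
        }
        r.counts = filtered;
    }
    return kept;
}
```

With rows A {1,0,3}, B {0,2,0}, C {4,0,0} and groups "A-C", sample B is removed and the middle OTU, now zero everywhere, is dropped, leaving A {1,3} and C {4,0}.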

//code to read the first label in the shared file.  
InputData input (sharedfile, "sharedfile");
vector<SharedRAbundVector*> lookup = input.getSharedRAbundVectors();

The execute function in newcommandtemplate.cpp has a good example of how to use mothur's classes to handle a shared file.

azmfaridee commented 12 years ago

I think you want:

CommandParameter pgroups("groups", "String", "", "", "", "", "",false,false); parameters.push_back(pgroups);

Ah, this is what I wanted to write; apologies for the mistake. So I take it that label and groups are basically filtering mechanisms on the input data that the user can fine-tune.

I'll go on to implement the rest of the class, then.

mothur-westcott commented 12 years ago

Yes, that's right. The groups parameter lets you select samples and the label parameter lets you select the distance you are interested in looking at. The design file also has the "sets" parameter associated with it that lets you select the treatment you are interested in.
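The label and sets filters could look roughly like this when applied to individual shared-file lines. This is a hypothetical sketch, not mothur's code: it assumes each data line begins with the label and the group name (tab-separated, as in a shared file's label, Group, numOtus, ... columns), that label and sets are dash-separated lists like groups, and that the design file has already been read into a group-to-treatment map. The splitAtDash and keepLine names are illustrative only.

```cpp
#include <map>
#include <set>
#include <sstream>
#include <string>

// Hypothetical helper: split a dash-separated list ("0.03-0.05") into a set.
std::set<std::string> splitAtDash(const std::string& s) {
    std::set<std::string> items;
    std::stringstream ss(s);
    std::string item;
    while (std::getline(ss, item, '-')) {
        if (!item.empty()) items.insert(item);
    }
    return items;
}

// Returns true if a shared-file data line passes both filters.
// An empty wantedLabels or wantedSets means "no filtering" for that parameter.
// The header line is assumed to be handled separately by the caller.
bool keepLine(const std::string& line,
              const std::set<std::string>& wantedLabels,
              const std::set<std::string>& wantedSets,
              const std::map<std::string, std::string>& design) {
    std::istringstream in(line);
    std::string label, group;
    if (!(in >> label >> group)) return false;  // malformed line

    // label filter: keep only the requested distances
    if (!wantedLabels.empty() && wantedLabels.count(label) == 0) return false;

    // sets filter: keep only samples whose treatment was requested
    if (!wantedSets.empty()) {
        auto it = design.find(group);
        if (it == design.end() || wantedSets.count(it->second) == 0) return false;
    }
    return true;
}
```

Parsing the label and sets strings once, outside the per-line loop, avoids re-splitting them for every sample.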