Is your feature request related to a problem? Please describe.
Classifiers are one of our most powerful tools, but the process for running them is currently poorly defined. We currently run random forest (RF) classifiers through QIIME, whose implementation is fairly opaque and does not report all the statistics we need. To supplement this, I have been running a custom scikit-learn script to extract the necessary statistics from these models (see `/sc/arion/projects/MMEDS/classification_modeling/classification_model_stats.py`). This works in the short term but is very tedious. Additionally, @circlespie is currently the only one who can create Lasso models, another powerful tool.
Describe the solution you'd like
Once #457 has been resolved, we should incorporate classification models into the standard analysis workflow, a dedicated classification workflow, or both. I would prefer to move entirely to a scikit-learn-based approach, since we already have to use scikit-learn for the statistics anyway.
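As a rough illustration of what a single scikit-learn step could compute for both model types, here is a minimal sketch. It assumes a binary outcome and uses a synthetic matrix from `make_classification` as a stand-in for a real feature table. The helper `model_stats`, the choice of cross-validated accuracy and ROC AUC as the reported statistics, and the use of L1-penalized logistic regression as the classification analogue of a Lasso model are all assumptions for illustration; this is not the existing stats script or @circlespie's Lasso method.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, roc_auc_score
from sklearn.model_selection import StratifiedKFold, cross_val_predict
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def model_stats(model, X, y, n_splits=5, seed=0):
    """Hypothetical helper: out-of-fold accuracy and ROC AUC for a binary classifier."""
    cv = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=seed)
    # Each sample's probability comes from a model fit without that sample.
    proba = cross_val_predict(model, X, y, cv=cv, method="predict_proba")[:, 1]
    pred = (proba >= 0.5).astype(int)
    return {"accuracy": accuracy_score(y, pred), "roc_auc": roc_auc_score(y, proba)}


# Synthetic stand-in for a samples x features table with a 0/1 label (assumption).
X, y = make_classification(n_samples=200, n_features=50, n_informative=5, random_state=0)

rf = RandomForestClassifier(n_estimators=300, random_state=0)
# L1-penalized logistic regression as a "Lasso-style" classifier (assumption).
lasso = make_pipeline(
    StandardScaler(),
    LogisticRegression(penalty="l1", solver="liblinear", C=0.5),
)

rf_stats = model_stats(rf, X, y)
lasso_stats = model_stats(lasso, X, y)

# Refit on all samples to inspect which features drive each model.
rf.fit(X, y)
top_rf_features = np.argsort(rf.feature_importances_)[::-1][:10]
lasso.fit(X, y)
lasso_selected = np.flatnonzero(lasso[-1].coef_.ravel())

print("RF:", rf_stats, "top features:", top_rf_features)
print("Lasso:", lasso_stats, "nonzero coefficients:", lasso_selected)
```

Because both models go through the same helper, adding another statistic (for example `sklearn.metrics.confusion_matrix`) would be a change in one place rather than a separate post hoc extraction step.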
Describe alternatives you've considered
It would be possible, if a bit more annoying, to keep using `qiime sample-classifier` for these models and extract the statistics post hoc.