princy149 closed this issue 2 years ago.
Hi, I don't completely understand this question: are you asking for Qurro to export ROC scores? CC @fedarko
Hi @mortonjt, thank you for your prompt response. Yes, I wanted to know whether there are any methods in Qurro or Songbird that we can use to generate an ROC curve based on the selected ranked features.
Hi @princy149, I don't believe we have anything set up through Songbird or Qurro to export ROC information (i.e. true vs. false positive classification rates).
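Neither tool computes these rates, but given any per-sample score (for example, a log-ratio) and binary class labels, ROC points can be computed by sweeping a decision threshold over the scores. Below is a minimal sketch in plain Python; `roc_points` and `auc` are hypothetical helpers written for illustration, not part of Songbird or Qurro.

```python
from typing import List, Sequence, Tuple


def roc_points(scores: Sequence[float], labels: Sequence[int]) -> List[Tuple[float, float]]:
    """Return (false positive rate, true positive rate) pairs, one per distinct threshold.

    A sample is predicted positive when its score is >= the threshold.
    labels must be 0 (negative) or 1 (positive), with at least one of each.
    """
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one positive and one negative sample")
    # Sweep thresholds from the highest score to the lowest, starting at (0, 0).
    pairs = sorted(zip(scores, labels), key=lambda p: p[0], reverse=True)
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(pairs):
        threshold = pairs[i][0]
        # Samples tied at this threshold all flip to "positive" together.
        while i < len(pairs) and pairs[i][0] == threshold:
            if pairs[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / n_neg, tp / n_pos))
    return points


def auc(points: Sequence[Tuple[float, float]]) -> float:
    """Area under the ROC curve, using the trapezoidal rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area
```

For example, `auc(roc_points(scores, labels))` condenses the curve into one number. The result is only meaningful if the scores come from samples that were not used to choose the features behind them.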
There are some challenges with creating a "classifier" based on these methods, at least as far as I see things. In theory, you could use regression or another method to construct a classifier based on the log-ratios selected using Qurro (which are in turn based on the feature rankings output by Songbird or another tool), but such a classifier would probably suffer from data leakage (https://en.wikipedia.org/wiki/Leakage_(machine_learning)) because Qurro makes use of the entire dataset it's provided. The performance of this classifier on the dataset it was created from would therefore be overly optimistic; to validate it, we would need to use a separate dataset.
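To make the "classifier based on a log-ratio" idea concrete, here is a minimal sketch under stated assumptions: each sample is a hypothetical dict of feature ID to count, the log-ratio is taken as the natural log of the summed numerator counts over the summed denominator counts (my understanding of what Qurro plots), and the "classifier" is just a single cutoff on that log-ratio rather than a regression. `log_ratio` and `fit_threshold` are illustrative names, not Qurro or Songbird APIs.

```python
import math
from typing import Dict, Optional, Sequence


def log_ratio(counts: Dict[str, float], numerator: Sequence[str],
              denominator: Sequence[str]) -> Optional[float]:
    """ln(sum of numerator counts / sum of denominator counts) for one sample.

    Returns None when either sum is zero, since the log-ratio is undefined there.
    """
    num = sum(counts.get(f, 0.0) for f in numerator)
    den = sum(counts.get(f, 0.0) for f in denominator)
    if num <= 0 or den <= 0:
        return None
    return math.log(num / den)


def fit_threshold(scores: Sequence[float], labels: Sequence[int]) -> Optional[float]:
    """Pick the cutoff (predict class 1 when score >= cutoff) with the best accuracy.

    Accuracy measured on these same samples is optimistic (data leakage), all the
    more so if these samples were also used to choose the numerator/denominator.
    """
    best_cut, best_acc = None, -1.0
    for cut in sorted(set(scores)):
        acc = sum((s >= cut) == (y == 1) for s, y in zip(scores, labels)) / len(labels)
        if acc > best_acc:
            best_cut, best_acc = cut, acc
    return best_cut
```

Both the cutoff and the choice of numerator and denominator features are "fitted" to the data here, which is why evaluating them on the same samples overstates how well they would generalize.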
I know Songbird uses a training/testing data split when computing the feature rankings, but I suspect that if we want to actually create a classifier based on a log-ratio selected afterwards, the best practice would be to withhold part of your data as a "test set" before running Songbird + Qurro, and then later evaluate the classifier using this test set. I tried this approach for a class project last year: the code for this is located in this repository (https://github.com/fedarko/283-project), mostly in the notebooks/ folder, although I can't offer any guarantees about it.
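A minimal sketch of that hold-out idea in plain Python, with hypothetical helper names (this is not the code from the repository mentioned above): split the samples first, choose the numerator and denominator features using only the training samples, then score the held-out samples with the resulting log-ratio and compute ROC AUC on them. Songbird and Qurro are not called here. `rank_features` is a deliberately simple stand-in (difference in mean log(count + 1) between classes), not Songbird's model, and the pseudocount of 1 is an assumption of this sketch.

```python
import math
import random
from typing import Dict, List, Sequence, Tuple

Sample = Dict[str, float]  # hypothetical format: feature ID -> count


def stratified_split(labels: Sequence[int], test_frac: float, seed: int) -> Tuple[List[int], List[int]]:
    """Split sample indices into (train, test) per class, BEFORE any feature ranking.

    Each class needs at least two samples so that both sets contain it.
    """
    rng = random.Random(seed)
    train: List[int] = []
    test: List[int] = []
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        n_test = max(1, round(len(idx) * test_frac))
        test += idx[:n_test]
        train += idx[n_test:]
    return sorted(train), sorted(test)


def rank_features(samples: Sequence[Sample], labels: Sequence[int], idx: Sequence[int]) -> List[str]:
    """Stand-in for Songbird (NOT its model): rank features by the difference in
    mean log(count + 1) between class 1 and class 0, using only samples in idx."""
    features = sorted({f for i in idx for f in samples[i]})

    def mean_log(f: str, cls: int) -> float:
        vals = [math.log(samples[i].get(f, 0.0) + 1) for i in idx if labels[i] == cls]
        return sum(vals) / len(vals)

    return sorted(features, key=lambda f: mean_log(f, 1) - mean_log(f, 0), reverse=True)


def log_ratio(sample: Sample, num: Sequence[str], den: Sequence[str], pseudocount: float = 1.0) -> float:
    """ln((numerator sum + pseudocount) / (denominator sum + pseudocount))."""
    n = sum(sample.get(f, 0.0) for f in num) + pseudocount
    d = sum(sample.get(f, 0.0) for f in den) + pseudocount
    return math.log(n / d)


def auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """ROC AUC as the fraction of (positive, negative) pairs ordered correctly; ties count 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def holdout_auc(samples: Sequence[Sample], labels: Sequence[int], k: int = 1,
                test_frac: float = 0.3, seed: int = 0) -> float:
    train, test = stratified_split(labels, test_frac, seed)
    ranked = rank_features(samples, labels, train)  # test samples are never used here
    num, den = ranked[:k], ranked[-k:]  # top-k vs. bottom-k ranked features
    scores = [log_ratio(samples[i], num, den) for i in test]
    return auc(scores, [labels[i] for i in test])
```

In a real analysis, the ranking step would be Songbird run on the training samples only, with the numerator and denominator then chosen in Qurro from those rankings; the AUC on the held-out samples is the number that avoids the leakage described earlier.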
If you are interested in more formal approaches to performing classification using log-ratios, the CoDaCoRe package (Gordon-Rodriguez et al. 2021, https://academic.oup.com/bioinformatics/article/38/1/157/6366546) might be useful.
@fedarko Thank you very much for explaining this and for the hints about other options. I found your suggestions helpful.
I echo Marcus’s comment (posted Sat, May 21, 2022 at 1:32 PM). Songbird is not designed for accurate classification.
Hi, after getting Songbird results, is there any method, such as an ROC model, for assessing the accuracy of the differentially ranked features?