biocore / songbird

Vanilla regression methods for microbiome differential abundance analysis
BSD 3-Clause "New" or "Revised" License

Songbird/Qurro result export as a ranking-order CSV with log-fold-change values for groups #163

Closed: princy149 closed this issue 2 years ago

princy149 commented 2 years ago

Hi,

After getting Songbird results, is there a method, such as an ROC curve, for assessing the accuracy of the differentially ranked features?

mortonjt commented 2 years ago

Hi, I don't completely understand this question. Are you asking for Qurro to export ROC scores? CC @fedarko

princy149 commented 2 years ago

Hi mortonjt, thank you for your prompt response. Yes, I wanted to know whether there are any methods in Qurro or Songbird through which we can generate an ROC curve based on the selected ranked features.

fedarko commented 2 years ago

Hi @princy149, I don't believe we have anything set up through Songbird or Qurro to export ROC information (i.e. true vs. false positive classification rates).
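For reference, "ROC information" boils down to the true and false positive rates obtained as a score threshold is swept across the samples. A minimal sketch in plain Python, assuming you already have one numeric score per sample (for example, a log-ratio) and a 0/1 group label; `roc_curve` and `trapezoid_auc` are illustrative helpers written here, not functions provided by Songbird or Qurro.

```python
def roc_curve(scores, labels):
    """Return (fpr, tpr) lists, sweeping the threshold from the highest score down.

    labels must be 0/1; samples with tied scores are added in one step,
    so ties show up as a single diagonal segment.
    """
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one positive and one negative sample")
    pairs = sorted(zip(scores, labels), key=lambda p: p[0], reverse=True)
    fpr, tpr = [0.0], [0.0]
    tp = fp = 0
    i = 0
    while i < len(pairs):
        threshold = pairs[i][0]
        # Every sample scoring >= threshold is now called "positive".
        while i < len(pairs) and pairs[i][0] == threshold:
            if pairs[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        fpr.append(fp / n_neg)  # false positive rate
        tpr.append(tp / n_pos)  # true positive rate
    return fpr, tpr


def trapezoid_auc(fpr, tpr):
    """Area under the ROC curve by the trapezoid rule."""
    return sum(
        (fpr[k] - fpr[k - 1]) * (tpr[k] + tpr[k - 1]) / 2
        for k in range(1, len(fpr))
    )


# Toy example: 3 positives, 3 negatives, one mis-ordered positive/negative pair.
fpr, tpr = roc_curve([0.9, 0.8, 0.7, 0.6, 0.55, 0.4], [1, 1, 0, 1, 0, 0])
print(trapezoid_auc(fpr, tpr))  # 8/9, i.e. about 0.889
```

The AUC equals the fraction of (positive, negative) sample pairs that the score orders correctly, which is why the single mis-ordered pair out of nine gives 8/9.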

There are some challenges with creating a "classifier" based on these methods, at least as far as I see things. In theory, you could use regression or something similar to construct a classifier based on the log-ratios selected using Qurro (based in turn on the feature rankings output by Songbird or another tool), but such a classifier would probably suffer from data leakage (https://en.wikipedia.org/wiki/Leakage_(machine_learning)) because Qurro makes use of the entire dataset it's provided. The performance of this classifier on the dataset it was created from would therefore be overly optimistic; to validate it, we would need to use a separate dataset.
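To make the "log-ratio as classifier score" idea concrete: Qurro's log-ratios are, to our understanding, the log of the summed counts of the numerator features over the summed counts of the denominator features in each sample. A minimal sketch of that calculation in plain Python, assuming a natural log and assuming samples with a zero numerator or denominator sum are simply left out; `log_ratio`, `sample_log_ratios`, and the toy table are illustrative only, not Qurro's code.

```python
import math


def log_ratio(sample_counts, numerator, denominator):
    """log(sum of numerator counts / sum of denominator counts) for one sample.

    Returns None when either sum is zero, because the log-ratio is undefined.
    """
    num = sum(sample_counts.get(f, 0) for f in numerator)
    den = sum(sample_counts.get(f, 0) for f in denominator)
    if num == 0 or den == 0:
        return None
    return math.log(num / den)


def sample_log_ratios(table, numerator, denominator):
    """Map sample ID -> log-ratio, leaving out samples where it is undefined."""
    ratios = {}
    for sample_id, counts in table.items():
        r = log_ratio(counts, numerator, denominator)
        if r is not None:
            ratios[sample_id] = r
    return ratios


table = {
    "s1": {"A": 10, "B": 30, "C": 5},
    "s2": {"A": 0, "B": 0, "C": 7},  # numerator sum is 0 -> left out
}
print(sample_log_ratios(table, ["A", "B"], ["C"]))  # {'s1': 2.0794...}, i.e. log 8
```

Since ROC curves depend only on how samples are ordered, the log base (or any other monotone transform) does not change the AUC, so the raw log-ratio can serve directly as the score. The leakage problem is about *which* features end up in the numerator and denominator: if they were chosen while looking at every sample, scoring those same samples is optimistic.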

I know Songbird uses a training/testing data split when computing the feature rankings, but I suspect that if we want to actually create a classifier based on a log-ratio selected afterwards, the best practice would be to withhold part of your data as a "test set" before running Songbird + Qurro, and then, later on, evaluate the classifier using this test set. I tried this approach for a class project last year: the code for this is located in this repository (https://github.com/fedarko/283-project), mostly in the notebooks/ folder, although I can't offer any guarantees about it.
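The hold-out-first workflow can be sketched in plain Python. This is a minimal sketch under loudly stated assumptions: `rank_features` is a hypothetical stand-in for Songbird (a difference of mean log(count + 1) between two groups, not Songbird's multinomial regression), the numerator and denominator are simply the top and bottom `k` ranked features rather than a selection made interactively in Qurro, and all function names are illustrative. The point it shows is the ordering: the split happens first, and the held-out samples are never seen while ranking or choosing features.

```python
import math
import random


def split_samples(labels, test_fraction=0.25, seed=0):
    """Stratified random hold-out, done BEFORE any feature ranking."""
    rng = random.Random(seed)
    train, test = [], []
    for cls in sorted(set(labels.values())):
        ids = sorted(s for s, y in labels.items() if y == cls)
        rng.shuffle(ids)
        n_test = max(1, round(len(ids) * test_fraction))
        test += ids[:n_test]
        train += ids[n_test:]
    return train, test


def rank_features(counts, labels, train_ids):
    """HYPOTHETICAL stand-in for Songbird: rank features by the difference in
    mean log(count + 1) between group 1 and group 0, on training samples only."""
    features = sorted({f for s in train_ids for f in counts[s]})
    g1 = [s for s in train_ids if labels[s] == 1]
    g0 = [s for s in train_ids if labels[s] == 0]

    def mean_log(group, f):
        return sum(math.log(counts[s].get(f, 0) + 1) for s in group) / len(group)

    score = {f: mean_log(g1, f) - mean_log(g0, f) for f in features}
    return sorted(features, key=lambda f: score[f], reverse=True)


def log_ratio(sample_counts, numerator, denominator):
    num = sum(sample_counts.get(f, 0) for f in numerator)
    den = sum(sample_counts.get(f, 0) for f in denominator)
    return math.log(num / den) if num > 0 and den > 0 else None


def auc(scores, labels):
    """ROC AUC via the Mann-Whitney formulation (tied pairs count as 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("test set needs both classes")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def evaluate_held_out(counts, labels, k=1, test_fraction=0.25, seed=0):
    train, test = split_samples(labels, test_fraction, seed)
    ranked = rank_features(counts, labels, train)  # test samples never seen here
    # Assumes at least 2 * k features so numerator and denominator don't overlap.
    numerator, denominator = ranked[:k], ranked[-k:]
    scored = [(log_ratio(counts[s], numerator, denominator), labels[s]) for s in test]
    scored = [(r, y) for r, y in scored if r is not None]
    return auc([r for r, _ in scored], [y for _, y in scored])
```

With real data, `rank_features` would be replaced by Songbird differentials fitted on the training samples only, with the log-ratio then chosen in Qurro from those rankings. Even done this way, the AUC from a small test set is a noisy estimate.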

If you are interested in more formal approaches to performing classification using log-ratios, the CoDaCoRe package (Gordon-Rodriguez et al. 2021, https://academic.oup.com/bioinformatics/article/38/1/157/6366546?login=true) might be useful.

princy149 commented 2 years ago

@fedarko Thank you very much for explaining this and for the hints about other options. I found your suggestions helpful.

mortonjt commented 1 year ago

I echo Marcus’s comment. Songbird is not designed for accurate classification.

(Sent by email in reply to Marcus Fedarko's comment of Sat, May 21, 2022.)