Closed fedarko closed 5 years ago
Also, it'd probably be a good idea for multiple people to review this since this is a lot of changes, and it's important that they're correct. @lisa55asil if you'll be around tomorrow, would you mind sitting down and going over this with me? A lot of the documentation changes here involved your FAQs, so you'd probably be the most qualified person to comment on that :)
@fedarko , this is awesome! Will take a close look at it today / tomorrow.
Thanks! @lisa55asil is looking over it right now, so I think we can merge this in once we incorporate her feedback.
One quick question—there's this part of the FAQ:
Q. What's up with the --training-column argument?
A. That is used for cross-validation if you have a specific reproducibility question that you are interested in answering. If this is specified, only samples labeled "Train" under this column will be used for building the model and samples labeled "Test" will be used for cross validation. In other words the model will attempt to predict the microbe abundances for the "Test" samples. The resulting prediction accuracy is used to evaluate the generalizability of the model in order to determine if the model is overfitting or not. If this argument is not specified, then 10 random samples will be chosen for the test dataset. If you want to specify more random samples to allocate for cross-validation, the --num-random-test-examples argument can be specified.
This says "10 random samples", but the default for --num-random-test-examples
is 5. Should we change that?
Ok, Lisa's changes have been integrated in. We've both made some changes (I fixed the --num-random-test-examples
thing, as well as tidied up the discussions of # of iterations). Also tried to clean up the "baseline model" description in the summarize-paired help here.
@mortonjt, pending your final approval I think this should be good to merge in.
...made a few last-minute commits showing off using qiime metadata tabulate
on a differentials file, but i think that's it from me for right now :)
That's awesome! If its easy and you already have it a link to the output of tabulating the differentials.qza from the Red Sea data would be helpful I think.
On Mon, Sep 23, 2019 at 3:31 PM Marcus Fedarko notifications@github.com wrote:
...made a few last-minute commits showing off using qiime metadata tabulate on a differentials file, but i think that's it from me for right now :)
— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/biocore/songbird/pull/80?email_source=notifications&email_token=AE6EV4WU2TFHC6UYXCRBDWDQLE7SLA5CNFSM4IYSYWZKYY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOD7MO2TA#issuecomment-534310220, or mute the thread https://github.com/notifications/unsubscribe-auth/AE6EV4UX7Y5V7ROPYBL4BI3QLE7SLANCNFSM4IYSYWZA .
Ok! I'm not sure adding a link to a QZV is the best idea, since I guess that'd have to be periodically updated if we also update Songbird... but I added a screenshot of the QZV output. I think that's a good advertisement for that functionality :) (here's a link to the screenshot)
Noting qiime metadata tabulate
is a great idea!
@fedarko are you finished with edits?
Going to make one more pass over everything today. I'll try to have it done by 5pm EST.
Thanks! Marcus
On Tue, Sep 24, 2019, 9:00 AM Jamie Morton notifications@github.com wrote:
@fedarko https://github.com/fedarko are you finished with edits?
— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/biocore/songbird/pull/80?email_source=notifications&email_token=AA736PYLQZ5MMMHB364XZY3QLI2TPA5CNFSM4IYSYWZKYY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOD7O4EAA#issuecomment-534626816, or mute the thread https://github.com/notifications/unsubscribe-auth/AA736P4Y5QAKAM273AMFUITQLI2TPANCNFSM4IYSYWZA .
@mortonjt I'm done! Added a small subsection clearly describing setting the reference for categorical formulas, since this has confused a lot of people (me included) in the past.
Should be ready to merge in—thanks @lisa55asil + @mortonjt for your help with this :)
This is looking dope Marcus! I really like the clarification about the reference for categorical formulas. I added a short sentence that I think helps make this description more tangible.
On Tue, Sep 24, 2019 at 2:01 PM Marcus Fedarko notifications@github.com wrote:
@mortonjt https://github.com/mortonjt I'm done! Added a small subsection clearly describing setting the reference for categorical formulas, since this has confused a lot of people (me included) in the past.
Should be ready to merge in—thanks @lisa55asil https://github.com/lisa55asil + @mortonjt https://github.com/mortonjt for your help with this :)
— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/biocore/songbird/pull/80?email_source=notifications&email_token=AE6EV4XZ6EMQNRAACDYSSLDQLJ527A5CNFSM4IYSYWZKYY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOD7PZK6A#issuecomment-534746488, or mute the thread https://github.com/notifications/unsubscribe-auth/AE6EV4WIQYV5KM4JVLNRLYLQLJ527ANCNFSM4IYSYWZA .
@mortonjt I'm done with editing, so provided you're cool with it I think this is ready to go. Thanks!
Awesome. Thanks @fedarko !
An attempt to summarize new stuff:
A lot of changes to the README. I'll summarize a few of the important things here, but it's probably best if you just read over the new README.
Modified the q2 summary visualizations to be a lot easier to interpret:
loglikehood
" toloss
for the now-bottom plot, to be consistent with tensorboard--p-summary-interval
if you don't see anything in the plotsRemoved the
--i-feature-table
parameter forqiime songbird summarize-single
, since the number of samples is only used to compute Q2 values in the paired summary. (Closes #76.)n
an optional parameter for_summarize()
. As you can see in the diff, I added some code to check for the never-should-happen case wheren is None
butbaseline is not None
(i.e. we're making a paired summary but don't have access ton
for computing Q2 values)._summarize()
throws an error in this case.All descriptions and defaults for all parameters (of
multinomial
, at least) are now stored insongbird/parameter_info.py
. This centralized information is used by both the standalone and Q2 versions of Songbird, so now if you want to change a parameter you only need to change one thing.scripts/songbird
, so instead of spending a bunch of time manually formatting it I just ran the file throughblack -l 79
. (That's why that file looks so different in the diff.)OK, I think that's the bulk of it. Please let me know if you'd like any clarification or for me to change anything. In sum, I think these changes will make using Songbird a lot more pleasant (and understandable) for many people. Thanks!