Closed: nicklucius closed this issue 6 years ago
I noticed that the numbers for the Multivariate 2017 model are different in a recent generation of the PDF, linked here: https://github.com/Chicago/predicting-e-coli-concentrations/pull/87#issuecomment-417963346
I just cloned a fresh copy of the repos and got the same Table 3 numbers as shown in the latest release.
@nicklucius - first, I did see a mismatch on AUC between the copy I generated and the tables on line 262.
However, I don't see anything else. Do you only see line 262 with the discrepancy?
So it's actually Table 3 itself that differs.
Here is Table 3 in the copy you generated for bioRxiv:
And here is Table 3 in the copy I generated for the latest release:
Ok, I'll check my commits when I get back on my computer to make sure I generated from the same hashes.
On Tue, Sep 18, 2018 at 3:40 PM Nick Lucius notifications@github.com wrote:
So it's actually Table 3 itself that differs.
bioRxiv
Here is Table 3 in the copy you generated for bioRxiv (https://www.biorxiv.org/content/biorxiv/early/2018/09/14/250480.full.pdf):
[image: image] https://user-images.githubusercontent.com/16853555/45714855-6b820780-bb58-11e8-850d-1b1a998a7c6c.png
Latest Release on GitHub
And here is Table 3 in the copy I generated for the latest release (https://github.com/Chicago/predicting-e-coli-concentrations/releases/tag/water-research-sub-rev.1):
[image: image] https://user-images.githubusercontent.com/16853555/45715086-f5ca6b80-bb58-11e8-809d-84148f432abb.png
-- Tom Schenk Jr. tomschenkjr@gmail.com @tomschenkjr tomschenkjr.net
Perfect. Here are the commits for me:
predicting-e-coli-concentrations: bda1ed56f9cdebd6d4b1fd7f9539fb09a92b1b0e
clear-water: https://github.com/Chicago/clear-water/commit/dd072196fea05163f643bae7612d19cb2ca7a40d
I'm still unable to reproduce the Table 3 numbers correctly, even though my hashes match yours. Still investigating.
Sounds good. All the data is cached in Rds files, so I wonder if it could be a package version issue. If we need to, I could try adding packrat to these repos.
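One way to check the package-version hypothesis is to dump the installed package versions on each machine and compare the two lists. The following is a minimal R sketch under stated assumptions, not something taken from this thread: the file names and the helper `compare_versions` are hypothetical.

```r
# Run on each machine: save the name and version of every installed package.
# The output file name is a placeholder; use a different one per machine.
m <- installed.packages()
pkgs <- data.frame(
  Package = unname(m[, "Package"]),
  Version = unname(m[, "Version"]),
  stringsAsFactors = FALSE
)
write.csv(pkgs, "package-versions-machine-a.csv", row.names = FALSE)

# Hypothetical helper: list packages whose versions differ between two dumps.
# Versions are read as character so that "1.10" is not parsed as the number 1.1.
# Packages present on only one machine are dropped by the inner join.
compare_versions <- function(file_a, file_b) {
  a <- read.csv(file_a, colClasses = "character")
  b <- read.csv(file_b, colClasses = "character")
  both <- merge(a, b, by = "Package", suffixes = c(".a", ".b"))
  both[both$Version.a != both$Version.b, ]
}
```

With one dump per machine copied into the same directory, `compare_versions("package-versions-machine-a.csv", "package-versions-machine-b.csv")` would return one row per package whose versions differ.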
I tried building the paper on a different machine, and I got the same Table 3 numbers you've been getting. Then I set up packrat on the machine where I had been generating the paper all along and getting slightly different numbers. After packrat re-downloaded all the packages from CRAN into a local directory, rebuilt them from scratch, and reloaded, I'm now getting the same Table 3 numbers you're getting, which I also got on the third machine.
I'm thinking that there must have been something funky with my package installation.
With more testing, I am able to reproduce the bioRxiv version of Table 3 above, so I think this is resolved. I will submit a pull request that fixes the one wrong number and adds packrat files so that people can use the same package versions that were used to generate the paper PDF.
Already, I'm seeing a lot of formatting differences using up-to-date versions of all packages, but am able to generate the paper exactly like we want it using packrat to use our original package versions.
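For readers unfamiliar with packrat, the general workflow looks roughly like the sketch below. It uses standard packrat functions, but it is only an illustration: the actual files added by the pull request are not shown in this thread, and the two halves are meant to be run in separate R sessions from the project root rather than as one script.

```r
# --- Maintainer, one-time setup (run from the project root) ---
install.packages("packrat")
packrat::init()       # create a private project library and a lockfile (packrat/packrat.lock)
packrat::snapshot()   # re-run later to record any package changes in the lockfile

# --- Anyone rebuilding the paper, after cloning the repository ---
packrat::restore()    # install the package versions recorded in the lockfile
packrat::status()     # report differences between the library and the lockfile
```

Pinning versions this way trades convenience for reproducibility: the build no longer picks up newer CRAN releases automatically, which is the point when the goal is to regenerate the PDF exactly.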
For example, Hybrid Model 2017 is reported as 0.837 in line 263, but in Table 3 it is 0.744. Some of the other numbers are right and others are off a bit.
For reference: https://github.com/Chicago/predicting-e-coli-concentrations/releases/tag/water-research-sub-rev.1
@tomschenkjr - second pair of eyes?