Chicago / predicting-e-coli-concentrations

This repository is part of the working draft for an upcoming academic paper describing the methods and results of the City of Chicago Clear Water project.

Numbers in Section 3.1 do not all match Table 3 #93

Closed nicklucius closed 6 years ago

nicklucius commented 6 years ago

For example, Hybrid Model 2017 is reported as 0.837 in line 263, but in Table 3 it's 0.744. Some of the other numbers are right and others are off a bit.
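A check like this can be scripted rather than done by eye. The following is a minimal sketch in Python, not part of the repository: the function name `find_mismatches`, the dictionary layout, and the tolerance are all assumptions chosen for illustration. Only the Hybrid Model 2017 values (0.837 in the text, 0.744 in Table 3) come from the comment above.

```python
import math


def find_mismatches(text_values, table_values, tol=0.0005):
    """Return {label: (text_value, table_value)} for labels that disagree.

    A label counts as a mismatch when it is missing from the table or when
    the two numbers differ by more than `tol` (hypothetical tolerance of
    half a unit in the third decimal place).
    """
    mismatches = {}
    for label, text_value in text_values.items():
        table_value = table_values.get(label)
        if table_value is None or not math.isclose(
            text_value, table_value, abs_tol=tol
        ):
            mismatches[label] = (text_value, table_value)
    return mismatches


# Values quoted in this issue: Section 3.1 text vs. Table 3.
text_values = {"Hybrid Model 2017": 0.837}
table_values = {"Hybrid Model 2017": 0.744}

print(find_mismatches(text_values, table_values))
# prints {'Hybrid Model 2017': (0.837, 0.744)}
```

Extending the two dictionaries with the remaining Section 3.1 numbers would list every disagreement in one pass.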

For reference: https://github.com/Chicago/predicting-e-coli-concentrations/releases/tag/water-research-sub-rev.1

@tomschenkjr - second pair of eyes?

nicklucius commented 6 years ago

I noticed that the numbers for the Multivariate 2017 model are different in a recent generation of the PDF, linked here: https://github.com/Chicago/predicting-e-coli-concentrations/pull/87#issuecomment-417963346

I just cloned a fresh copy of the repos and got the same Table 3 numbers as shown in the latest release.

tomschenkjr commented 6 years ago

@nicklucius - first, I did see a mismatch on AUC between the copy I generated and the tables on line 262.

However, I don't see anything else. Do you only see line 262 with the discrepancy?

nicklucius commented 6 years ago

So it's actually Table 3 itself that differs.

bioRxiv

Here is Table 3 in the copy you generated for bioRxiv:

[image: Table 3 in the bioRxiv copy]

Latest Release on GitHub

And here is Table 3 in the copy I generated for the latest release:

[image: Table 3 in the latest GitHub release copy]

tomschenkjr commented 6 years ago

Ok, I'll check my commits when I get back on my computer to make sure I generated from the same hashes.

On Tue, Sep 18, 2018 at 3:40 PM Nick Lucius notifications@github.com wrote:

So it's actually Table 3 itself that differs.

bioRxiv

Here is Table 3 in the copy you generated for bioRxiv https://www.biorxiv.org/content/biorxiv/early/2018/09/14/250480.full.pdf :

[image: image] https://user-images.githubusercontent.com/16853555/45714855-6b820780-bb58-11e8-850d-1b1a998a7c6c.png

Latest Release on GitHub

And here is Table 3 in the copy I generated for the latest release https://github.com/Chicago/predicting-e-coli-concentrations/releases/tag/water-research-sub-rev.1 :

[image: image] https://user-images.githubusercontent.com/16853555/45715086-f5ca6b80-bb58-11e8-809d-84148f432abb.png


nicklucius commented 6 years ago

Perfect. Here are the commits for me:

predicting-e-coli-concentrations: bda1ed56f9cdebd6d4b1fd7f9539fb09a92b1b0e clear-water: https://github.com/Chicago/clear-water/commit/dd072196fea05163f643bae7612d19cb2ca7a40d

tomschenkjr commented 6 years ago

I'm still unable to reproduce Table 3 correctly. My hashes match yours. Still investigating.

nicklucius commented 6 years ago

Sounds good. All the data is cached in Rds files, so I wonder if it could be a package version issue. If we need to, I could try adding packrat to these repos.

nicklucius commented 6 years ago

I tried building the paper on a different machine, and I got the same Table 3 numbers you've been getting. Then I tried to set up packrat on the machine where I'd been generating it all along and getting the slightly different numbers. After packrat re-downloaded all the packages from CRAN in a local directory, rebuilt the packages from scratch, and reloaded, I'm now getting the same Table 3 numbers you're getting, which I also got on the 3rd machine.

I'm thinking that there must have been something funky with my package installation.
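One way to narrow down this kind of problem is to compare the installed package versions on the machine that gives different numbers against a machine that gives the expected ones. Below is a minimal sketch in Python, assuming each machine's package list has already been exported as a name-to-version mapping (for example from the `Package` and `Version` columns of R's `installed.packages()`). The function name `diff_package_versions` and the package names and versions in the example are hypothetical placeholders, not the actual packages involved here.

```python
def diff_package_versions(machine_a, machine_b):
    """Return {package: (version_on_a, version_on_b)} where the two differ.

    A package installed on only one machine is reported with None for the
    machine that lacks it.
    """
    diffs = {}
    for name in sorted(set(machine_a) | set(machine_b)):
        version_a = machine_a.get(name)
        version_b = machine_b.get(name)
        if version_a != version_b:
            diffs[name] = (version_a, version_b)
    return diffs


# Hypothetical package listings from two machines (placeholder names).
machine_a = {"pkgA": "1.0.0", "pkgB": "2.3.1", "pkgC": "0.9"}
machine_b = {"pkgA": "1.0.0", "pkgB": "2.4.0"}

for name, (a, b) in diff_package_versions(machine_a, machine_b).items():
    print(f"{name}: {a} vs {b}")
```

Note that a version comparison would not catch a corrupted installation of the same version, which may be what happened here, since rebuilding the packages from scratch resolved it.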

nicklucius commented 6 years ago

With more testing, I am able to reproduce the bioRxiv version of Table 3 above, so I think this is resolved. I will submit a pull request that fixes the one wrong number and adds packrat files so that people can use the same package versions that were used to generate the paper PDF.

Already, I'm seeing a lot of formatting differences when using up-to-date versions of all packages, but I am able to generate the paper exactly as we want it by using packrat to pin our original package versions.