SABS-R3-Epidemiology / branchpro

Using branching processes to estimate the time-dependent reproduction number of a disease with imported cases
BSD 3-Clause "New" or "Revised" License

Plot serial intervals #177

Closed: FarmHJ closed 3 years ago

codecov[bot] commented 3 years ago

Codecov Report

Merging #177 (0e526bf) into main (0495672) will not change coverage. The diff coverage is n/a.

@@            Coverage Diff            @@
##              main      #177   +/-   ##
=========================================
  Coverage   100.00%   100.00%           
=========================================
  Files           11        11           
  Lines          622       622           
=========================================
  Hits           622       622           

FarmHJ commented 3 years ago

Addressing #168

FarmHJ commented 3 years ago

Very nice, thanks for making those changes @FarmHJ. However, I think we should normalize the serial intervals after loading them from the file. The discretization method only yields normalized values if you include a nonzero w_0 term, while we are always setting w_0=0. For most serial interval samples this w_0 is much less than 1%, so the difference will not be perceptible in the figure, but I think it's worth doing so that the method is fully correct.

Also, what do you think of plotting serial_intervals[i,:26] instead so that the ends of the lines are even with the final day 25 tick?

I don't fully understand how to normalise the serial intervals. Nevertheless, I've tried normalising them: I divide each entry in a row by that row's sum, computed only over the time period we are interested in.

rccreswell commented 3 years ago

I don't fully understand how to normalise the serial intervals. Nevertheless, I've tried normalising them: I divide each entry in a row by that row's sum, computed only over the time period we are interested in.

I was actually thinking of normalizing the full 60-day serial intervals from the file (which is what BranchProPosterior does). The values truncated to the 25-day plotting window will then sum to slightly less than 1, but that's okay, as the probabilities we are plotting are still correct. So something like: serial_intervals = serial_intervals / np.sum(serial_intervals, axis=1)[:, np.newaxis]
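To make the suggestion above concrete, here is a minimal sketch of the row-wise normalization followed by the truncation to the plotting window. The array below is a hypothetical stand-in for the contents of the csv file (3 gamma-shaped samples over 60 days with w_0 = 0); the shape and scale values are assumptions chosen only for illustration, not values from the repository.

```python
import numpy as np

# Hypothetical stand-in for the serial intervals loaded from the csv file:
# one row per sample, one column per day (60 days), gamma-shaped curves.
days = np.arange(60)
shapes = np.array([2.0, 2.5, 3.0])  # assumed shape parameters
scale = 3.0                         # assumed scale parameter
serial_intervals = (
    days[np.newaxis, :] ** (shapes[:, np.newaxis] - 1.0)
    * np.exp(-days[np.newaxis, :] / scale)
)
# Day 0 is already zero here (0 raised to a positive power), i.e. w_0 = 0.

# Normalize each full 60-day row so that it sums to 1.
serial_intervals = serial_intervals / np.sum(serial_intervals, axis=1)[:, np.newaxis]

# Columns 0..25 cover the plotting window up to the final day 25 tick.
plot_window = serial_intervals[:, :26]

full_sums = serial_intervals.sum(axis=1)   # one value per row, each equal to 1
window_sums = plot_window.sum(axis=1)      # slightly below 1 after truncation
```

Normalizing before slicing keeps the plotted probabilities consistent with the full distribution, at the cost of the truncated rows summing to slightly less than 1.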

On this subject, I suspect the serial intervals csv file currently in the repository is out of date. In this PR, can you add a call to np.random.seed at the beginning of the write_ser_int_data function in data_library/serial_interval/parse_data_si.py, then rerun that file and commit the new csv as well?
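As a rough illustration of why the seed matters, the sketch below seeds NumPy's global generator at the start of a data-generating function so that rerunning it reproduces the same output. The function sample_serial_intervals and its parameters are hypothetical; this is not the repository's write_ser_int_data, whose body is not shown in this thread.

```python
import numpy as np


def sample_serial_intervals(num_samples=5, num_days=60, seed=0):
    """Hypothetical stand-in for a function that generates serial intervals."""
    # Seeding first makes every later draw from np.random repeatable.
    np.random.seed(seed)

    # Assumed sampling step: draw one gamma shape parameter per sample.
    shapes = np.random.uniform(2.0, 3.0, size=num_samples)

    # Build one gamma-shaped curve per sample and normalize each row.
    days = np.arange(num_days)
    curves = (
        days[np.newaxis, :] ** (shapes[:, np.newaxis] - 1.0)
        * np.exp(-days[np.newaxis, :] / 3.0)
    )
    return curves / curves.sum(axis=1)[:, np.newaxis]


# Two separate runs give identical arrays because the seed is fixed.
first = sample_serial_intervals()
second = sample_serial_intervals()
```

With the seed fixed inside the function, a regenerated csv can be compared directly against the committed one to check whether the committed file is stale.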