blab / ncov-forecasting-fit

Assessing accuracy of fitness model forecasts

adding script PR #40

Closed Eabousam closed 4 months ago

marlinfiggins commented 4 months ago

There are a couple of issues with these updates.

First, the updates appear to remove the filling of NAs in the retrospective sequence counts:

new_truth = df.merge(truth_set, how="left").fillna(0)

Without this fill, days with no sequences for any of the variants will have an NA frequency for all variants, since NAs enter the computation. You can see this by printing new_truth immediately afterwards. There should be no NAs here, since this frame is constructed to cover counts for all locations, variants, and dates, and any count absent from the data is assumed to be 0. I've reinstated the fillna(0) portion to fix the issue.
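To make the effect concrete, here is a minimal sketch on toy data. The column names (date, location, variant, sequences) and the values are assumptions for illustration, not the actual schema used in the script; only the merge-then-fillna(0) pattern comes from the line above.

```python
import itertools

import pandas as pd

# Hypothetical observed counts: there is no row for variant "B" on 2022-01-02.
truth_set = pd.DataFrame(
    {
        "date": ["2022-01-01", "2022-01-01", "2022-01-02"],
        "location": ["USA", "USA", "USA"],
        "variant": ["A", "B", "A"],
        "sequences": [8, 2, 5],
    }
)

# Scaffold holding every (date, location, variant) combination.
df = pd.DataFrame(
    list(itertools.product(["2022-01-01", "2022-01-02"], ["USA"], ["A", "B"])),
    columns=["date", "location", "variant"],
)

# Left merge alone leaves NA wherever the scaffold row has no observed count.
unfilled = df.merge(truth_set, how="left")

# With the fill, absent counts are treated as 0.
new_truth = df.merge(truth_set, how="left").fillna(0)

print(unfilled["sequences"].isna().sum())   # number of NA counts without the fill
print(new_truth["sequences"].isna().sum())  # number of NA counts with the fill
```

Printing both frames side by side is a quick way to check that the filled version has a row, and a numeric count, for every combination.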

Second, I've switched the merge order so that predictions are only merged if they correspond to a (date, variant, location) combination in our truth set.
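As a rough illustration of why the order matters, the sketch below assumes a left merge and uses made-up column names (truth_freq, pred_freq) and values; the actual script may differ. Merging from the truth set keeps only truth rows, so a prediction for a date outside the truth set is dropped.

```python
import pandas as pd

# Hypothetical truth set and predictions; the prediction for 2022-01-03
# has no matching row in the truth set.
truth = pd.DataFrame(
    {
        "date": ["2022-01-01", "2022-01-02"],
        "location": ["USA", "USA"],
        "variant": ["A", "A"],
        "truth_freq": [0.8, 0.7],
    }
)
predictions = pd.DataFrame(
    {
        "date": ["2022-01-02", "2022-01-03"],
        "location": ["USA", "USA"],
        "variant": ["A", "A"],
        "pred_freq": [0.75, 0.6],
    }
)

keys = ["date", "location", "variant"]

# Truth first: every truth row is kept, unmatched predictions are dropped.
merged = truth.merge(predictions, how="left", on=keys)

# Predictions first: the unmatched prediction survives with an NA truth value.
pred_first = predictions.merge(truth, how="left", on=keys)

print(merged)
print(pred_first)
```

With the truth set on the left, the number of rows in merged is fixed by the truth set, which makes downstream error calculations easier to reason about.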

Lastly, I've re-implemented the changes we discussed previously to make sure that we do not floor the raw or smoothed frequencies to 0 when they are NA.
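One way to picture the difference, as a sketch under my own assumptions rather than the code in this PR: when a day has zero total sequences, the raw frequency is undefined, and replacing that NA with 0 would claim every variant was at 0% on that day. The column names and the floored column below are hypothetical.

```python
import pandas as pd

# Hypothetical filled counts: no sequences at all on 2022-01-02.
counts = pd.DataFrame(
    {
        "date": ["2022-01-01", "2022-01-01", "2022-01-02", "2022-01-02"],
        "variant": ["A", "B", "A", "B"],
        "sequences": [8.0, 2.0, 0.0, 0.0],
    }
)

# Total sequences per date, aligned to each row.
total = counts.groupby("date")["sequences"].transform("sum")

# Mask zero totals so the frequency is NA (undefined) instead of a number.
counts["raw_freq"] = counts["sequences"] / total.where(total > 0)

# What flooring NA to 0 would do: frequencies on 2022-01-02 no longer sum to 1.
counts["floored"] = counts["raw_freq"].fillna(0)

print(counts)
```

Keeping the NA lets later steps skip days with no data instead of scoring predictions against a frequency of 0 that was never observed.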

Bonus: I renamed the variable final_set to merged to make it clearer what is actually going into this function.