cboettig closed 2 years ago
@rqthomas ok, I think this is ready for review!

I've tested it against the scoring.R script for scoring all challenge entries (though not publishing anything). That run took only 30 min.

Because each score file is now written out as soon as a forecast is scored, rather than collecting all scores and then writing out all the score csvs at the end, this approach is more memory-efficient too, which means it should be okay to run with increased parallelization. I ran with 2 cores, but could easily go up to 4 or 8 and cut scoring time down further.
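The write-as-you-go pattern can be sketched roughly like this (a minimal toy example, not the PR's actual code: the file layout, column names, and `score_one()` helper here are all made up, and the real scorer computes proper scores rather than a mean):

```r
library(furrr)
library(readr)
plan(multisession, workers = 2)

# toy forecast files standing in for real challenge submissions
root <- tempfile()
dir.create(file.path(root, "forecasts"), recursive = TRUE)
dir.create(file.path(root, "scores"))
for (i in 1:4) {
  write_csv(data.frame(prediction = rnorm(10)),
            file.path(root, "forecasts", paste0("fc", i, ".csv")))
}

# hypothetical scorer: the real one computes CRPS/log scores per forecast
score_one <- function(f) {
  d <- read_csv(f, show_col_types = FALSE)
  data.frame(file = basename(f), mean_pred = mean(d$prediction))
}

# score each forecast and write its csv immediately, so the run never
# holds all scores in memory at once -- memory stays bounded as workers scale
files <- list.files(file.path(root, "forecasts"), full.names = TRUE)
future_walk(files, function(f) {
  write_csv(score_one(f), file.path(root, "scores", basename(f)))
})

length(list.files(file.path(root, "scores")))
```

Because each worker only ever touches one forecast at a time, adding workers raises throughput without raising peak memory proportionally.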
Maybe the best part is that generating the combined table is now simply a matter of reading in the csvs, which can leverage the readr 2.0 feature of taking a vector of csv paths:
```r
bench::bench_time({
  scores_files <- fs::dir_ls("scores/", type = "file", recurse = TRUE)
  combined <- readr::read_csv(scores_files, progress = FALSE,
                              lazy = FALSE, show_col_types = FALSE)
})
#> process    real
#>     11s   4.37s
```
(haven't tried comparing these combined scores to the original ones yet though...)
@rqthomas this is more-or-less a complete re-factor of `score.R`, so I would greatly appreciate a review here. I've tried to keep the code to short functions doing simple, discrete tasks, so hopefully it's not too difficult to read, but please flag any areas that look dodgy as they may also be a source of bugs!

- `read_forecast()` has been stripped down to remove almost all data cleaning, except for the parts that were explicitly ncdf-based, which I left untouched apart from isolating them inside a dedicated subroutine.
- `crps_logs_score()` is now much shorter, taking in already-pivoted tables.
- New helper fns, `pivot_target()` and `pivot_forecast()`, now take on the responsibilities of both cleaning and pivoting the data formats. This combines some cleaning code previously in `crps_logs_score()` with some from `read_forecast()`, plus some slight modifications. The result is a data.frame that retains the observation and the mean + uncertainty of the prediction, along with `horizon`, `issue_date`, & `forecast_start_time`.

Still testing this out against the actual submitted forecast library; we'll see what the net impact on speed will be. If this works, generating the combined table should now be both fast and trivial.