Closed joeHickson closed 3 years ago
That looks complete to me? Not sure what I am missing. Quite surprised to see error and trace in there, as they should only be saved on error.
@seabbs I re-ran the uk datasets by hand on saturday so you might have been seeing the results of that. https://github.com/epiforecasts/covid-rt-estimates/tree/587995ba64056b2d43ba2cd478fbf421b78bb1c2/subnational/united-kingdom/cases/national/Scotland/latest will show you it in the partial state
Just to clarify this issue: everything works for any dataset when run by hand (what is the exact runtime instruction here?), and only the first dataset works when running on the CRON job, with all others failing due to the lack of a temporary directory?
Is that all correct?
Or they only work when run by hand in the test repo and not in the production one? I don't understand the behaviour change between the tests and production. What is the difference here? Or are we actually saying the tests never worked?
they fail at random (probably more than half of the time). Running them by hand on production wins by sheer number of retries. I think the tests worked but I think that was because it got lucky!
Have you tried rolling back the future setting?
in testing we tried 4 datasets all of which worked?
I can't remember - it might have been 2 (united-kingdom + canada) but I couldn't tell you with confidence either way
Running now with future = false
That worked for uk admissions - I'll see how the cron picks up overnight.
Are you seeing issues due to old results (which are not present on the test server) conflicting with new results? Maybe clearing previous estimates out would help?
just waiting on the cron to run - it ended up firing 9 hours late because I still had it set from trying to play catch up at the weekend.
Looks good so far. Think a potential problem may occur if somewhere fails/times out due to the issue above but we will see. Been reading more about detecting what has happened in Stan model so may be able to tighten that up but still no idea what the previous error was or why this might have worked. The tempdir issue is worrying as I have been unable to reproduce elsewhere.
Also are these files at root something to do with a recent change? (https://github.com/epiforecasts/covid-rt-estimates/blob/master/united-kingdom-admissions_raw_outcome.rds)
smells like debug code to me
Looks like a git ignore may be needed, as the US update (🥳) has just pushed a similar object: https://github.com/epiforecasts/covid-rt-estimates/blob/master/united-states_raw_outcome.rds
nah, that was in progress when I put the last commit in so should be the final one
This is looking promising - let's see what happens tonight and possibly close it in the morning
Summary plots failing likely due to the presence of renamed columns (due to old estimates being present on the production server).
Suggested fix is to remove archived estimates. This likely applies to all datasets that have regions that were once estimated and now are not.
That sounds like a good idea - same issue afflicts e.g. where one subregion fails, see e.g. https://github.com/epiforecasts/covid-rt-estimates/blob/master/subnational/united-states/cases/summary/rt.csv which is a mixture of old and new column names because South Carolina failed.
@joeHickson could you remove all archived estimates before the next run, and re-run the US?
It looks like status.csv does not contain the correct information for the UK estimates?
shall I set the refresh flag on the cron script to force a flush?
I assume the status.csv is another issue?
Flush as in flush all estimates? In that case perhaps consider scheduling a re-run of everything that has failed at the end? A benefit of keeping old estimates is that once we're back to daily operation if one run fails there is still a fairly recent update unless there is something systematic about a data set that makes the model fall over.
status.csv didn't update following a 403 error from dataverse, which had gone away by the time the later scripts ran.
refresh does this:
if (refresh) {
  if (dir.exists(location$target_folder)) {
    futile.logger::flog.trace("removing estimates in order to refresh")
    unlink(location$target_folder, recursive = TRUE)
  }
}
the refresh option seems to have left it unhappy. I have no summary files at all - I think it might be that whilst it issues the warning for unable to load file it then does horrible things where it's used (get.R L121). Running readRDS locally with a garbage file it produces an error AND a warning - I think we are seeing one but not the other. Perhaps this could be resolved with a pre-filter on line 92 to remove those that don't have the final .rds file (and therefore failed)?
2020-11-25 16:21:56 INFO Regions with runtime errors: 1
2020-11-25 16:21:56 INFO Runtime error in South West : South West: model fitting was timed out - try increasing the max_execution_time -
2020-11-25 16:21:56 INFO Saving timings information to : subnational/united-kingdom/cases/national
2020-11-25 16:21:56 DEBUG resetting future plan to sequential
2020-11-25 16:21:56 TRACE generating summary data
2020-11-25 16:21:56 INFO Saving summary to : subnational/united-kingdom/cases/summary
2020-11-25 16:21:56 INFO Extracting results from: subnational/united-kingdom/cases/national
2020-11-25 16:21:56 TRACE Getting regional results
2020-11-25 16:21:56 WARN simpleWarning in gzfile(file, "rb"): cannot open compressed file 'subnational/united-kingdom/cases/national/South West/latest/summarised_estimates.rds', probable reason 'No such file or directory'
2020-11-25 16:21:56 TRACE reading runtimes.csv
local output from rstudio:
foo <- readRDS("nothere.RDS")
Error in gzfile(file, "rb") : cannot open the connection
In addition: Warning message:
In gzfile(file, "rb") :
cannot open compressed file 'nothere.RDS', probable reason 'No such file or directory'
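The pre-filter idea mentioned above could look roughly like the sketch below. It assumes each region folder holds a `latest` subfolder and that `summarised_estimates.rds` is the last file written by a successful run; `regions_with_results` is a hypothetical helper, not a function from EpiNow2 or this repository, and the folder layout is a toy one built in a temporary directory.

```r
# Hypothetical helper (not part of EpiNow2 or this repo): list the region
# folders under results_dir whose latest run wrote the final output file.
regions_with_results <- function(results_dir,
                                 final_file = "summarised_estimates.rds") {
  regions <- list.dirs(results_dir, recursive = FALSE, full.names = FALSE)
  has_final <- file.exists(
    file.path(results_dir, regions, "latest", final_file)
  )
  regions[has_final]
}

# Toy layout: one complete region and one that stopped part-way through.
results_dir <- file.path(tempdir(), "prefilter-demo")
unlink(results_dir, recursive = TRUE)
dir.create(file.path(results_dir, "Scotland", "latest"), recursive = TRUE)
dir.create(file.path(results_dir, "South West", "latest"), recursive = TRUE)
saveRDS(
  data.frame(x = 1),
  file.path(results_dir, "Scotland", "latest", "summarised_estimates.rds")
)

# Only the region with the final file is kept.
complete_regions <- regions_with_results(results_dir)
print(complete_regions)
```

Filtering before reading means the summary step never calls readRDS on a path that is known to be missing, so neither the warning nor the error shown above is raised for failed regions.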
Looks like most things are there except still no summary plots / csvs in some cases e.g. here: https://github.com/epiforecasts/covid-rt-estimates/tree/master/national/cases/summary Is this because some countries are no longer being estimated, as Sam suggests? E.g. here there are a bunch at the top that still have the old column headers: https://raw.githubusercontent.com/epiforecasts/covid-rt-estimates/master/national/cases/summary/rt.csv As it's only a few, could these just be removed manually?
@sbfnk I was unable to get the clean refresh to run yesterday (see prior comment). As such, today's results are unlikely to differ structurally from yesterday's.
Yes, but the non-refreshed overnight run of cases seems to have worked - so the only thing failing is the summary (presumably because of stray results from the past for countries that are no longer being estimated)?
The overnight run processed, but some subregions are failing, so the summaries contain a mix of result shapes and are partially failing
As far as I can see if you removed old estimates for St. Kitts & Nevis, Fiji, British Virgin Islands, Western Sahara, New Caledonia, Nicaragua, Montserrat, Guinea-Bissau, there would be no more old results.
but when it is rerun it will then fail, because the folder will exist for those sublocations with a partial set of result files (all the failures at present seem to be timeouts) but not the summarised_estimates.RDS. A manual removal would have the same effect as using the refresh flag. I could temporarily run the region with a -e flag excluding the areas we anticipate timing out, but the next time they are included the summary will fail if those subregions don't process.
Pulling locally and deleting all files in St kitts folder but leaving the folder this is what I see.
> results <- get_regional_results(
+ results_dir = "national/cases/national",
+ samples = FALSE,
+ forecast = FALSE)
Warning message:
In gzfile(file, "rb") :
cannot open compressed file 'national/cases/national/St. Lucia/latest/summarised_estimates.rds', probable reason 'No such file or directory'
>
> results
$estimates
$estimates$summarised
region date variable strat type median mean sd lower_90 lower_50 lower_20 upper_20
1: Afghanistan 2020-08-30 R <NA> estimate 1.0283317 1.0431218 1.592124e-01 0.8308071 0.9517083 0.9983677 1.0549592
2: Afghanistan 2020-08-31 R <NA> estimate 1.0264497 1.0386057 1.449917e-01 0.8395174 0.9538103 0.9988531 1.0528583
3: Afghanistan 2020-09-01 R <NA> estimate 1.0256086 1.0340139 1.320410e-01 0.8457254 0.9551772 0.9989090 1.0508716
4: Afghanistan 2020-09-02 R <NA> estimate 1.0236139 1.0293454 1.203152e-01 0.8512935 0.9563708 0.9980154 1.0483355
5: Afghanistan 2020-09-03 R <NA> estimate 1.0214960 1.0246156 1.097498e-01 0.8573886 0.9575690 0.9973209 1.0452908
---
102725: Zimbabwe 2020-12-03 reported_cases <NA> forecast 67.5000000 657.9985000 1.276615e+04 5.0000000 25.0000000 46.0000000 102.0000000
102726: Zimbabwe 2020-12-04 reported_cases <NA> forecast 100.5000000 1236.0882500 1.850625e+04 7.0000000 36.0000000 66.0000000 154.0000000
102727: Zimbabwe 2020-12-05 reported_cases <NA> forecast 129.0000000 3136.0827500 7.851510e+04 9.0000000 42.0000000 81.6000000 196.0000000
102728: Zimbabwe 2020-12-06 reported_cases <NA> forecast 79.5000000 1892.4100000 4.164878e+04 5.0000000 24.0000000 51.0000000 129.0000000
102729: Zimbabwe <NA> reporting_overdispersion <NA> <NA> 0.3106832 0.3322482 1.243521e-01 0.1797304 0.2379935 0.2817226 0.3399858
upper_50 upper_90 bottom top lower upper central_lower central_upper
1: 1.1005886 1.3386775 NA NA NA NA NA NA
2: 1.0962464 1.3084931 NA NA NA NA NA NA
3: 1.0910206 1.2784433 NA NA NA NA NA NA
4: 1.0869074 1.2548376 NA NA NA NA NA NA
5: 1.0834770 1.2140677 NA NA NA NA NA NA
---
102725: 200.0000000 1065.7500000 NA NA NA NA NA NA
102726: 313.0000000 1890.4000000 NA NA NA NA NA NA
102727: 403.2500000 2697.4500000 NA NA NA NA NA NA
102728: 273.0000000 1970.3000000 NA NA NA NA NA NA
102729: 0.3946213 0.5656102 NA NA NA NA NA NA
With fault tolerance working as expected (i.e. by giving a warning and no error).
As this problem is in the summary, it can be debugged without rerunning estimates and trying to explore logs.
Again running locally, and so being able to see errors, I see:
> regional_summary(reported_cases= reported_cases, results_dir = "national/cases/national", all_regions = FALSE) -> tmp
INFO [2020-11-26 13:31:56] No summary directory specified so returning summary output
INFO [2020-11-26 13:31:56] Extracting results from: national/cases/national
Error in data.table::rbindlist(numeric_estimate) :
Item 53 has 7 columns, inconsistent with item 1 which has 9 columns. To fill missing columns use fill=TRUE.
In addition: Warning messages:
1: In gzfile(file, "rb") :
cannot open compressed file 'national/cases/national/St. Lucia/latest/summarised_estimates.rds', probable reason 'No such file or directory'
2: In gzfile(file, "rb") :
This is the issue Seb pointed out: older estimates are still present in the results as published to GitHub.
Patching that (see EpiNow2@v1.3.2), I now see the following successful summary:
> reported_cases <- data.table::as.data.table(covidregionaldata::get_national_data())[, .(date, region = country, confirm = cases_new)]
> regional_summary(reported_cases= reported_cases, results_dir = "national/cases/national", all_regions = FALSE) -> tmp
INFO [2020-11-26 13:47:15] No summary directory specified so returning summary output
INFO [2020-11-26 13:47:15] Extracting results from: national/cases/national
Warning messages:
1: In gzfile(file, "rb") :
cannot open compressed file 'national/cases/national/St. Lucia/latest/summarised_estimates.rds', probable reason 'No such file or directory'
2: In gzfile(file, "rb") :
cannot open compressed file 'national/cases/national/St. Lucia/latest/summary.rds', probable reason 'No such file or directory'
> tmp
$latest_date
[1] "2020-11-26"
$results
$results$estimates
$results$estimates$summarised
region date variable strat type median mean sd lower_90 lower_50 lower_20 upper_20
1: Afghanistan 2020-08-30 R <NA> estimate 1.0283317 1.0431218 1.592124e-01 0.8308071 0.9517083 0.9983677 1.0549592
2: Afghanistan 2020-08-31 R <NA> estimate 1.0264497 1.0386057 1.449917e-01 0.8395174 0.9538103 0.9988531 1.0528583
3: Afghanistan 2020-09-01 R <NA> estimate 1.0256086 1.0340139 1.320410e-01 0.8457254 0.9551772 0.9989090 1.0508716
4: Afghanistan 2020-09-02 R <NA> estimate 1.0236139 1.0293454 1.203152e-01 0.8512935 0.9563708 0.9980154 1.0483355
5: Afghanistan 2020-09-03 R <NA> estimate 1.0214960 1.0246156 1.097498e-01 0.8573886 0.9575690 0.9973209 1.0452908
Where the warnings indicate missing results but should cause no failure.
Updating this to save to disk I see the following:
which looks successful.
Repeating with data deletions at random I still see success.
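The column-count error from rbindlist seen earlier can be reproduced in isolation. The sketch below uses two made-up tables standing in for an old-format and a new-format region summary (the column names are borrowed from the output above, the values are invented). It shows what the `fill = TRUE` option named in the error message does; it is not a claim about how EpiNow2 1.3.2 actually patched the problem.

```r
library(data.table)

# Toy stand-ins for one region saved in an older format and one in a newer one.
old_format <- data.table(region = "A", median = 1.0, bottom = 0.8, top = 1.2)
new_format <- data.table(region = "B", median = 1.1,
                         lower_90 = 0.9, lower_50 = 1.0, upper_90 = 1.3)

# Binding tables with different column counts errors by default.
default_bind <- tryCatch(
  rbindlist(list(old_format, new_format)),
  error = function(e) e
)
print(inherits(default_bind, "error"))

# fill = TRUE matches columns by name and pads the missing ones with NA.
combined <- rbindlist(list(old_format, new_format), fill = TRUE)
print(combined)
```

Filling keeps the summary step running, but the padded table still mixes two schemas (note the NA-filled bottom/top columns in the output above), so removing the stale estimates remains the cleaner fix.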
It might be that the warning is just a warning and something else is falling over. It's always fun debugging this lot! I'll try flicking us over to 1.3.2 (it looks like it's currently 1.3.0) and see what gives if I run it with --refresh. Update: I see you beat me to the 1.3.2 trick ;)
That's warming up all the cores now running with --refresh. I'll try and keep an eye out for the first UK cases dataset to finish (it should push to git if it doesn't error)
2020-11-26 15:08:30 INFO Regions with estimates: 9
2020-11-26 15:08:30 INFO Regions with runtime errors: 3
2020-11-26 15:08:30 INFO Runtime error in Midlands : Midlands: model fitting was timed out - try increasing the max_execution_time -
2020-11-26 15:08:30 INFO Runtime error in South West : South West: model fitting was timed out - try increasing the max_execution_time -
2020-11-26 15:08:30 INFO Runtime error in United Kingdom : United Kingdom: model fitting was timed out - try increasing the max_execution_time -
2020-11-26 15:08:30 INFO Saving timings information to : subnational/united-kingdom/cases/national
2020-11-26 15:08:30 DEBUG resetting future plan to sequential
2020-11-26 15:08:30 TRACE generating summary data
2020-11-26 15:08:30 INFO Saving summary to : subnational/united-kingdom/cases/summary
2020-11-26 15:08:30 INFO Extracting results from: subnational/united-kingdom/cases/national
2020-11-26 15:08:30 TRACE Getting regional results
2020-11-26 15:08:30 WARN simpleWarning in gzfile(file, "rb"): cannot open compressed file 'subnational/united-kingdom/cases/national/Midlands/latest/summarised_estimates.rds', probable reason 'No such file or directory'
2020-11-26 15:08:30 TRACE reading runtimes.csv
2020-11-26 15:08:30 TRACE naming output
2020-11-26 15:08:30 DEBUG add stats to output
2020-11-26 15:08:30 TRACE publish_data function
I don't think that's produced any summary files again - https://github.com/epiforecasts/covid-rt-estimates/tree/master/subnational/united-kingdom/cases
Again debugging locally I see the following:
library(data.table)
library(EpiNow2)
library(covidregionaldata)
reported_cases <- fread("subnational/united-kingdom/cases/summary/reported_cases.csv")
regional_summary(reported_cases = reported_cases,
results_dir = "subnational/united-kingdom/cases/national",
summary_dir = "subnational/united-kingdom/cases/summary",
all_regions = TRUE)
INFO [2020-11-26 15:30:07] Saving summary to : subnational/united-kingdom/cases/summary
INFO [2020-11-26 15:30:07] Extracting results from: subnational/united-kingdom/cases/national
Error: Incompatible classes: <IDate> + <Period>
In addition: Warning messages:
1: In gzfile(file, "rb") :
cannot open compressed file 'subnational/united-kingdom/cases/national/Midlands/latest/summarised_estimates.rds', probable reason 'No such file or directory'
2: In gzfile(file, "rb") :
cannot open compressed file 'subnational/united-kingdom/cases/national/South West/latest/summarised_estimates.rds', probable reason 'No such file or directory'
3: In gzfile(file, "rb") :
cannot open compressed file 'subnational/united-kingdom/cases/national/United Kingdom/latest/summarised_estimates.rds', probable reason 'No such file or directory'
4: In gzfile(file, "rb") :
cannot open compressed file 'subnational/united-kingdom/cases/national/Midlands/latest/summary.rds', probable reason 'No such file or directory'
5: In gzfile(file, "rb") :
cannot open compressed file 'subnational/united-kingdom/cases/national/South West/latest/summary.rds', probable reason 'No such file or directory'
6: In gzfile(file, "rb") :
I do see all results except plots have been updated.
Dropping into debug using debugonce(regional_summary), I see that this was due to get_regions_with_most_reports and an issue with the date formatting caused by saving and reading back in reported cases. Adding the following resolved it:
reported_cases <- reported_cases[, date := as.Date(date)]
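The date round trip behind this can be checked on its own. The sketch below writes a small made-up table to a temporary CSV and reads it back; it assumes a reasonably recent data.table, where fread parses ISO dates as IDate (older versions may return character instead). The `<Period>` in the error message presumably comes from lubridate, which is why the sketch only inspects classes rather than reproducing the addition.

```r
library(data.table)

# Toy reported-cases table written to disk and read back, as in the pipeline.
csv_path <- tempfile(fileext = ".csv")
fwrite(
  data.table(
    date = as.Date("2020-11-24") + 0:2,
    region = "Scotland",
    confirm = c(10L, 12L, 9L)
  ),
  csv_path
)
reported_cases <- fread(csv_path)

# Class straight after reading: typically c("IDate", "Date"), not plain "Date".
class_after_read <- class(reported_cases$date)
print(class_after_read)

# The same conversion as above restores a plain Date column.
reported_cases[, date := as.Date(date)]
class_after_fix <- class(reported_cases$date)
print(class_after_fix)
```

Converting immediately after fread keeps the fix next to the read, so code further down that does date arithmetic sees the same class it would get from the in-memory data.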
Running the following:
library(data.table)
library(EpiNow2)
library(covidregionaldata)
reported_cases <- fread("subnational/united-kingdom/cases/summary/reported_cases.csv")
reported_cases <- reported_cases[, date := as.Date(date)]
regional_summary(reported_cases = reported_cases,
results_dir = "subnational/united-kingdom/cases/national",
summary_dir = "subnational/united-kingdom/cases/summary",
all_regions = TRUE)
Results in no errors and a folder structure as below which looks complete.
I don't suppose it's something to do with the fact it's running with slightly different params?
regional_summary(
reported_cases = cases,
results_dir = "subnational/united-kingdom/cases/national",
summary_dir = "subnational/united-kingdom/cases/summary",
region_scale = "Region",
all_regions = TRUE,
return_output = FALSE
)
ignore that - I can see that it's just default values.
library(data.table)
library(EpiNow2)
library(covidregionaldata)
reported_cases <- fread("subnational/united-kingdom/cases/summary/reported_cases.csv")
reported_cases <- reported_cases[, date := as.Date(date)]
regional_summary(reported_cases = reported_cases,
results_dir = "subnational/united-kingdom/cases/national",
summary_dir = "subnational/united-kingdom/cases/summary",
region_scale = "Region",
all_regions = TRUE,
return_output = FALSE)
Updated and still works as expected.
It seems to have only generated a subset of files for the UK regions (and other datasets) - https://github.com/epiforecasts/covid-rt-estimates/tree/master/subnational/united-kingdom/cases/national/Scotland/latest as an example. I have a log file full of tmp file issues again. I'm re-running to see what's going on