Closed NiccoloSalvini closed 1 year ago
I would keep 2 parameters.
emergency_scenario
custom_start_date
3 possible outputs:
- No parameter is specified: the function reports an error and asks to specify an emergency.
- The user specifies emergency_scenario: the user gets what he asked for OR an error.
- The user does not specify emergency_scenario, but specifies custom_start_date with a valid ymd: in this case the user gets the custom starting date.
I suggest to lowercase any element of the list, then to force a tolower() on the input.
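get_emergency_date <- function(emergency_name, emergency_list) {
  # function header reconstructed: the name and arguments are placeholders
  # lowercase the input so it matches the lowercased names of the list
  key <- tolower(emergency_name)
  if (key %in% names(emergency_list)) {
    # if it exists, return the corresponding date
    return(emergency_list[[key]])
  } else {
    # if the input emergency name does not exist in the list, return an error message
    return("Error: Emergency not found in list.")
  }
}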
As for handling spelling errors, you could use the agrep function in R to implement fuzzy matching. This would allow the function to find approximate matches for the input emergency name, even if it is misspelled.
I would suggest an output like:
Error in "emergency_scenario": 'terremoto acuila' is not a scenario. Suggested scenario: 'terremoto aquila'. Scenarios available: "covid19", "terremoto aquila", "ucraine-russia war"
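A minimal sketch of how a message like the one above could be assembled with base R's agrep(). The helper name suggest_scenario() and its arguments are hypothetical; the scenario names are the ones from the example message.

```r
# Hypothetical helper: build the error message for an unknown scenario.
suggest_scenario <- function(input, scenarios, max_distance = 0.3) {
  # agrep() does approximate matching based on generalized Levenshtein distance
  hits <- agrep(input, scenarios, max.distance = max_distance,
                ignore.case = TRUE, value = TRUE)
  suggestion <- if (length(hits) > 0) {
    # among the approximate matches, keep the one with the smallest edit distance
    d <- utils::adist(input, hits, ignore.case = TRUE)
    paste0(" Suggested scenario: '", hits[which.min(d)], "'.")
  } else ""
  paste0(
    "Error in \"emergency_scenario\": '", input, "' is not a scenario.",
    suggestion,
    " Scenarios available: ", paste0('"', scenarios, '"', collapse = ", ")
  )
}

scenarios <- c("covid19", "terremoto aquila", "ucraine-russia war")
message(suggest_scenario("terremoto acuila", scenarios))
```

Inside a real function the string would probably be passed to stop() rather than message(), so that the call actually fails.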
dynamically refer to the url (via scraping?)
Scraping is imo not worth it: the scraper must recognize the first date in Italian format with a regex. Say the Minister's page uses a weird format once, for whatever reason: the scraper will fail and possibly pick another date within the html.
Benefits: if a within-package emergencies updater exists, users do not need to update coresoi to update the list of emergencies, and coresoi does not need to be updated for each new emergency.
Another problem: scrapers are not very reliable. About once in 500 attempts, the connection to a url will randomly fail. With 20 urls to scrape, roughly 1 in 25 updates of the list will fail to scrape all the emergencies, and the user will not know this unless a tester function is implemented alongside the scraper.
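The "1 in 25" figure can be checked with a quick calculation, assuming each request fails independently with probability 1/500 (the rate quoted above, not a measured one):

```r
p_fail <- 1 / 500   # assumed per-request failure probability
n_urls <- 20        # number of pages scraped per update

# probability that at least one of the n_urls requests fails
p_update_fails <- 1 - (1 - p_fail)^n_urls
p_update_fails      # about 0.039, i.e. roughly 1 in 25
```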
If the list is centralised it will be possible to set English aliases for each emergency, e.g. terremoto aquila = aquila earthquake.
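A sketch of what such an alias table could look like, assuming a plain named character vector. The names emergency_aliases and resolve_alias() are hypothetical:

```r
# Hypothetical alias table: names are accepted aliases,
# values are the canonical scenario names.
emergency_aliases <- c(
  "aquila earthquake" = "terremoto aquila",
  "terremoto aquila"  = "terremoto aquila"
)

# Resolve a user-supplied name to its canonical scenario, or NA if unknown.
resolve_alias <- function(name, aliases = emergency_aliases) {
  unname(aliases[tolower(name)])
}

resolve_alias("Aquila Earthquake")
```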
3 possible outputs:
- No parameter is specified: the function reports an error and asks to specify an emergency.
- The user specifies emergency_scenario: the user gets what he asked for OR an error.
- The user does not specify emergency_scenario, but specifies custom_start_date with a valid ymd: in this case the user gets the custom starting date.
I got your point, I agree, we should be making it conditional to user input. My guess is that people would prefer emergency_name instead of declaring dates, since you don't have to cope with converting to ymd and it seems more intuitive, but at least you offer an option. My first thought is: if you let the user choose a date, how would we fill emergency_name and emergency_id if they are not specified?
e.g. let's say 6/2/1994 (no emergency_name, random date, my birthday): what are the emergency_name and emergency_id for that?
On top of that, I believe that moving the official date by +/- 1, 2 or 3 days would most likely not impact the indicator estimates that much, but that's just my assumption (we should write a test for that, checking indicator consistency over an interval of +/- 1 week around the emergency start), assuming I got the dates correctly.
I suggest to lowercase any element of the list, then to force a tolower() on the input.
I am a little bit skeptical about that. It would increase the chance of finding the right match at the cost of having things lowercase in the output, which is not that formally correct. We might think of applying str_to_title after the match, but that seems a little over-engineered. I implemented agrep, which computes an approximate (edit) distance with a max distance of .3 and then gets the most likely match. It behaves well.
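One way to get case-insensitive matching without lowercasing the output is to compare lowercased copies and return the list's own spelling. A sketch, assuming a named list of dates: match_scenario() and the list contents are hypothetical, and the date is the default mentioned in the issue.

```r
emergency_list <- list("Terremoto Aquila" = as.Date("2017-06-30"))

match_scenario <- function(input, emergency_list) {
  # compare lowercased copies, but index into the original names
  idx <- match(tolower(input), tolower(names(emergency_list)))
  if (is.na(idx)) stop("Emergency not found in list.")
  names(emergency_list)[idx]
}

match_scenario("terremoto aquila", emergency_list)
```

The same idea carries over to agrep(), which has an ignore.case argument and returns indices into the original vector.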
I would suggest an output like:
Error in "emergency_scenario": 'terremoto acuila' is not a scenario. Suggested scenario: 'terremoto aquila'. Scenarios available: "covid19", "terremoto aquila", "ucraine-russia war"
This is more informative than the one I coded. I'll do it!
Scraping imo not worth: ...
Totally on your side. It looks like too much effort for relatively low impact.
conditional to user input. My guess is that people would prefer
I suggest a UX where the user sees the hidden parameter custom_start_date only if he browses the helper. The helper can say that, if you want an emergency, you can leave it blank.
Indeed, I implicitly suggested that emergency_scenario overwrites inputs of custom_start_date. Only if emergency_scenario is blank and custom_start_date is not blank should the function try to force a custom ymd based on the input; in all other cases, an error or a pre-set date.
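The precedence described here could be sketched as follows. This is only an illustration under stated assumptions: resolve_start_date() is a hypothetical name, "blank" is modelled as NULL, and the default list holds just the one date mentioned in the issue.

```r
# Hypothetical resolver: emergency_scenario takes precedence over custom_start_date.
resolve_start_date <- function(emergency_scenario = NULL, custom_start_date = NULL,
                               emergency_list = list("terremoto aquila" = as.Date("2017-06-30"))) {
  if (!is.null(emergency_scenario)) {
    key <- tolower(emergency_scenario)
    if (!key %in% names(emergency_list)) {
      stop("'", emergency_scenario, "' is not a scenario.")
    }
    return(emergency_list[[key]])  # pre-set date; custom_start_date is ignored
  }
  if (!is.null(custom_start_date)) {
    # with an explicit format, as.Date() yields NA for strings it cannot parse;
    # tryCatch() covers inputs that as.Date() rejects outright
    date <- tryCatch(as.Date(custom_start_date, format = "%Y-%m-%d"),
                     error = function(e) NA)
    if (is.na(date)) stop("custom_start_date must be a valid year-month-day date.")
    return(date)
  }
  stop("Please specify an emergency_scenario.")
}

resolve_start_date("Terremoto Aquila", custom_start_date = "2020-01-01")
resolve_start_date(custom_start_date = "1994-02-06")
```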
I believe that in the extended release of coresoi the user would like to set custom inputs, e.g. a date that is not an emergency but, say, the election of a politician.
context
We would like the user to select the emergency scenario by string/pattern, say "covid19" or "terremoto aquila", i.e. a string instead of the emergency date. This is more intuitive and prevents selecting the wrong date for emergency outbreaks, since we know that those are formally stated by the Authority.
Current behavior
Say we are interested in calculating ind_11, whose target statistical unit is "provincia" and which measures the "Distance between award value and sums paid" indicator for a given emergency scenario, defined by a date in ymd() format. The default behavior is to set that date as lubridate::ymd("2017-06-30"), i.e. terremoto aquila, AND to set the target statistical unit of measurement to "cf_amministrazione_appaltante", i.e. the contracting authority, which via generate_indicator_schema() turns out to have the following table:
Moreover, say that the user selects a different outbreak_starting_date for the exact same emergency, e.g. by specifying outbreak_starting_date = lubridate::ymd("2017-09-30"), 3 months later. This would lead to different results, since the pre/post aggregation would involve different groups.
Expected behavior
Instead of specifying the date, we would force the user to set a scenario by string, e.g. "covid19" or "Terremoto Aquila", by controlling the options available. On one side this would prevent specifying wrong dates (losing informative power); on the other it would offer a friendlier user interface to the indicators. This would also be coupled with automatic input checks that suggest the alternative emergency scenarios.
... and when you misspell it, this suggests something like:
This function should use a named list to store the emergency names and their corresponding dates. It then checks if the input emergency name exists in the list, and if it does, it returns the corresponding date. Otherwise, it returns an error message.
This should be passed within each indicator through generate_indicator_schema() by a function that, given an emergency scenario string, sets the date and then computes the pre/post aggregation.
This is just a sketch of an implementation
As for handling spelling errors, you could use the agrep function in R to implement fuzzy matching. This would allow the function to find approximate matches for the input emergency name, even if it is misspelled. Moreover, we might want to have something like emergency_type, related to the kind of emergency it is. Say "terremoto Aquila" is the emergency scenario we are looking for: then its type is "seismic"; if instead we are looking for "coronavirus", then that's a "sanitary" type of emergency.
a further point
We might also want to dynamically get updated emergencies when they are out (along with their dates). This is a reference where they can be extracted. We may want to:
- dynamically refer to the url (via scraping?)
- emergency_dates() function as their dates (this is the easiest option) and, believe it or not, emergencies are not that common