CoronaNetDataScience / corona_tscs

This is the raw data repository (policy record format) of the CoronaNet project on government responses to the COVID-19 pandemic.

Start date later than end date #14

Closed ingonader closed 2 years ago

ingonader commented 4 years ago

For a few record_ids, the date_start is later than the date_end, which would indicate a negative duration for that policy. This is the case for the following record_ids:

1080940Dc, 1080940Dd, 1080940Dr, 1201090NA, 1213131NA, 1291080Eb, 1376049NA, 1396166NA, 1497319Dc, 1497319Dd, 1497319Dr, 1500422Dd, 1500422Dr, 1551660Cs, 1613878NA, 1635451NA, 1860490Dd, 1860490Dr, 1886438NA, 1907422Ds, 2025681Dq, 2025681Dt, 2025681Ch, 2025681Cx, 2025681Cs, 2106027Ds, 2181769NA, 2203320Dc, 2203320Dd, 2203320Dr, 2402909Eb, 2409269NA, 2580603Dc, 2580603Dd, 2580603Dr, 277361NA, 2967727Dc, 2967727Dd, 2967727Dr, 2996835NA, 302214NA, 3100114NA, 3196587Dc, 3196587Dd, 3196587Dr, 3252203Cx, 3252203Cs, 330495Dc, 330495Dd, 330495Dr, 3311910Dq, 3311910Dt, 3311910Ch, 3311910Cx, 3327340NA, 354348NA, 3590883Bf, 3677826Dq, 3677826Dt, 3677826Ch, 3677826Cx, 3693764Dq, 3693764Dt, 3693764Ch, 3715839Ds, 3735704NA, 3763985NA, 3794175NA, 4037529Dq, 4037529Ch, 4037529Cs, 4153702Dd, 4153702Dr, 4167161Dq, 4167161Dt, 4167161Ch, 4167161Cx, 4280347NA, 4306357NA, 4314699Dq, 4314699Ch, 4314699Cs, 4393915Ch, 4393915Cs, 995919Dd, 995919Dr, 4455908NA, 4485444Dc, 4485444Dd, 4485444Dr, 4605378Cs, 4627733Dc, 4627733Dd, 4627733Dr, 4721933NA, 4826323NA, 4886584Dq, 4886584Dt, 4886584Ch, 4886584Cx, 4886584Cs, 49131NA, 4961132Ds, 507041Dq, 507041Dt, 507041Ch, 507041Cx, 507041Aq, 5098458NA, 5099850NA, 5160018Ch, 5214714Dc, 5214714Dd, 5214714Dr, 5236072Dc, 5236072Dd, 5236072Dr, 5261177Dd, 5261177Dr, 598337Br, 598337Bs, 598337Cl, 5995960Dq, 5995960Dt, 5995960Ch, 5995960Cx, 6210428NA, 6331824NA, 6406597NA, 6458240Dq, 6458240Dt, 6458240Ch, 6458240Cx, 6476851Dc, 6476851Dd, 6476851Dr, 6565765Dq, 6565765Dt, 6565765Ch, 6565765Cx, 6599679NA, 6637344Ds, 6644718Ds, 6644718Dl, 6717336Eg, 6717336Eh, 6779662NA, 6896333Ch, 7149795Ch, 7149795Cs, 7189460Dq, 7189460Dt, 7189460Ch, 7189460Cx, 7216510NA, 7217769Dc, 7217769Dd, 7217769Dr, 7340325NA, 7356347NA, 7424442Dc, 7424442Dd, 7424442Dr, 7452915NA, 793340Dq, 793340Dt, 793340Ch, 793340Cx, 8111385Dc, 8111385Dd, 8111385Dr, 8192596Dc, 8192596Dd, 8192596Dr, 8236110Dc, 8236110Dd, 8236110Dr, 8340859Az, 8395385NA, 8417806NA, 8422362Aq, 861419Ds, 8721210Cn, 
8724277Dq, 8724277Dt, 8724277Ch, 8724277Cx, 8730099Dc, 8730099Dd, 8730099Dr, 8745937NA, 8780204NA, 8861328Dc, 8861328Dd, 8861328Dr, 8942641Dq, 8942641Ch, 8942641Cx, 8971301NA, 9145215Co, 9352285Dc, 9352285Dd, 9352285Dr, 9378023Dc, 9378023Dd, 9378023Dr, 9509554NA, 9535192NA, 9559610Bd, 9559610Cj, 9559610Ee, 9638330Dq, 9638330Dt, 9638330Ch, 9638330Cx, 9638330Cs, 9708452Ds 

Example:

# A tibble: 217 x 10
   record_id policy_id country                  date_start date_end   duration entry_type update_type     
   <chr>     <chr>     <chr>                    <date>     <date>     <drtn>   <chr>      <chr>           
 1 1080940Dc 1702198   United States of America 2020-04-09 2020-03-26 -14 days update     NA              
 2 1080940Dd 1702198   United States of America 2020-04-09 2020-03-26 -14 days update     NA              
 3 1080940Dr 1702198   United States of America 2020-04-09 2020-03-26 -14 days update     NA              
 4 1201090NA 1888784   Azerbaijan               2020-04-18 2020-04-04 -14 days update     NA              
 5 1213131NA 1213131   United States of America 2020-05-09 2020-04-30  -9 days update     Change of Policy
 6 1291080Eb 1291080   Antigua and Barbuda      2020-06-01 2020-04-09 -53 days update     End of Policy   
 7 1376049NA 1376049   Germany                  2020-05-09 2020-04-19 -20 days update     End of Policy   
 8 1396166NA 5209839   Japan                    2020-05-17 2020-05-06 -11 days update     Change of Policy
 9 1497319Dc 8546355   Germany                  2020-04-27 2020-04-20  -7 days update     NA              
10 1497319Dd 8546355   Germany                  2020-04-27 2020-04-20  -7 days update     NA              
   type                              type_sub_cat                                                               
   <chr>                             <chr>                                                                      
 1 Closure and Regulation of Schools Preschool or childcare facilities (generally for children ages 5 and below)
 2 Closure and Regulation of Schools Primary Schools (generally for children ages 10 and below)                 
 3 Closure and Regulation of Schools Secondary Schools (generally for children ages 10 to 18)                   
 4 External Border Restrictions      NA                                                                         
 5 Restrictions of Mass Gatherings   NA                                                                         
 6 External Border Restrictions      Travel History Form (e.g. documents where traveler has recently been)      
 7 Social Distancing                 NA                                                                         
 8 Internal Border Restrictions      NA                                                                         
 9 Closure and Regulation of Schools Preschool or childcare facilities (generally for children ages 5 and below)
10 Closure and Regulation of Schools Primary Schools (generally for children ages 10 and below)                 
# … with 207 more rows

An R code snippet to find this data:

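library(dplyr)

# Compute each policy's duration and keep only records with a negative value
dat_measures_coronanet_core_raw %>%
    mutate(duration = date_end - date_start) %>%
    filter(!is.na(duration), duration < 0) %>%
    # Move the most relevant columns to the front
    select(record_id, policy_id, country, date_start, date_end, duration, entry_type, update_type, type, type_sub_cat, compliance, everything())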
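
For anyone without the full data set at hand, the same filter can be tried on a small made-up table. This is only a sketch: the `toy` tibble and its record_ids are hypothetical, and only the column names (record_id, date_start, date_end) are taken from the release. It also shows that records with a missing date_end or a zero duration are not flagged.

```r
library(dplyr)

# Hypothetical toy data; only the column names mirror the CoronaNet release.
toy <- tibble(
  record_id  = c("a1", "a2", "a3", "a4"),
  date_start = as.Date(c("2020-04-09", "2020-03-01", "2020-05-09", "2020-04-01")),
  date_end   = as.Date(c("2020-03-26", "2020-03-15", NA, "2020-04-01"))
)

# Same logic as the snippet above: negative durations only, NAs dropped.
negative <- toy %>%
  mutate(duration = date_end - date_start) %>%
  filter(!is.na(duration), duration < 0)

print(negative)
```

Only record a1 should remain, with a duration of -14 days, which matches the pattern seen in the first rows of the example output.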

saudiwin commented 4 years ago

Thanks. This appears to be an error in our data release workflow @timothymodel @cwang23

timothymodel commented 4 years ago

Yup, there's a bug somewhere in the cleanQualtrics or the public release script that's causing that...

I'm working on another issue today.

Would you be able to take a look at this, @Nikola Danevski and @Wong, Brian?

Thanks, Tim

IsaacBravo commented 2 years ago

Hi, I checked whether this issue is still current, and there are still records where the date_start is later than the date_end, which produces a negative policy duration. In the initial_data and qualtrics_afterfill files I find 1360 such records each; in internal_data there are 550. I have also done some manual tests on the Shiny app to confirm this. Thanks! Isaac.

cindyyawencheng commented 2 years ago

Please note that this problem should not grow over time: checks are now in place that prevent it from occurring, and we are actively making progress on cleaning the existing policies that have this issue.
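
The checks themselves are not shown in this thread. As an illustration only, a guard of this kind could look like the base R sketch below. The function name `check_date_order()` and the toy data are hypothetical and are not taken from the CoronaNet release scripts; the sketch assumes the columns record_id, date_start and date_end exist and are of class Date.

```r
# Hypothetical helper: NOT the CoronaNet project's actual check.
# Stops with an error listing every record whose date_start is after date_end.
check_date_order <- function(dat) {
  stopifnot(all(c("record_id", "date_start", "date_end") %in% names(dat)))
  # Records with a missing start or end date are not treated as violations.
  bad <- !is.na(dat$date_start) & !is.na(dat$date_end) &
    dat$date_start > dat$date_end
  if (any(bad)) {
    stop(sprintf("%d record(s) have date_start after date_end: %s",
                 sum(bad), paste(dat$record_id[bad], collapse = ", ")))
  }
  invisible(dat)
}

# Made-up example data: a1 violates the rule, a2 has no end date.
toy <- data.frame(
  record_id  = c("a1", "a2"),
  date_start = as.Date(c("2020-04-09", "2020-03-01")),
  date_end   = as.Date(c("2020-03-26", NA))
)

# Capture the error message instead of aborting the script.
result <- tryCatch(check_date_order(toy), error = function(e) conditionMessage(e))
print(result)
```

Run before a release, a guard like this fails loudly instead of letting negative durations through. Whether open-ended policies (missing date_end) should pass is a design choice; here they do.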