Chicago / west-nile-virus-predictions

Algorithm to predict repeated positive results for West Nile Virus for mosquitoes captured in traps across Chicago.
MIT License

Fix filename / date parsing function #29

Closed: geneorama closed this issue 7 years ago

geneorama commented 7 years ago

I ran into an error while trying to automate the predictions.

The first step in the prediction is the feature creation, which downloads the latest copy of the West Nile data on the data portal.

The function refresh_wnv() checks the data portal downloads in ./data and downloads today's data if it has not already been downloaded. The function open_latest_wnv_file() then opens the latest file.
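As a rough illustration of the "open the latest file" step, here is a minimal sketch that picks the most recently dated path from a vector of file names. The helper latest_dated_file() is hypothetical and is not the repository's open_latest_wnv_file(); it assumes the download date appears in the file name as YYYY-MM-DD.

```r
# Hypothetical helper for illustration only; not the repository's code.
# Assumes the download date is embedded in the file name as YYYY-MM-DD.
latest_dated_file <- function(paths) {
  date_pattern <- "[0-9]{4}-[0-9]{2}-[0-9]{2}"
  has_date <- grepl(date_pattern, basename(paths))
  if (!any(has_date)) {
    # No dated files at all: return NA rather than indexing into nothing.
    return(NA_character_)
  }
  dated <- paths[has_date]
  # regexpr() finds the first date-like substring in each dated name.
  date_text <- regmatches(basename(dated), regexpr(date_pattern, basename(dated)))
  dates <- as.Date(date_text)
  dated[which.max(as.numeric(dates))]
}
```

A caller could then compare the date in the returned name against Sys.Date() to decide whether a fresh download is needed.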

Both of these functions use parse_dated_filename(), which parses the names of all the files in the data folder into folder, basename, download date, etc.

If only some of the files lack the "dirname / basename / date / file extension" format, the function works fine, but if none of the files has the expected format, the function fails.
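One way to make this kind of parser safe when no file name matches is to pre-fill the result columns with NA and only fill in the rows that match. The sketch below shows that idea in base R. The name parse_dated_filename_sketch() and the regular expression are assumptions for illustration; this is not the function in the repository and may differ from it in edge cases.

```r
# Illustrative sketch only; not the repository's parse_dated_filename().
# Assumed file name shape: <non-empty base><YYYY-MM-DD>.<ext>
parse_dated_filename_sketch <- function(paths) {
  filename_full <- basename(paths)
  pattern <- "^(.+)([0-9]{4}-[0-9]{2}-[0-9]{2})\\.([^.]+)$"
  hit <- grepl(pattern, filename_full)

  # Start with NA everywhere so that zero matches is not a special case.
  filename_base <- rep(NA_character_, length(paths))
  date_text <- rep(NA_character_, length(paths))
  filename_base[hit] <- sub(pattern, "\\1", filename_full[hit])
  date_text[hit] <- sub(pattern, "\\2", filename_full[hit])

  data.frame(
    fullname = paths,
    # Use an empty string (not ".") when the input has no directory part.
    dir = ifelse(grepl("/", paths, fixed = TRUE), dirname(paths), ""),
    filename_full = filename_full,
    filename_base = filename_base,
    date = as.Date(date_text),
    ext = tools::file_ext(filename_full),
    stringsAsFactors = FALSE
  )
}
```

Because every column is built from vectors of the same length as the input, calling it on one undated file or on several undated files returns rows with NA in filename_base and date instead of raising an error.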

geneorama commented 7 years ago

Fixed. Here are the tests:

> files <- c("data/traps_portal_2016-09-30.Rds",
+                "data/traps_portal/data_portal_2016-09-30.Rds",
+                "data/traps_portal/data_portal2016-09-30.Rds",
+                "data/traps_portal/2016-09-30.Rds",
+                "data/traps_portal_2016-10-31.Rds",
+                "data/traps_oracle_2016-08-24.Rds",
+                "data/traps_oracle_2016-08-26.Rds",
+                "data/traps_oracle_2016-09-30.Rds",
+                "data/traps_oracle_2017-10-31.Rds",
+                "data/traps.oracle_2017-10-31.Rds",
+                "data/wnv_results_portal_2016-08-08.Rds",
+                "data/wnv_results_portal_2016-08-09.Rds",
+                "data/wnv_results_portal_2016-08-10.Rds",
+                "data/wnv_results_portal_2016-08-11.Rds",
+                "data/traps_new_data_source.Rds",
+                "data/traps_portal.Rds",
+                "data/traps_oracle.Rds",
+                "data/traps_oraclez.Rds",
+                "data/traps_.zoracle.Rds",
+                "data/traps_results.Rds",
+                "data/traps_results_portal.R")
>     parse_dated_filename(files)
                                       fullname               dir                     filename_full       filename_base       date ext
1              data/traps_portal_2016-09-30.Rds              data       traps_portal_2016-09-30.Rds       traps_portal_ 2016-09-30 Rds
2  data/traps_portal/data_portal_2016-09-30.Rds data/traps_portal        data_portal_2016-09-30.Rds        data_portal_ 2016-09-30 Rds
3   data/traps_portal/data_portal2016-09-30.Rds data/traps_portal         data_portal2016-09-30.Rds         data_portal 2016-09-30 Rds
4              data/traps_portal/2016-09-30.Rds data/traps_portal                    2016-09-30.Rds                <NA>       <NA> Rds
5              data/traps_portal_2016-10-31.Rds              data       traps_portal_2016-10-31.Rds       traps_portal_ 2016-10-31 Rds
6              data/traps_oracle_2016-08-24.Rds              data       traps_oracle_2016-08-24.Rds       traps_oracle_ 2016-08-24 Rds
7              data/traps_oracle_2016-08-26.Rds              data       traps_oracle_2016-08-26.Rds       traps_oracle_ 2016-08-26 Rds
8              data/traps_oracle_2016-09-30.Rds              data       traps_oracle_2016-09-30.Rds       traps_oracle_ 2016-09-30 Rds
9              data/traps_oracle_2017-10-31.Rds              data       traps_oracle_2017-10-31.Rds       traps_oracle_ 2017-10-31 Rds
10             data/traps.oracle_2017-10-31.Rds              data       traps.oracle_2017-10-31.Rds       traps.oracle_ 2017-10-31 Rds
11       data/wnv_results_portal_2016-08-08.Rds              data wnv_results_portal_2016-08-08.Rds wnv_results_portal_ 2016-08-08 Rds
12       data/wnv_results_portal_2016-08-09.Rds              data wnv_results_portal_2016-08-09.Rds wnv_results_portal_ 2016-08-09 Rds
13       data/wnv_results_portal_2016-08-10.Rds              data wnv_results_portal_2016-08-10.Rds wnv_results_portal_ 2016-08-10 Rds
14       data/wnv_results_portal_2016-08-11.Rds              data wnv_results_portal_2016-08-11.Rds wnv_results_portal_ 2016-08-11 Rds
15               data/traps_new_data_source.Rds              data         traps_new_data_source.Rds                <NA>       <NA> Rds
16                        data/traps_portal.Rds              data                  traps_portal.Rds                <NA>       <NA> Rds
17                        data/traps_oracle.Rds              data                  traps_oracle.Rds                <NA>       <NA> Rds
18                       data/traps_oraclez.Rds              data                 traps_oraclez.Rds                <NA>       <NA> Rds
19                      data/traps_.zoracle.Rds              data                traps_.zoracle.Rds                <NA>       <NA> Rds
20                       data/traps_results.Rds              data                 traps_results.Rds                <NA>       <NA> Rds
21                  data/traps_results_portal.R              data            traps_results_portal.R                <NA>       <NA>   R
>     parse_dated_filename(basename(files))
                            fullname dir                     filename_full       filename_base       date ext
1        traps_portal_2016-09-30.Rds           traps_portal_2016-09-30.Rds       traps_portal_ 2016-09-30 Rds
2         data_portal_2016-09-30.Rds            data_portal_2016-09-30.Rds        data_portal_ 2016-09-30 Rds
3          data_portal2016-09-30.Rds             data_portal2016-09-30.Rds         data_portal 2016-09-30 Rds
4                     2016-09-30.Rds                        2016-09-30.Rds                <NA>       <NA> Rds
5        traps_portal_2016-10-31.Rds           traps_portal_2016-10-31.Rds       traps_portal_ 2016-10-31 Rds
6        traps_oracle_2016-08-24.Rds           traps_oracle_2016-08-24.Rds       traps_oracle_ 2016-08-24 Rds
7        traps_oracle_2016-08-26.Rds           traps_oracle_2016-08-26.Rds       traps_oracle_ 2016-08-26 Rds
8        traps_oracle_2016-09-30.Rds           traps_oracle_2016-09-30.Rds       traps_oracle_ 2016-09-30 Rds
9        traps_oracle_2017-10-31.Rds           traps_oracle_2017-10-31.Rds       traps_oracle_ 2017-10-31 Rds
10       traps.oracle_2017-10-31.Rds           traps.oracle_2017-10-31.Rds       traps.oracle_ 2017-10-31 Rds
11 wnv_results_portal_2016-08-08.Rds     wnv_results_portal_2016-08-08.Rds wnv_results_portal_ 2016-08-08 Rds
12 wnv_results_portal_2016-08-09.Rds     wnv_results_portal_2016-08-09.Rds wnv_results_portal_ 2016-08-09 Rds
13 wnv_results_portal_2016-08-10.Rds     wnv_results_portal_2016-08-10.Rds wnv_results_portal_ 2016-08-10 Rds
14 wnv_results_portal_2016-08-11.Rds     wnv_results_portal_2016-08-11.Rds wnv_results_portal_ 2016-08-11 Rds
15         traps_new_data_source.Rds             traps_new_data_source.Rds                <NA>       <NA> Rds
16                  traps_portal.Rds                      traps_portal.Rds                <NA>       <NA> Rds
17                  traps_oracle.Rds                      traps_oracle.Rds                <NA>       <NA> Rds
18                 traps_oraclez.Rds                     traps_oraclez.Rds                <NA>       <NA> Rds
19                traps_.zoracle.Rds                    traps_.zoracle.Rds                <NA>       <NA> Rds
20                 traps_results.Rds                     traps_results.Rds                <NA>       <NA> Rds
21            traps_results_portal.R                traps_results_portal.R                <NA>       <NA>   R

This was the call that was failing with "subscript out of bounds":

>     parse_dated_filename(files[c(20,21)])
                     fullname  dir          filename_full filename_base date ext
1      data/traps_results.Rds data      traps_results.Rds          <NA> <NA> Rds
2 data/traps_results_portal.R data traps_results_portal.R          <NA> <NA>   R

I also added a test to make sure that a single file without a date is parsed without error.

>     parse_dated_filename(files[c(20)])
                fullname  dir     filename_full filename_base date ext
1 data/traps_results.Rds data traps_results.Rds          <NA> <NA> Rds

These "tests" are in the comments of the function itself.