Open mzagaja opened 6 years ago
After investigating further I have learned the following:
The generate_report.R script loads the R environment by referencing load(.RData.R), which may or may not contain arbitrary functions and variables that are available to generate_report.R. There are functions in generate_report.R, like get_dates, that we could not initially use or debug when loading the script into RStudio because they were not otherwise loaded or available in the environment.
Enrollment information is contained in a static CSV file, enrollment_15_16.csv, which is used for generating the R report. There are also some date-specific functions, like get_enrollment_df, that have a large number of conditionals and stop handling school years after 15_16, effectively creating a Y2K-style bug in the app.
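As a rough illustration of how the hard-coded year conditionals could be avoided, the school-year label can be computed from a date instead of enumerated. This is only a sketch of the logic, written in Python rather than R; the function name school_year_label and the August start month are assumptions, not taken from generate_report.R.

```python
from datetime import date


def school_year_label(d: date, start_month: int = 8) -> str:
    """Return a label like '15_16' for the school year containing d.

    start_month is an assumption: the month in which a new school year begins.
    """
    # Dates before the start month belong to the school year that began
    # in the previous calendar year.
    start_year = d.year if d.month >= start_month else d.year - 1
    return f"{start_year % 100:02d}_{(start_year + 1) % 100:02d}"


label_fall = school_year_label(date(2015, 9, 1))
label_spring = school_year_label(date(2016, 3, 1))
label_later = school_year_label(date(2019, 10, 1))
print(label_fall, label_spring, label_later)
```

A label computed this way could then be used to pick the matching enrollment file, so that a new school year would only require a new CSV rather than a new conditional.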
A larger scope of work is needed to refactor and debug the R script going forward.
The reported number of completed surveys on the website currently differs from the number of completed surveys in the PDF report generated by R. Initially the difference was an undercount. This was because the SQL view that we use to calculate the reports in the report generator was discarding surveys that did not have data in the fields dropoff and pickup, which refer to whether folks drop off or pick up their child on the way to another activity. This question is only relevant, and only appears in the user interface, if the student is dropped off in a family vehicle. Otherwise, it is always empty. The result was that students choosing other transit modes were not included in the report. We fix this by updating the SQL for this view in the forthcoming pull request.
The second component, after fixing the SQL, is figuring out why there is now an overcount of 7 surveys in the R report versus the website. Further research revealed that 3 surveys counted on the website were actually blank, so the real difference was 10. The goal is to figure out where and how we are getting a count of 10 extra surveys.
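The undercount caused by the dropoff and pickup fields can be reproduced in miniature. The sketch below uses Python's sqlite3 module with an in-memory database. The table name, view names, and sample rows are all hypothetical, and a NOT NULL filter is only one way a view could end up discarding these rows; the actual view definition is not shown in this issue.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE surveys (
    id INTEGER PRIMARY KEY,
    mode TEXT,
    dropoff TEXT,  -- only filled in for family-vehicle trips
    pickup TEXT    -- only filled in for family-vehicle trips
);
INSERT INTO surveys (mode, dropoff, pickup) VALUES
    ('family_vehicle', 'yes', 'no'),
    ('walk', NULL, NULL),
    ('school_bus', NULL, NULL);

-- Hypothetical buggy view: requires the optional fields to be present.
CREATE VIEW report_strict AS
    SELECT * FROM surveys
    WHERE dropoff IS NOT NULL AND pickup IS NOT NULL;

-- Hypothetical fixed view: optional fields may be empty.
CREATE VIEW report_fixed AS
    SELECT * FROM surveys;
""")

strict_count = conn.execute("SELECT COUNT(*) FROM report_strict").fetchone()[0]
fixed_count = conn.execute("SELECT COUNT(*) FROM report_fixed").fetchone()[0]
print(strict_count, fixed_count)
```

In this toy data only the family-vehicle survey survives the strict view, which mirrors how students using other transit modes dropped out of the report.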
One attempted solution was blanking empty string cells with the following SQL:
This did not resolve the issue with the report count.
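The query used for that attempt is not reproduced here. Purely as an illustration, and assuming that "blanking" means converting empty strings to NULL, the idea can be sketched with NULLIF in an in-memory SQLite database driven from Python. The table and column names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE survey_cells (id INTEGER PRIMARY KEY, answer TEXT);
INSERT INTO survey_cells (answer) VALUES ('walk'), (''), (NULL);
""")

null_query = "SELECT COUNT(*) FROM survey_cells WHERE answer IS NULL"
nulls_before = conn.execute(null_query).fetchone()[0]

# NULLIF(x, '') returns NULL when x is the empty string, otherwise x unchanged.
conn.execute("UPDATE survey_cells SET answer = NULLIF(answer, '')")

nulls_after = conn.execute(null_query).fetchone()[0]
print(nulls_before, nulls_after)
```

After the update, empty strings and NULLs are treated the same by IS NULL checks, which matters when deciding whether a survey counts as blank.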
The raw number of rows going into the R report after blanking empty cells and deleting the three missing surveys is 132 (previously 141). This is seen by calling the following SQL: