IUSCA / sca-issues

SPH RSC: Data encoding issue #38

Closed xinoooo closed 4 years ago

rperigo commented 4 years ago

(relates: https://github.com/IUSCA/sca-issues/issues/35)

Since the initial message was empty, some context: the files themselves appear to be accessible to RStudio Connect Shiny apps. However, when the app attempts to plot the data, something fails due to missing or possibly malformed data:

Relevant quote (from me): To elaborate on that, I added some print statements to log the contents of the fitbit_var and study_id variables taken from the data files. This added the following lines to the log:

2020/04/09 20:30:42.688645690 [1] "steps"                "calories"             "caloriesBMR"        
2020/04/09 20:30:42.688669697 [4] "distance"             "minutesSedentary"     "minutesLightlyActive"
2020/04/09 20:30:42.688771910 [7] "minutesFairlyActive"  "minutesVeryActive"    "activityCalories"   
2020/04/09 20:30:42.916269387 # A tibble: 11 x 10
2020/04/09 20:30:42.916285369    `Subject ID` `Accelerometer  `Randomization  Group `Fitbit email`
2020/04/09 20:30:42.916353422           <dbl> <lgl>                       <dbl> <chr> <lgl>        
2020/04/09 20:30:42.916364871  1            1 NA                              1 Fitb  NA           
2020/04/09 20:30:42.916382868  2            2 NA                             NA <NA>  NA           
2020/04/09 20:30:42.916384426  3            3 NA                              2 Cont  NA           
2020/04/09 20:30:42.916401518  4            4 NA                              3 Fitb  NA           
2020/04/09 20:30:42.916403098  5            5 NA                              4 Fitb  NA           
2020/04/09 20:30:42.916437535  6            6 NA                              5 Fitb  NA           
2020/04/09 20:30:42.916439039  7            7 NA                              6 Fitb  NA           
2020/04/09 20:30:42.916456071  8            8 NA                              7 F+W   NA           
2020/04/09 20:30:42.916457576  9            9 NA                              8 Fitb  NA           
2020/04/09 20:30:42.916475081 10           10 NA                              9 F+W   NA           
2020/04/09 20:30:42.916476526 11           11 NA                             10 Whis  NA           
2020/04/09 20:30:42.916495271 #  with 5 more variables: `Fitbit code` <chr>, `Whistle serial #` <chr>, `Dog
2020/04/09 20:30:42.916496822 #   ID` <dbl>, `Accelerometer Post` <lgl>, notes <chr>

It does appear that the files are at least accessible to the application. However, the garbage characters in the truncated column names and values above, together with the lines:

2020/04/09 19:56:13.884623871 Warning in force(expr) : input string 'é' cannot be translated to UTF-8, is it valid in 'ANSI_X3.4-1968'?
2020/04/09 19:56:13.884711497 Warning in force(expr) : input string 'é' cannot be translated to UTF-8, is it valid in 'ANSI_X3.4-1968'?
2020/04/09 19:56:13.885044413 Warning in force(expr) : input string 'é' cannot be translated to UTF-8, is it valid in 'ANSI_X3.4-1968'?

lead me to believe there may be an encoding issue. It could be something else, but the problem definitely appears to be in the way R is ingesting the data. The error message shown onscreen appears to occur because the plot function cannot get proper values for the x axis:

2020/04/10 16:14:29.889871963 Warning: Length of logical index must be 1 or 11, not 0
2020/04/10 16:14:29.896927937 Warning: Error in if: argument is of length zero
2020/04/10 16:14:29.907056554   180: PlotSubject
2020/04/10 16:14:29.907065427   179: renderPlot [/opt/rstudio-connect/mnt/app/app.R#102]
2020/04/10 16:14:29.907107400   177: func
2020/04/10 16:14:29.907109454   137: drawPlot
2020/04/10 16:14:29.907126694   123: <reactive:plotObj>
2020/04/10 16:14:29.907128292   107: drawReactive
2020/04/10 16:14:29.907144866    94: origRenderFunc
2020/04/10 16:14:29.907146463    93: output$PlotSubject
2020/04/10 16:14:29.907162763    13: runApp
2020/04/10 16:14:29.907164372    12: fn
2020/04/10 16:14:29.907189844     7: connect$retry
2020/04/10 16:14:29.907191996     6: eval
2020/04/10 16:14:29.907229365     5: eval
2020/04/10 16:14:30.000214436 Warning in min(x) : no non-missing arguments to min; returning Inf
2020/04/10 16:14:30.000632994 Warning in max(x) : no non-missing arguments to max; returning -Inf
2020/04/10 16:14:30.022418103 Warning: Error in plot.window: need finite 'xlim' values
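
To illustrate the chain of warnings above: when the subject filter matches nothing, `min()` and `max()` of the empty vector return `Inf` and `-Inf`, and `plot.window` then rejects those as axis limits. Below is a minimal sketch in base R of a guard that checks for this before plotting. The names `has_plottable_x` and `safe_plot` are hypothetical and not taken from app.R, and the sketch assumes the x values are numeric.

```r
# Hypothetical helper (not from app.R): TRUE only if x contains at least
# one finite value, i.e. range(x) would give usable axis limits.
has_plottable_x <- function(x) {
  length(x) > 0 && any(is.finite(x))
}

# An empty selection reproduces the min/max warnings seen in the log.
empty <- numeric(0)
lims <- suppressWarnings(c(min(empty), max(empty)))  # Inf and -Inf

# Hypothetical wrapper: skip the plot instead of failing in plot.window.
safe_plot <- function(x, y) {
  if (!has_plottable_x(x)) {
    message("No data to plot for this selection")
    return(invisible(FALSE))
  }
  plot(x, y)
  invisible(TRUE)
}

safe_plot(empty, empty)  # prints the message, draws nothing
```

In a Shiny app, the same check could probably be expressed with `validate(need(...))` inside `renderPlot`, so that the user sees a readable message instead of an error.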

rperigo commented 4 years ago

@xinoooo Any luck debugging on your end?

I played around with things a bit, and it looks like there are a few issues with how the app ingests the data. For example, pointing the source at the uploaded xlsx file from /opt/sca/sphdata resulted in different errors, because the column names in the R code do not match the column names in the spreadsheet.

The application is definitely able to read from the correct folder, but we may have to do some more work to ensure the data is formatted correctly and the R code is looking for the correct columns etc.
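
One way to catch a column mismatch early is to compare the names the code expects against the names actually present, before any plotting happens. This is a rough sketch in base R. The helper `missing_columns` is hypothetical, and the column names are borrowed from the log output in this issue; which columns app.R really needs is an assumption.

```r
# Hypothetical helper (not from app.R): returns the expected column
# names that are absent from the data.
missing_columns <- function(data, required) {
  setdiff(required, names(data))
}

# Stand-in for the spreadsheet contents; check.names = FALSE keeps the
# space in "Subject ID" instead of turning it into "Subject.ID".
sheet <- data.frame(
  `Subject ID` = 1:3,
  Group = c("a", "b", "c"),
  check.names = FALSE
)

# Assumed list of columns the app needs.
required <- c("Subject ID", "Group", "Fitbit code")

absent <- missing_columns(sheet, required)
if (length(absent) > 0) {
  message("Missing columns: ", paste(absent, collapse = ", "))
}
```

Running a check like this right after the data is loaded would point at the mismatched names directly, rather than surfacing later as a plotting error.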

rperigo commented 4 years ago

@xinoooo

I did a couple of quick tests. After copying the data from the "static" application Stephanie published and adjusting app.R to use All_Daily.csv instead of All_Daily.utf8.csv, the app does appear to work and plot correctly. You may want to double-check the character encoding of your input file(s) and make sure you're calling the correct column names.

Does the app you published last week work correctly on your laptop using the same dataset?
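
On double-checking the encoding: the `ANSI_X3.4-1968` in the earlier warnings is the ASCII character set, which suggests the R session on the server is not running in a UTF-8 locale. Here is a small sketch in base R that checks the session and tests whether a file's lines are valid UTF-8. The demo file is made up for illustration, and the conversion step assumes the non-UTF-8 lines are Latin-1, which may not be true of the real data files.

```r
# Demo file (hypothetical): "café" stored as Latin-1, so é is the
# single byte 0xE9, which is not valid UTF-8 on its own.
path <- tempfile(fileext = ".csv")
writeBin(c(charToRaw("name\ncaf"), as.raw(0xE9), charToRaw("\n")), path)

# Is this R session using UTF-8? In the "C"/POSIX locale this is FALSE.
session_is_utf8 <- isTRUE(l10n_info()[["UTF-8"]])

# Read the lines as-is and test each one for UTF-8 validity.
lines <- readLines(path, warn = FALSE)
ok <- validUTF8(lines)  # FALSE for the line containing the 0xE9 byte

# Assumption: the invalid lines are Latin-1. Convert only those to UTF-8.
fixed <- lines
fixed[!ok] <- iconv(lines[!ok], from = "latin1", to = "UTF-8")
```

If `ok` is `TRUE` for every line of All_Daily.utf8.csv, the file itself is fine and the locale of the R process is the more likely culprit.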

agopu commented 4 years ago

@xinoooo Been almost 2 weeks since we made progress with this. How can we help further?!

xinoooo commented 4 years ago

@rperigo @agopu Thanks for following up! I am quite busy this month with other projects and probably will be for the next two weeks. Can we resume afterward?

rperigo commented 4 years ago

Hi Xiwei,

We previously extended the length of this pilot until the end of April. We can discuss this with Arvind, Stephanie, et al. to see if we can extend it further, but I can't make any guarantees, as we're quite fully booked on our side as well and are also dealing with the uncertainty around COVID-19.

agopu commented 4 years ago

Stephanie clarified in an email that her team does not have the cycles to focus on this at this time, so we are going to shut down the prototype for now. Therefore, we will close this ticket. We can always reopen it in the future if/when needed.