When processing Fitbit heartrate summary data for a particular device from a single participant using the Fitbit JSON MySQL data stream, we encountered the following error when executing the pull_wearable_data rule:
rule pull_wearable_data:
input: data/external/participant_files/p1170.yaml, src/data/streams/rapids_columns.yaml, src/data/streams/fitbitjson_mysql/format.yaml, src/data/streams/fitbitjson_mysql/container.R, src/data/streams/mutations/fitbit/parse_heartrate_summary_json.py, src/data/streams/mutations/fitbit/add_zero_timestamp.py
output: data/raw/p1170/fitbit_heartrate_summary_raw.csv
jobid: 1
wildcards: pid=p1170, device_type=fitbit, sensor=heartrate_summary
Attaching package: ‘dplyr’
The following objects are masked from ‘package:stats’:
filter, lag
The following objects are masked from ‘package:base’:
intersect, setdiff, setequal, union
Warning message:
package ‘readr’ was built under R version 4.0.5
Processing FITBIT_HEARTRATE_SUMMARY for cf0992de-be2e-4070-ac6c-2f71f857aab0
Executing the following query to download data: SELECT device_id,fitbit_data FROM fitbit_data_from_api_v2 WHERE device_id = 'cf0992de-be2e-4070-ac6c-2f71f857aab0'
Applying mutation script src/data/streams/mutations/fitbit/parse_heartrate_summary_json.py
Error in `mutate_cols()`:
! Problem with `mutate()` input `..1`.
✖ missing value where TRUE/FALSE needed
ℹ Input `..1` is `(function (.cols = everything(), .fns = NULL, ..., .names = NULL) ...`.
Caused by error in `if (!is.character(value) && !is.nan(value)) ...`:
! missing value where TRUE/FALSE needed
Backtrace:
▆
1. ├─global mutate_data(mutation_scripts, renamed_data, data_configuration)
2. │ └─data %>% ...
3. ├─dplyr::mutate(., across(where(is.list), fix_pandas_nan_in_string_columns))
4. ├─dplyr:::mutate.data.frame(., across(where(is.list), fix_pandas_nan_in_string_columns))
5. │ └─dplyr:::mutate_cols(.data, ...)
6. │ ├─base::withCallingHandlers(...)
7. │ └─mask$eval_all_mutate(quo)
8. ├─global `<fn>`(heartrate_daily_restinghr)
9. │ └─base::vapply(...)
10. │ └─FUN(X[[i]], ...)
11. └─base::.handleSimpleError(...)
12. └─dplyr (local) h(simpleError(msg, call))
13. └─rlang::abort(...)
Execution halted
We are using RAPIDS v1.9.4 on Ubuntu 20.04. The error appears to be caused by the use of None to represent missing values in the src/data/streams/mutations/fitbit/parse_heartrate_summary_json.py mutation script, which src/data/streams/pull_wearable_data.R executes via {reticulate}. In the Python script, missing values for expected columns are set to None.

None values in a pandas Series (e.g., a DataFrame column) are normally coerced to NaN when other numeric values are present, and Python's NaN is also interpreted as NaN in R. However, this device for this participant had only one row of Fitbit heartrate summary data, and its heartrate_daily_restinghr value was missing and therefore set to None. Because the column contained no other numeric values, the None was not coerced to NaN and was interpreted by R as NULL. Evaluating NULL with !is.nan() returns a logical vector of length 0 rather than a single TRUE or FALSE, so the if condition shown in the trace cannot be evaluated, resulting in this error.

To account for this, we can replace any instances of None in the mutation script with np.NaN.
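The coercion behaviour described above can be reproduced with pandas alone. The following is a minimal sketch, not the actual mutation script: the DataFrames are hypothetical stand-ins for the parsed Fitbit records, and it uses the spelling np.nan because the np.NaN alias is not available in NumPy 2.0 and later.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-ins for the parsed records; only the column name
# heartrate_daily_restinghr is taken from the error trace.

# One row, missing value stored as None: nothing forces a numeric dtype,
# so the column stays "object" and the value remains a Python None.
single_none = pd.DataFrame({"heartrate_daily_restinghr": [None]})

# Several rows, one missing: the numeric value makes pandas build a
# float64 column, and None is coerced to NaN.
mixed = pd.DataFrame({"heartrate_daily_restinghr": [62, None]})

# One row, missing value stored as np.nan: the column is float64 even
# without any other numeric value present.
single_nan = pd.DataFrame({"heartrate_daily_restinghr": [np.nan]})

for name, df in [("single_none", single_none),
                 ("mixed", mixed),
                 ("single_nan", single_nan)]:
    column = df["heartrate_daily_restinghr"]
    print(name, column.dtype, repr(column.iloc[-1]))
```

Only the first case leaves a None in the column, which matches the single-row situation reported here. Using np.nan for missing values keeps the column numeric regardless of how many rows a device has.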