carissalow / rapids

Reproducible Analysis Pipeline for Data Streams
http://www.rapids.science/
GNU Affero General Public License v3.0

Fixes issue where 'duration' in the 'ios_calls' dataframe is seen as a character type. #171

Closed shaabans closed 2 years ago

shaabans commented 2 years ago

We see this error when computing the PHONE_CALLS feature (we confirmed that the duration and type values in the calls table are valid):

[Sat Jan 15 18:27:08 2022]
rule pull_phone_data:
    input: data/external/participant_files/710CL.yaml, src/data/streams/rapids_columns.yaml, src/data/streams/aware_mysql/format.yaml, src/data/streams/aware_mysql/container.R, src/data/streams/mutations/phone/aware/calls_ios_unification.R
    output: data/raw/710CL/phone_calls_raw.csv
    jobid: 2
    wildcards: pid=710CL, sensor=calls

Warning message:
Project requested R version '4.0.0' but '4.1.2' is currently being used

Attaching package: ‘dplyr’

The following objects are masked from ‘package:stats’:

    filter, lag

The following objects are masked from ‘package:base’:

    intersect, setdiff, setequal, union

Processing PHONE_CALLS for e2a7ad3c-7a05-4c18-bac3-ef017ab94d38
Executing the following query to download data: SELECT timestamp,device_id,call_duration,trace,call_type FROM calls WHERE device_id = 'e2a7ad3c-7a05-4c18-bac3-ef017ab94d38'
Applying mutation script src/data/streams/mutations/phone/aware/calls_ios_unification.R
Error: Problem with `summarise()` input `call_duration`.
✖ invalid 'type' (character) of argument
ℹ Input `call_duration` is `sum(call_duration)`.
ℹ The error occurred in group 1: trace = "0024A27F-F835-4D1C-BDE0-228BC8E8A9CC".
Backtrace:
     █
  1. ├─global::pull_phone_data()
  2. │ └─global::mutate_data(mutation_scripts, renamed_data, data_configuration)
  3. │   └─main(data, data_configuration)
  4. │     └─unify_ios_calls(data)
  5. │       └─`%>%`(...)
  6. ├─dplyr::summarise(...)
  7. ├─dplyr:::summarise.grouped_df(...)
  8. │ └─dplyr:::summarise_cols(.data, ...)
  9. │   ├─base::withCallingHandlers(...)
 10. │   └─mask$eval_all_summarise(quo)
 11. └─base::.handleSimpleError(...)
 12.   └─dplyr:::h(simpleError(msg, call))
Execution halted
[Sat Jan 15 18:27:10 2022]
Error in rule pull_phone_data:
    jobid: 2
    output: data/raw/710CL/phone_calls_raw.csv

RuleException:
CalledProcessError in line 24 of /rapids/rules/preprocessing.smk:
Command 'set -euo pipefail;  Rscript --vanilla /rapids/.snakemake/scripts/tmp5orz_bq7.pull_phone_data.R' returned non-zero exit status 1.
  File "/opt/conda/envs/rapids/lib/python3.7/site-packages/snakemake/executors/__init__.py", line 2339, in run_wrapper
  File "/rapids/rules/preprocessing.smk", line 24, in __rule_pull_phone_data
  File "/opt/conda/envs/rapids/lib/python3.7/site-packages/snakemake/executors/__init__.py", line 560, in _callback
  File "/opt/conda/envs/rapids/lib/python3.7/concurrent/futures/thread.py", line 57, in run
  File "/opt/conda/envs/rapids/lib/python3.7/site-packages/snakemake/executors/__init__.py", line 546, in cached_or_run
  File "/opt/conda/envs/rapids/lib/python3.7/site-packages/snakemake/executors/__init__.py", line 2351, in run_wrapper
Shutting down, this might take some time.
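
The backtrace above points at `sum(call_duration)` inside `unify_ios_calls()`, which fails because the column reaches `summarise()` as character. A minimal sketch of one way to guard against that is to coerce the column to numeric before aggregating. The toy data and the variable names `calls`, `failed`, and `fixed` are hypothetical, and this is not necessarily the exact change made in this pull request.

```r
library(dplyr)

# Hypothetical toy data: call_duration arrives as character, as in the log above
calls <- tibble(
  trace = c("A", "A", "B"),
  call_duration = c("0", "35", "12"),
  call_type = c(1L, 4L, 2L)
)

# Summing a character column raises "invalid 'type' (character) of argument"
failed <- tryCatch(
  {
    calls %>%
      group_by(trace) %>%
      summarise(call_duration = sum(call_duration))
    FALSE
  },
  error = function(e) TRUE
)

# Coercing to numeric first lets the grouped sum proceed
fixed <- calls %>%
  mutate(call_duration = as.numeric(call_duration)) %>%
  group_by(trace) %>%
  summarise(call_duration = sum(call_duration), .groups = "drop")
```

Note that `as.numeric()` turns non-numeric strings into `NA` with a warning, so a real mutation script may want to check for those values rather than sum over them silently.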