carissalow / rapids

Reproducible Analysis Pipeline for Data Streams
http://www.rapids.science/
GNU Affero General Public License v3.0

Outdated cli version #194

Open zacharyfried opened 1 year ago

zacharyfried commented 1 year ago

We get an error when running snakemake -j1 create_participants_files:

Select jobs to execute...                                                                                                                         

[Tue Oct 18 11:26:38 2022]                                                                                                                        
rule create_participants_files:                                                                                                                   
    input: data/external/participant_file_modified.csv                                                                                            
    jobid: 0                                                                                                                                      
    resources: tmpdir=/tmp                                                                                                                        

Error: package or namespace load failed for ‘readr’ in loadNamespace(j <- i[[1L]], c(lib.loc, .libPaths()), versionCheck = vI[[j]]):              
 namespace ‘cli’ 2.2.0 is already loaded, but >= 2.3.0 is required                                                                                
In addition: Warning message:                                                                                                                     
replacing previous import ‘lifecycle::last_warnings’ by ‘rlang::last_warnings’ when loading ‘pillar’                                              
Execution halted                                                                                                                                  
[Tue Oct 18 11:26:40 2022]                                                                                                                        
Error in rule create_participants_files:                                                                                                          
    jobid: 0                                                                                                                                      

RuleException:                                                                                                                                    
CalledProcessError in line 13 of /home/zfried/work_dir/clean_rapids/rapids/rules/preprocessing.smk:                                               
Command 'set -euo pipefail;  Rscript --vanilla /home/zfried/work_dir/clean_rapids/rapids/.snakemake/scripts/tmpj2a0z5ta.create_participants_files.R' returned non-zero exit status 1.
  File "/home/zfried/work_dir/clean_rapids/rapids/rules/preprocessing.smk", line 13, in __rule_create_participants_files                          
  File "/data/anaconda2/envs/rapids_r4_0/lib/python3.7/concurrent/futures/thread.py", line 57, in run                                             
Shutting down, this might take some time.                                                                                                         
Exiting because a job execution failed. Look above for error message                                                                              
Complete log: .snakemake/log/2022-10-18T112637.093754.snakemake.log                          

The renv.lock file specifies cli version 2.2.0. Can this be upgraded? We are using cli 3.4.1, which seems to work fine, but note that our R version (4.0.5) is more recent than the recommended R 4.0.0.
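
For anyone who wants to confirm the mismatch before running the pipeline, here is a minimal sketch of a lockfile check. It assumes the usual renv.lock layout (a JSON document with a top-level "Packages" object whose entries carry a "Version" field). The helper names and the inline sample lockfile are hypothetical and only mirror the versions quoted in the error above; this is not part of RAPIDS.

```python
import json


def parse_version(text):
    """Turn '2.3.0' (or '1.0-5') into a tuple of ints so versions compare numerically."""
    return tuple(int(part) for part in text.replace("-", ".").split("."))


def locked_version(lock, package):
    """Return the version string recorded for `package`, or None if it is not locked."""
    entry = lock.get("Packages", {}).get(package)
    return entry["Version"] if entry else None


def meets_minimum(lock, package, minimum):
    """True if `package` is locked at a version >= `minimum`."""
    version = locked_version(lock, package)
    return version is not None and parse_version(version) >= parse_version(minimum)


# Hypothetical stand-in for a real renv.lock; only the cli entry mirrors this issue.
SAMPLE_LOCK = json.loads("""
{
  "R": {"Version": "4.0.0"},
  "Packages": {
    "cli": {"Package": "cli", "Version": "2.2.0", "Source": "Repository"}
  }
}
""")

print(meets_minimum(SAMPLE_LOCK, "cli", "2.3.0"))  # prints False
```

With a real lockfile, `SAMPLE_LOCK` would be replaced by `json.load(open("renv.lock"))`.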

Install Details

Using R 4.0.5 and Ubuntu 18.04 on commit d255f2de8d2589b497ff1be28a80761c87b89fdd (July 7th, 2022)

JulioV commented 1 year ago

Thanks for reporting this; we'll update renv.lock as soon as possible. In the meantime, feel free to update the problematic R packages manually.

JulioV commented 1 year ago

@jenniferfedor before we release the next version, can we update all the packages in renv.lock, please? The easiest way, I think, is to run renv::update(). @zacharyfried reported a possible bug when updating readr to the latest version, so we will have to check that the tests still pass.

jenniferfedor commented 1 year ago

Hi @JulioV, updates to cli and some dependencies have been addressed in this pull request. Specifically we updated:

All of our tests pass with those minimally necessary updates. Updating all of the packages in renv.lock with renv::update() does cause some of our tests to fail. For example, for our single timezone frequency time segment tests, this test for Fitbit sleep intraday fails:

Comparing data/processed/features/fitbit/fitbit_sleep_intraday.csv and tests/data/processed/features/stz_frequency/fitbit/fitbit_sleep_intraday.csv
F
======================================================================
FAIL: test_sensors_features_calculations (__main__.TestStzFrequency)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "tests/scripts/run_tests.py", line 60, in test_sensors_features_calculations
    pd.testing.assert_frame_equal(df_exp, df_act, obj="df_exp")
  File "/home/jen/.conda/envs/rapids/lib/python3.7/site-packages/pandas/_testing.py", line 1562, in assert_frame_equal
    obj, f"{obj} shape mismatch", f"{repr(left.shape)}", f"{repr(right.shape)}",
  File "/home/jen/.conda/envs/rapids/lib/python3.7/site-packages/pandas/_testing.py", line 1036, in raise_assert_detail
    raise AssertionError(msg)
AssertionError: df_exp are different

df_exp shape mismatch
[left]:  (144, 370)
[right]: (142, 370)

----------------------------------------------------------------------
Ran 2 tests in 1.862s

FAILED (failures=1)

Weirdly, (1) no errors occur during data processing and (2) we compute Fitbit sleep intraday features in a Python script, so I assume the underlying problem is occurring at an earlier step in our pipeline, but I will need some additional time to investigate.
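
One way to narrow this down is to list which rows are missing rather than only comparing shapes. In the traceback, left is df_exp (144 rows) and right is df_act (142 rows), so two expected rows are absent from the new output. The following is a rough sketch using only the standard library; the key column name `segment_start` and the sample data are made up for illustration and are not the actual columns of fitbit_sleep_intraday.csv.

```python
import csv
import io


def read_rows(handle):
    """Read a CSV file-like object into a list of dicts (all values kept as strings)."""
    return list(csv.DictReader(handle))


def missing_keys(expected_rows, actual_rows, key_columns):
    """Return keys present in expected_rows but absent from actual_rows, in order."""
    actual = {tuple(row[col] for col in key_columns) for row in actual_rows}
    missing = []
    for row in expected_rows:
        key = tuple(row[col] for col in key_columns)
        if key not in actual and key not in missing:
            missing.append(key)
    return missing


# Tiny made-up stand-ins for the expected and actual feature files.
EXPECTED_CSV = """segment_start,value
2020-03-07 00:00:00,1
2020-03-07 00:30:00,2
2020-03-07 01:00:00,3
"""
ACTUAL_CSV = """segment_start,value
2020-03-07 00:00:00,1
2020-03-07 01:00:00,3
"""

expected = read_rows(io.StringIO(EXPECTED_CSV))
actual = read_rows(io.StringIO(ACTUAL_CSV))
print(missing_keys(expected, actual, ["segment_start"]))
# prints [('2020-03-07 00:30:00',)]
```

For the real files, each `io.StringIO(...)` would be replaced by `open(path, newline="")`, and `key_columns` by whichever columns identify a time segment in the feature file.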

In the interest of time, is it okay to leave this overall update on our to-do list for the next new release? We are hoping to release a new version as soon as possible so that we can re-process phone applications foreground features for one of our collaborative studies. Thank you!

JulioV commented 1 year ago

Sure, no problem, let's do this for our next release. The discrepancy probably comes from the script that parses timestamps and time zones.
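
To illustrate why a change in timestamp or time zone handling could add or drop rows without raising any error, here is a small sketch. It is not RAPIDS code: the function name and the fixed UTC offsets are assumptions chosen for the example. It only shows that the same epoch timestamp lands on a different local date depending on the offset applied, which would move a record into a different time segment.

```python
from datetime import datetime, timedelta, timezone


def local_date(epoch_ms, utc_offset_hours):
    """Convert an epoch timestamp in milliseconds to a local ISO date for a fixed UTC offset."""
    tz = timezone(timedelta(hours=utc_offset_hours))
    return datetime.fromtimestamp(epoch_ms / 1000, tz=tz).date().isoformat()


# 2022-10-18 03:30:00 UTC, expressed in epoch milliseconds.
EPOCH_MS = int(datetime(2022, 10, 18, 3, 30, tzinfo=timezone.utc).timestamp() * 1000)

print(local_date(EPOCH_MS, 0))   # prints 2022-10-18
print(local_date(EPOCH_MS, -4))  # prints 2022-10-17
```

If an updated package parses or converts timestamps slightly differently, records near a boundary like this could be assigned to another segment, which would be consistent with a row-count difference and no runtime error.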