DataBiosphere / analysis_pipeline_WDL

Collection of WDL workflows based on the University of Washington TOPMed DCC Best Practices for GWAS. The WDL structure was based on CWLs written by the Seven Bridges development team.

[assoc-agg] weights_chr2.RData is both equivalent and not equivalent to its checker file #70

Open aofarrel opened 2 years ago

aofarrel commented 2 years ago

This is related to #22 on the checker template repo and #69 on this repo. When the assoc-agg checker workflow imports v1.0.0 of the checker, the weights check errors out. The two files have different md5 sums, yet R considers them perfectly equivalent: in the log below, the assertion isFALSE(identical(test, truth)) fails, which means identical() returned TRUE.

I'm not concerned that it ultimately errored out; that has already been addressed in #69. What I want to know is why R considers these files equivalent when, going by their md5 sums, they are not.

Proof of Concept Files

md5: The test file has an md5 of a454372a474881dd9d8b43953a4ebc10. The truth file has an md5 of 7c16c41dfb801affc65c762aaadea32a.
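Differing md5 sums only show that the two files differ byte for byte, not that the objects stored in them differ. R's save() gzip-compresses .RData files by default, so the bytes on disk can change (for example through gzip header fields or compression settings) while the serialized objects stay the same. A minimal diagnostic sketch in Python, assuming both files are gzip-compressed: it hashes each file as-is and again after decompression. The helper names md5_hex, compare_layers, and compare_files are hypothetical.

```python
import gzip
import hashlib
from pathlib import Path


def md5_hex(data: bytes) -> str:
    """Hex md5 digest, the same value md5sum prints for a file's bytes."""
    return hashlib.md5(data).hexdigest()


def compare_layers(test: bytes, truth: bytes) -> dict:
    """Compare two gzip-compressed blobs at two levels (assumes gzip input).

    archive_md5_equal: are the files byte-identical on disk?
    payload_md5_equal: are the decompressed (serialized) streams identical?
    """
    return {
        "archive_md5_equal": md5_hex(test) == md5_hex(truth),
        "payload_md5_equal": md5_hex(gzip.decompress(test))
        == md5_hex(gzip.decompress(truth)),
    }


def compare_files(test_path: str, truth_path: str) -> dict:
    """Hypothetical convenience wrapper for two .RData paths."""
    return compare_layers(Path(test_path).read_bytes(), Path(truth_path).read_bytes())


if __name__ == "__main__":
    payload = b"identical serialized object"
    # Same payload, different gzip header timestamps, so the archive bytes differ.
    a = gzip.compress(payload, mtime=0)
    b = gzip.compress(payload, mtime=1)
    print(compare_layers(a, b))
```

If the payload hashes also differ, the difference lies in the serialized stream itself, for instance in the header that records which R version wrote the file, rather than in the compression layer.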

Hypothesis: I have a hunch that the truth file was generated with topmed-master version 2.10 instead of version 2.12. This would explain why I found this bug at the last minute: I updated the Docker images very late in development because I didn't realize there was a newer version.
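One way to test this hypothesis without loading the objects is to read the serialization header, which records the R version that wrote the file. A minimal sketch in Python, assuming the layout described in the R Internals manual for XDR-format .RData files: an "RDX2" or "RDX3" magic line, a format line "X", then three big-endian int32 values (serialization version, writer R version, minimum reader R version), with version 3 also storing the writer's native encoding. R packs versions as major*65536 + minor*256 + patch. The function names read_rdata_header and decode_r_version are hypothetical.

```python
import gzip
import struct


def decode_r_version(code: int) -> str:
    """Decode R's packed version integer (major*65536 + minor*256 + patch)."""
    return f"{code >> 16}.{(code >> 8) & 0xFF}.{code & 0xFF}"


def read_rdata_header(raw: bytes) -> dict:
    """Parse the header of an XDR-format .RData file (assumed layout, see lead-in).

    gzip input is decompressed first; bzip2/xz, ASCII and native-binary
    files are rejected rather than parsed.
    """
    data = gzip.decompress(raw) if raw[:2] == b"\x1f\x8b" else raw
    magic = data[:5]
    if magic not in (b"RDX2\n", b"RDX3\n"):
        raise ValueError(f"unexpected magic {magic!r}")
    if data[5:7] != b"X\n":
        raise ValueError("only XDR (big-endian binary) streams are handled")
    fmt_version, writer, min_reader = struct.unpack(">iii", data[7:19])
    header = {
        "format_version": fmt_version,
        "written_by": decode_r_version(writer),
        "min_reader": decode_r_version(min_reader),
    }
    if fmt_version == 3:
        # Version 3 streams also record the writer's native encoding.
        (n,) = struct.unpack(">i", data[19:23])
        header["native_encoding"] = data[23:23 + n].decode("ascii")
    return header
```

Running read_rdata_header on both weights_chr2.RData files and comparing written_by would test the hypothesis directly, provided the two topmed-master images ship different R versions, which is not confirmed here.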

Relevant part of log, from Terra

/cromwell_root/fc-e860f7d8-0013-41a0-b74a-5fd0c86a128b/9bd677d1-c79d-4d7a-970d-f23f7355dbde/aggie_checker/9fcec69c-7758-4a64-8eb4-54a2bd410c9b/call-weights_run/assoc_agg/21f9df81-efa0-4a71-8be2-57c3cc0388da/call-assoc_combine_r/shard-1/cacheCopy/glob-102d2e89c0518e38f77aa349b30c2214/weights_chr2.RData does not match expected truth file /cromwell_root/topmed_workflow_testing/UWGAC_WDL/checker/assoc/aggregate/weights_chr2.RData
Calling Rscript to check for functional equivalence...
R version 4.1.0 (2021-05-18)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Debian GNU/Linux 11 (bullseye)

Matrix products: default
BLAS: /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3
LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.13.so

locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
[3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=en_US.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats graphics grDevices utils datasets methods base

loaded via a namespace (and not attached):
[1] compiler_4.1.0
[1] "/cromwell_root/fc-e860f7d8-0013-41a0-b74a-5fd0c86a128b/9bd677d1-c79d-4d7a-970d-f23f7355dbde/aggie_checker/9fcec69c-7758-4a64-8eb4-54a2bd410c9b/call-weights_run/assoc_agg/21f9df81-efa0-4a71-8be2-57c3cc0388da/call-assoc_combine_r/shard-1/cacheCopy/glob-102d2e89c0518e38f77aa349b30c2214/weights_chr2.RData"
[2] "/cromwell_root/topmed_workflow_testing/UWGAC_WDL/checker/assoc/aggregate/weights_chr2.RData"
[3] "1.0E-8"
Error: isFALSE(identical(test, truth)) is not TRUE
Execution halted
Test file varies beyond accepted tolerance of 1.0E-8.
FAIL
FAIL
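The log describes a two-stage check: an exact byte comparison that fails ("does not match expected truth file"), followed by an Rscript call that tests functional equivalence with a tolerance of 1.0E-8. A rough Python model of that flow; the checker's actual R script is not shown here, so the element-wise relative-tolerance rule and the name check_outputs are assumptions for illustration only.

```python
import hashlib
import math
from typing import Sequence


def check_outputs(
    test_bytes: bytes,
    truth_bytes: bytes,
    test_values: Sequence[float],
    truth_values: Sequence[float],
    tolerance: float = 1e-8,
) -> str:
    """Hypothetical two-stage checker: exact md5 match first, then numeric tolerance."""
    # Stage 1: byte-level comparison, as md5sum would do.
    if hashlib.md5(test_bytes).hexdigest() == hashlib.md5(truth_bytes).hexdigest():
        return "PASS: byte-identical"
    # Stage 2: functional comparison of the loaded values (assumed rule).
    if len(test_values) != len(truth_values):
        return "FAIL: length mismatch"
    within = all(
        math.isclose(t, r, rel_tol=tolerance, abs_tol=0.0)
        for t, r in zip(test_values, truth_values)
    )
    return "PASS: within tolerance" if within else "FAIL: beyond tolerance"
```

Under this model, files whose bytes differ but whose values are identical pass at stage 2. The logged assertion isFALSE(identical(test, truth)) does the opposite: it fails precisely when identical() returns TRUE, which is why the run reported FAIL for objects R considered identical.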