Collection of WDL workflows based off the University of Washington TOPMed DCC Best Practices for GWAS. The WDL structure was based upon CWLs written by the Seven Bridges development team.
[assoc-agg] weights_chr2.RData is both equivalent and not equivalent to its checker file #70
This is related to #22 on the checker template repo and #69 on this repo. When the assoc-agg checker workflow imports v1.0.0 of the checker, the weights check errors out. The files have different md5 sums, but R's identical() reports them as identical (see the log below).
I'm not concerned that the check ultimately errored out; that has already been addressed in #69. I want to know why R considers these files equivalent when their md5 sums say they are not.
md5
The test file has an md5 of a454372a474881dd9d8b43953a4ebc10.
The truth file has an md5 of 7c16c41dfb801affc65c762aaadea32a.
Hypothesis?
I have a hunch that the truth file was generated with topmed-master version 2.10 instead of version 2.12. That would explain why I found this bug at the last minute: I updated the Docker images very late in development because I hadn't realized there was a newer version.
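For illustration, here is one general way two files can have different md5 sums yet load as identical objects: the container metadata differs while the payload is the same. This is a minimal Python sketch, not the checker's R code, and I have not verified that this is what happened with weights_chr2.RData. The `write_gz` helper, the `weights` dictionary, and the timestamps are all hypothetical.

```python
import gzip
import hashlib
import io
import pickle


def write_gz(payload: bytes, mtime: int) -> bytes:
    """Gzip-compress payload in memory, stamping the given mtime into the header."""
    buf = io.BytesIO()
    with gzip.GzipFile(fileobj=buf, mode="wb", mtime=mtime) as fh:
        fh.write(payload)
    return buf.getvalue()


# Hypothetical stand-in for the object stored in the weights file.
weights = {"chr": 2, "weights": [0.25, 0.5, 0.25]}
payload = pickle.dumps(weights)

# Same payload, different header timestamp (as if written at different times).
test_bytes = write_gz(payload, mtime=1)
truth_bytes = write_gz(payload, mtime=2)

# Byte-level comparison: the checksums differ because the headers differ.
md5_differs = (
    hashlib.md5(test_bytes).hexdigest() != hashlib.md5(truth_bytes).hexdigest()
)

# Object-level comparison: the deserialized contents are equal.
objects_equal = pickle.loads(gzip.decompress(test_bytes)) == pickle.loads(
    gzip.decompress(truth_bytes)
)

print(md5_differs, objects_equal)
```

If something similar applies here, the md5 mismatch would reflect how the two files were written rather than a difference in the weights themselves, which would be consistent with R reporting the loaded objects as identical.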
Relevant part of log, from Terra
/cromwell_root/fc-e860f7d8-0013-41a0-b74a-5fd0c86a128b/9bd677d1-c79d-4d7a-970d-f23f7355dbde/aggie_checker/9fcec69c-7758-4a64-8eb4-54a2bd410c9b/call-weights_run/assoc_agg/21f9df81-efa0-4a71-8be2-57c3cc0388da/call-assoc_combine_r/shard-1/cacheCopy/glob-102d2e89c0518e38f77aa349b30c2214/weights_chr2.RData does not match expected truth file /cromwell_root/topmed_workflow_testing/UWGAC_WDL/checker/assoc/aggregate/weights_chr2.RData
Calling Rscript to check for functional equivalence...
R version 4.1.0 (2021-05-18)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Debian GNU/Linux 11 (bullseye)
attached base packages:
[1] stats graphics grDevices utils datasets methods base
loaded via a namespace (and not attached):
[1] compiler_4.1.0
[1] "/cromwell_root/fc-e860f7d8-0013-41a0-b74a-5fd0c86a128b/9bd677d1-c79d-4d7a-970d-f23f7355dbde/aggie_checker/9fcec69c-7758-4a64-8eb4-54a2bd410c9b/call-weights_run/assoc_agg/21f9df81-efa0-4a71-8be2-57c3cc0388da/call-assoc_combine_r/shard-1/cacheCopy/glob-102d2e89c0518e38f77aa349b30c2214/weights_chr2.RData"
[2] "/cromwell_root/topmed_workflow_testing/UWGAC_WDL/checker/assoc/aggregate/weights_chr2.RData"
[3] "1.0E-8"
Error: isFALSE(identical(test, truth)) is not TRUE
Execution halted
Test file varies beyond accepted tolerance of 1.0E-8. FAIL
FAIL