CHOP-CGTInformatics / REDCapTidieR

Makes it easy to read REDCap Projects into R
https://chop-cgtinformatics.github.io/REDCapTidieR/

Guess max coltypes #144

Closed rsh52 closed 1 year ago

rsh52 commented 1 year ago

Description

This is a small PR addressing a potential issue where `read_redcap` may guess the wrong column type when many rows are empty or sparsely populated. This relates to default behavior documented in readr:

If you don’t explicitly specify column types with the `col_types` argument, readr will attempt to guess them using some simple heuristics. By default, it will inspect 1000 values, evenly spaced from the first to the last row.
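To make the failure mode concrete, here is a minimal Python sketch. It is not readr's implementation (readr is an R package); it only re-creates the documented "evenly spaced from the first to the last row" sampling with hypothetical helpers `sample_indices`, `guess_type` and `guess_column`, and a deliberately simplified guesser (logical, then double, then character). Like readr, it treats a sample with no non-missing values as logical, so a sparse column whose populated cells fall between the sampled rows is misclassified.

```python
import math
from typing import List

MISSING = {"", "NA"}


def sample_indices(n_rows: int, guess_max: float) -> List[int]:
    """Row indices spaced evenly from the first row to the last row.

    If guess_max >= n_rows (e.g. math.inf), every row is inspected.
    """
    k = n_rows if guess_max >= n_rows else int(guess_max)
    if k <= 0:
        return []
    if k == 1:
        return [0]
    return sorted({round(i * (n_rows - 1) / (k - 1)) for i in range(k)})


def guess_type(values: List[str]) -> str:
    """Simplified guesser: logical -> double -> character, ignoring missing values."""
    present = [v for v in values if v not in MISSING]
    # all() over an empty list is True, so an all-missing sample is guessed "logical".
    if all(v in {"TRUE", "FALSE", "T", "F"} for v in present):
        return "logical"
    try:
        for v in present:
            float(v)
        return "double"
    except ValueError:
        return "character"


def guess_column(column: List[str], guess_max: float = 1000) -> str:
    """Guess a column's type from the sampled rows only."""
    return guess_type([column[i] for i in sample_indices(len(column), guess_max)])


# A sparse column: 100,000 rows, only two populated cells that fall between sampled rows.
column = [""] * 100_000
column[50] = "3.5"
column[150] = "7.25"

print(guess_column(column, 1000))      # logical: every sampled cell is empty
print(guess_column(column, math.inf))  # double: every row is inspected
```

In readr the corresponding knob is the `guess_max` argument, which is what this PR benchmarks at `Inf` versus the default of 1000.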

Proposed Changes

List changes below in bullet format:

Issue Addressed

Closes #141

microbenchmark Test Results

Running microbenchmark against every set of credentials in our `utility/` folder didn't yield very consistent results:

Click me for comparison results - n of 1

| Guess Max Inf: max | neval | Guess Max 1000: max | neval | Difference | Database Type |
| -- | -- | -- | -- | -- | -- |
| 1.2 | 1 | 1.71 | 1 | -0.51 | simple static (read-only) test project |
| 2.24 | 1 | 2.11 | 1 | 0.13 | longitudinal (read-only) ARM test project |
| 780.97 | 1 | 782.25 | 1 | -1.28 | simple write data |
| 5.22 | 1 | 5.98 | 1 | -0.76 | Russian Characters |
| 8.56 | 1 | 8.41 | 1 | 0.15 | super-wide --3,000 columns |
| 864.01 | 1 | 1.21 | 1 | 862.8 | static (not longitudinal) survey test project |
| 780.11 | 1 | 839.75 | 1 | -59.64 | Clinical Trial (Fake) --Read-only |
| 786.01 | 1 | 1.04 | 1 | 784.97 | nonnumeric record_id |
| 709.49 | 1 | 832.68 | 1 | -123.19 | DAG Read |
| 742.39 | 1 | 1 | 1 | 741.39 | potentially problematic values |
| 768.1 | 1 | 1.06 | 1 | 767.04 | Repeating Instruments |
| 879.26 | 1 | 1.23 | 1 | 878.03 | simple write metadata |
| 815.74 | 1 | 1.11 | 1 | 814.63 | DAG Write -admin |
| 907.86 | 1 | 734.05 | 1 | 173.81 | DAG Write -group A |
| 368.46 | 1 | 374.19 | 1 | -5.73 | super-wide --35,000 columns |
| 762.27 | 1 | 819.12 | 1 | -56.85 | Repeating Instruments --Sparse |
| 692.52 | 1 | 1.06 | 1 | 691.46 | Delete Single Arm |
| 1.13 | 1 | 1.14 | 1 | -0.01 | Delete Multiple Arm |
| 1.1 | 1 | 1.42 | 1 | -0.32 | longitudinal single arm |
| 893.67 | 1 | 1.04 | 1 | 892.63 | decimal comma and dot |
| 715.75 | 1 | 949.61 | 1 | -233.86 | decimal comma |
| 760.67 | 1 | 943.93 | 1 | -183.26 | decimal dot |
| 784.9 | 1 | 1.1 | 1 | 783.8 | Validation Types |
| 783.78 | 1 | 1.14 | 1 | 782.64 | Blank for Gray Status |
| 707.08 | 1 | 1.03 | 1 | 706.05 | Checkboxes 1 |
| 773.73 | 1 | 1.05 | 1 | 772.68 | Vignette: Longitudinal & Repeating Measures |
| 2.61 | 1 | 2.7 | 1 | -0.09 | classic |
| 3.43 | 1 | 3 | 1 | 0.43 | classic no repeat |
| 3.51 | 1 | 4.27 | 1 | -0.76 | longitudinal |
| 3 | 1 | 3.98 | 1 | -0.98 | longitudinal no arms |
| 4.26 | 1 | 4.58 | 1 | -0.32 | longitudinal no repeat |
| 5.27 | 1 | 7.98 | 1 | -2.71 | deep dive vignette |
| 1.99 | 1 | 3.07 | 1 | -1.08 | repeat first instrument |
| 3.36 | 1 | 4.69 | 1 | -1.33 | repeat event |
| 2.18 | 1 | 3.81 | 1 | -1.63 | restricted access |
| 8.11 | 1 | 8.97 | 1 | -0.86 | prodigy db |
| 10.64 | 1 | 13.03 | 1 | -2.39 | cart comprehensive db |
| 33.84 | 1 | 30.94 | 1 | 2.9 | bmt outcomes db |
| 2.75 | 1 | 8.25 | 1 | -5.5 | large sparse db |

Click me for comparison results - n of 5

| Guess Max Inf: median | neval | Guess Max 500: median | neval | Difference | Database Type |
| -- | -- | -- | -- | -- | -- |
| 855.24 | 5 | 748.23 | 5 | 107.01 | simple static (read-only) test project |
| 1.64 | 5 | 1.59 | 5 | 0.05 | longitudinal (read-only) ARM test project |
| 746.76 | 5 | 775.16 | 5 | -28.4 | simple write data |
| 3.24 | 5 | 3.28 | 5 | -0.04 | Russian Characters |
| 6.8 | 5 | 7.91 | 5 | -1.11 | super-wide --3,000 columns |
| 818.73 | 5 | 793.56 | 5 | 25.17 | static (not longitudinal) survey test project |
| 722.27 | 5 | 732.26 | 5 | -9.99 | Clinical Trial (Fake) --Read-only |
| 678.38 | 5 | 666.4 | 5 | 11.98 | nonnumeric record_id |
| 685.29 | 5 | 702.38 | 5 | -17.09 | DAG Read |
| 661.8 | 5 | 659.16 | 5 | 2.64 | potentially problematic values |
| 726.56 | 5 | 721.93 | 5 | 4.63 | Repeating Instruments |
| 743.85 | 5 | 743.27 | 5 | 0.58 | simple write metadata |
| 656.47 | 5 | 691.02 | 5 | -34.55 | DAG Write -admin |
| 687.78 | 5 | 715.43 | 5 | -27.65 | DAG Write -group A |
| 392.27 | 5 | 392.53 | 5 | -0.26 | super-wide --35,000 columns |
| 773.57 | 5 | 863.59 | 5 | -90.02 | Repeating Instruments --Sparse |
| 732 | 5 | 677.43 | 5 | 54.57 | Delete Single Arm |
| 1.07 | 5 | 1.08 | 5 | -0.01 | Delete Multiple Arm |
| 1.07 | 5 | 1.09 | 5 | -0.02 | longitudinal single arm |
| 704.46 | 5 | 690.17 | 5 | 14.29 | decimal comma and dot |
| 685.99 | 5 | 806.9 | 5 | -120.91 | decimal comma |
| 898.11 | 5 | 706.21 | 5 | 191.9 | decimal dot |
| 825.59 | 5 | 784.48 | 5 | 41.11 | Validation Types |
| 766.95 | 5 | 1137.18 | 5 | -370.23 | Blank for Gray Status |
| 736.23 | 5 | 701.25 | 5 | 34.98 | Checkboxes 1 |
| 748.92 | 5 | 794.29 | 5 | -45.37 | Vignette: Longitudinal & Repeating Measures |
| 2.86 | 5 | 2.55 | 5 | 0.31 | classic |
| 2.43 | 5 | 2.27 | 5 | 0.16 | classic no repeat |
| 3.39 | 5 | 3.6 | 5 | -0.21 | longitudinal |
| 3.67 | 5 | 3.32 | 5 | 0.35 | longitudinal no arms |
| 3.3 | 5 | 3.38 | 5 | -0.08 | longitudinal no repeat |
| 4.65 | 5 | 4.82 | 5 | -0.17 | deep dive vignette |
| 2.08 | 5 | 1.96 | 5 | 0.12 | repeat first instrument |
| 3.37 | 5 | 3.34 | 5 | 0.03 | repeat event |
| 2.08 | 5 | 2.54 | 5 | -0.46 | restricted access |
| 2 | 5 | 2.21 | 5 | -0.21 | prodigy db |
| 9.61 | 5 | 10.49 | 5 | -0.88 | cart comprehensive db |
| 11.7 | 5 | 11.36 | 5 | 0.34 | bmt outcomes db |
| 35.76 | 5 | 29.99 | 5 | 5.77 | large sparse db |

Click me for comparison results - n of 10, using `.Machine$integer.max`

| Machine Int Max: min | lq | mean | median | uq | max | neval | Difference | Database Type |
| -- | -- | -- | -- | -- | -- | -- | -- | -- |
| 1.05 | 1.1 | 1.22 | 1.14 | 1.24 | 1.82 | 10 | -1291.78 | simple static (read-only) test project |
| 2.21 | 2.24 | 2.42 | 2.28 | 2.34 | 3.64 | 10 | 2.01 | longitudinal (read-only) ARM test project |
| 1.06 | 1.09 | 1.13 | 1.1 | 1.11 | 1.35 | 10 | -1083.12 | simple write data |
| 2.91 | 3.08 | 3.52 | 3.48 | 3.72 | 4.82 | 10 | 1.1 | Russian Characters |
| 5.75 | 5.94 | 6.32 | 6.07 | 6.5 | 7.43 | 10 | -3.66 | super-wide --3,000 columns |
| 1.14 | 1.15 | 1.27 | 1.16 | 1.19 | 2.16 | 10 | -924.41 | static (not longitudinal) survey test project |
| 1.09 | 1.1 | 1.13 | 1.13 | 1.15 | 1.2 | 10 | -805.76 | Clinical Trial (Fake) --Read-only |
| 995.24 | 1006.76 | 1018.3 | 1014.4 | 1036.24 | 1042.6 | 10 | 370.35 | nonnumeric record_id |
| 1.04 | 1.05 | 1.05 | 1.05 | 1.05 | 1.06 | 10 | -738.03 | DAG Read |
| 993.61 | 998.68 | 1016.87 | 1006.58 | 1050.75 | 1056.32 | 10 | 348.67 | potentially problematic values |
| 1.06 | 1.09 | 1.09 | 1.09 | 1.11 | 1.12 | 10 | -732.49 | Repeating Instruments |
| 1.06 | 1.09 | 1.14 | 1.1 | 1.12 | 1.36 | 10 | -999.71 | simple write metadata |
| 1.02 | 1.05 | 1.08 | 1.06 | 1.08 | 1.25 | 10 | -700.31 | DAG Write -admin |
| 1 | 1.03 | 1.09 | 1.05 | 1.06 | 1.41 | 10 | -982.56 | DAG Write -group A |
| 311.63 | 316.24 | 325.79 | 325.25 | 334.75 | 345.6 | 10 | -57.49 | super-wide --35,000 columns |
| 1.02 | 1.04 | 1.06 | 1.05 | 1.08 | 1.13 | 10 | -1088.92 | Repeating Instruments --Sparse |
| 956.78 | 995.96 | 1023.7 | 1008.13 | 1049.78 | 1127.14 | 10 | 136.95 | Delete Single Arm |
| 1.79 | 1.8 | 1.87 | 1.81 | 1.84 | 2.22 | 10 | 1.1 | Delete Multiple Arm |
| 1.74 | 1.75 | 1.84 | 1.78 | 1.82 | 2.39 | 10 | 1.01 | longitudinal single arm |
| 1.01 | 1.04 | 1.04 | 1.05 | 1.05 | 1.06 | 10 | -712.8 | decimal comma and dot |
| 983.67 | 1020.67 | 1056.7 | 1042.14 | 1117.88 | 1148.12 | 10 | 134.45 | decimal comma |
| 968.38 | 995.09 | 1019.78 | 1000.14 | 1044.42 | 1148.79 | 10 | 399.11 | decimal dot |
| 1.05 | 1.06 | 1.09 | 1.08 | 1.1 | 1.14 | 10 | -991.05 | Validation Types |
| 1.11 | 1.11 | 1.14 | 1.12 | 1.15 | 1.29 | 10 | -1959.75 | Blank for Gray Status |
| 1 | 1.04 | 1.04 | 1.05 | 1.06 | 1.09 | 10 | -1114.52 | Checkboxes 1 |
| 1.04 | 1.07 | 1.1 | 1.07 | 1.17 | 1.21 | 10 | -1047.2 | Vignette: Longitudinal & Repeating Measures |
| 5.41 | 6.65 | 7.52 | 7.29 | 7.95 | 10.67 | 10 | 7.15 | classic |
| 3.79 | 4.57 | 5.32 | 5.26 | 5.91 | 7.28 | 10 | 4.75 | classic no repeat |
| 3.52 | 3.62 | 4.59 | 3.98 | 5.19 | 9.03 | 10 | 5.16 | longitudinal |
| 3.5 | 3.75 | 3.92 | 3.87 | 4.03 | 4.47 | 10 | 0.97 | longitudinal no arms |
| 3.3 | 3.76 | 3.83 | 3.85 | 3.89 | 4.2 | 10 | 0.4 | longitudinal no repeat |
| 4.71 | 4.94 | 5.15 | 5.14 | 5.33 | 5.63 | 10 | 0.49 | deep dive vignette |
| 2.04 | 2.06 | 2.2 | 2.1 | 2.35 | 2.49 | 10 | 0.28 | repeat first instrument |
| 3.59 | 3.76 | 4.04 | 3.94 | 4.19 | 4.83 | 10 | 1.42 | repeat event |
| 2.23 | 2.3 | 2.47 | 2.41 | 2.57 | 2.94 | 10 | 0.13 | restricted access |
| 1.98 | 2.09 | 2.26 | 2.19 | 2.35 | 2.96 | 10 | 0.66 | prodigy db |
| 8.18 | 8.5 | 8.75 | 8.77 | 8.97 | 9.15 | 10 | -5.67 | cart comprehensive db |
| 10.65 | 10.72 | 11.31 | 10.93 | 11.61 | 13.05 | 10 | 0.76 | bmt outcomes db |
| 28.8 | 29.15 | 30.29 | 30.37 | 30.65 | 33.99 | 10 | 3.15 | large sparse db |

Some databases ran much slower while others ran much faster, but these large swings all came from the OUHSC databases. The only pattern I can see is that, in the n of 1 run, this change sped up most of our own test databases (the last 13 rows) by a small amount; the exceptions are classic no repeat and the BMT Outcomes database.
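One possible contributor to the large swings, offered as an unverified assumption rather than a finding: microbenchmark's printed summary reports each benchmark in a single, automatically chosen time unit unless `unit` is fixed. If the per-database summaries were produced separately and pasted together, values near 700 to 1000 might be milliseconds while values near 1 might be seconds, and subtracting them directly would mix units. The Python sketch below uses hypothetical helpers `to_ms` and `difference_ms` to show how normalizing first can flip the sign of a difference; the unit assigned to each number is purely illustrative.

```python
# Multipliers to milliseconds for common time-unit labels (labels chosen for this sketch).
TO_MS = {"ns": 1e-6, "us": 1e-3, "ms": 1.0, "s": 1000.0}


def to_ms(value: float, unit: str) -> float:
    """Convert a timing expressed in `unit` to milliseconds."""
    return value * TO_MS[unit]


def difference_ms(inf_value: float, inf_unit: str,
                  base_value: float, base_unit: str) -> float:
    """Guess Max Inf timing minus baseline timing, both in milliseconds."""
    return to_ms(inf_value, inf_unit) - to_ms(base_value, base_unit)


# Row "static (not longitudinal) survey test project" from the n of 1 table.
naive = 864.01 - 1.21                                # what the Difference column reports
normalized = difference_ms(864.01, "ms", 1.21, "s")  # HYPOTHETICAL unit assignment
print(round(naive, 2), round(normalized, 2))         # 862.8 -345.99
```

Fixing the `unit` argument when printing or summarizing every benchmark would make the Difference column comparable across rows without any post-processing.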

PR Checklist

Before submitting this PR, please check and verify below that the submission meets the below criteria:

Code Review

This section to be used by the reviewer and developers during Code Review after PR submission

Code Review Checklist

skadauke commented 1 year ago

I love the benchmark table! I'd recommend you run tests at least 10 times (neval=10) to get more consistent results. S


From: Rich Hanna @.> Sent: Friday, March 31, 2023 12:11 PM To: CHOP-CGTInformatics/REDCapTidieR @.> Cc: Subscribed @.***> Subject: [External][CHOP-CGTInformatics/REDCapTidieR] Guess max coltypes (PR #144)


rsh52 commented 1 year ago

I love the benchmark table! I'd recommend you run tests at least 10 times (neval=10) to get more consistent results. S

Sure, I kicked off a few R background jobs for this. FYI, running across everything once takes ~10-15 minutes, so running all of these 10 times will likely take a few hours. Will post updates when it finishes.

skadauke commented 1 year ago

Can you multithread?


From: Rich Hanna @.> Sent: Friday, March 31, 2023 2:01 PM To: CHOP-CGTInformatics/REDCapTidieR @.> Cc: Stephan Kadauke @.>; Comment @.> Subject: [External]Re: [CHOP-CGTInformatics/REDCapTidieR] Guess max coltypes (PR #144)


rsh52 commented 1 year ago

I did an n of 5 to cut down on execution time. I can look into multithreading if we want to run these again, but judging by these results against our databases, I think things look pretty good. I admittedly made a typo and ran a guess max of 500 instead of 1000, but I don't think it matters much here.
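If the benchmarks are rerun, one way to cut wall-clock time is to benchmark different databases concurrently while keeping the repeated runs for any one database sequential. The Python sketch below is illustrative only (the thread's benchmarks were run in R); `time_call`, `benchmark_all` and the `jobs` mapping are hypothetical names, and threads are used on the assumption that each read mostly waits on the REDCap API rather than on local CPU. A caveat worth keeping in mind: concurrent requests to the same server can slow each other down, so the Inf and 1000 variants should be compared under the same level of concurrency.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from statistics import median
from typing import Callable, Dict, List


def time_call(fn: Callable[[], object], times: int) -> List[float]:
    """Call fn `times` times sequentially; return elapsed seconds for each call."""
    elapsed = []
    for _ in range(times):
        start = time.perf_counter()
        fn()
        elapsed.append(time.perf_counter() - start)
    return elapsed


def benchmark_all(jobs: Dict[str, Callable[[], object]], times: int = 10,
                  workers: int = 4) -> Dict[str, float]:
    """Benchmark each database in its own worker thread; return median seconds per database.

    Runs for one database never overlap each other; only different databases
    run at the same time.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = {name: pool.submit(time_call, fn, times) for name, fn in jobs.items()}
        return {name: median(f.result()) for name, f in futures.items()}


if __name__ == "__main__":
    # Hypothetical stand-ins: each lambda simulates a read that waits on the network.
    demo_jobs = {
        "db_a": lambda: time.sleep(0.01),
        "db_b": lambda: time.sleep(0.02),
    }
    print(benchmark_all(demo_jobs, times=3, workers=2))
```

Reporting the median per database mirrors the median column already used in the n of 5 table.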

ezraporter commented 1 year ago

@rsh52 I think I found the issues with the mocks. The setup-r-dependencies step in our CI upgrades all packages to the latest CRAN version so the CI is using the CRAN version of REDCapR rather than the dev version: https://github.com/CHOP-CGTInformatics/REDCapTidieR/actions/runs/4597505666/jobs/8120296229?pr=144#step:7:2103

I think we want the CI to be using the dev version anyways so once that's resolved the mocks should work too.

rsh52 commented 1 year ago

@rsh52 I think I found the issues with the mocks. The setup-r-dependencies step in our CI upgrades all packages to the latest CRAN version so the CI is using the CRAN version of REDCapR rather than the dev version: https://github.com/CHOP-CGTInformatics/REDCapTidieR/actions/runs/4597505666/jobs/8120296229?pr=144#step:7:2103

I think we want the CI to be using the dev version anyways so once that's resolved the mocks should work too.

Ah good catch, I'll run that update soon.

rsh52 commented 1 year ago

The third set of microbenchmark results is up for review. Again, there is minimal change for our databases but some change for Will's.

rsh52 commented 1 year ago

@skadauke Awesome, agreed. I will also reach out to the OP on that issue to have them test.

skadauke commented 1 year ago

Excellent! TY


From: Rich Hanna @.> Sent: Monday, April 3, 2023 4:48 PM To: CHOP-CGTInformatics/REDCapTidieR @.> Cc: Stephan Kadauke @.>; Mention @.> Subject: [External]Re: [CHOP-CGTInformatics/REDCapTidieR] Guess max coltypes (PR #144)
