Closed yongrenjie closed 7 months ago
@helendduncan I'm so sorry this is a bit of a mammoth PR. I should definitely have put stuff in one at a time. I did try to explain the changes in the comment and I suppose we have chatted about this... But still I recognise the code review is ugly 😓
let me know if you want to go through this together at any point in time.
By far the most important code changes are R/featurise_lookup.R
, R/featurise_present.R
, and R/validation.R
, the rest are mostly tweaks just to fix function signatures or stuff like that
Hi @yongrenjie - I'm going to make a few comments here just to make sure that I understand everything - not because I think there's anything missing from the code, but when I merge in the vignettes and docs stuff I want to make sure I understand what's happening. So sorry if you get a million messages - they don't mean anything bad!! :P
@yongrenjie I really like the all_id
s #59 option and it's utility to add users who aren't present (like in the test case) what do you think about putting a trace message in there about either (some/list) IDs not present in original table or a more generic statement which states that the original IDs may not have been present in the initial data?
validate.R
will throw error messages when there is discrepancy between the types of filter value in the spec, and the specified column in the data. It will also throw an error if the 'absent_default_value' is not an integer - where the error also notes its optional nature. However there is no trace or debug error about if one isn't provided that a default would be used - should this be added in?
Having discussed this with you I agree it's a good idea to make new issues of the comments and merge this wonderful PR
opened new issues #72 and #73, I'll take care of them!
Renaming stuff
absent_data_flag
->absent_default_value
as per discussionsgrouping_columns
->grouping_column
(Closes #62)New featurisers
(Closes #49)
Adds
featurise_present()
, which determines the presence (1) or absence (0) of a row satisfying some filter. For example, this spec:https://github.com/alan-turing-institute/SPARRA/blob/55523e09640ee052271e8d67ca84f503af46d464/eider/tests/spec/test_present.json#L1-L27
determines whether each patient has ever been admitted with a diagnosis code of 101.
(Closes #60)
Adds
featurise_lookup()
, which takes a spec like this:https://github.com/alan-turing-institute/SPARRA/blob/3c10cf5ffe9bc904319bb7a59d9b94fc6dc50d91/eider/tests/spec/test_lookup.json#L1-L8
and copies the
diagnosis_1
column from theae2
table into a new feature calleddiag_101_first
.Other usability things
(Closes #59)
Allow the user to optionally specify a list of IDs which must be present, and must be the only IDs present, in the final feature table. Example usage:
https://github.com/alan-turing-institute/SPARRA/blob/3c784ed339650b5629016f0a6b7ba52ba4c58cec/eider/tests/testthat/test-transform_more_ids.R#L2-L13
Added a couple sentences to the docs here and there
Input validation
(Closes #31)
Checks pretty much every JSON field to make sure that it has the correct type.