atorus-research / xportr

Tools to build CDISC compliant data sets and check for CDISC compliance.
https://atorus-research.github.io/xportr/
Other
41 stars 9 forks source link

Update documentation to include metatools select function #43

Open bms63 opened 2 years ago

bms63 commented 2 years ago

It would be great to have a wrapper around the dplyr::select to subset the dataset down to what is in the spec and give a message on what has been selected and what has been dropped. This could just be numbers of variables selected/dropped and which ones were dropped.

right now i do:

 adds %>% 
  select(adds_spec$variable) %>% 
  xportr_order(adds_spec) %>% 
  xportr_label(adds_spec) %>% 
  xportr_format(adds_spec)

but i would like:

 adds %>% 
   xportr_select(adds_spec) %>% 
  xportr_order(adds_spec) %>% 
  xportr_label(adds_spec) %>% 
  xportr_format(adds_spec)
sophie-gem commented 1 year ago

I don't know if this is something you still want implementing? But I'd be happy to give this one a go.

bms63 commented 1 year ago

@atorus-research/xportr-development-team what do folks think?

It would probably be similar to metatools::drop_unspec_vars, but I think could be a handy to have in xportr as users might not be using metatools or metacore packages.

cpiraux commented 1 year ago

I think it would be helpful to have a function that selects variables.

I like the message from metatools::drop_unspec_vars that lists the dropped variables, and I suggest having a similar message.

I am also interested in having a message that lists the variables in metadata that are not in the dataframe to check if we forgot to derive some variables. This message might be optional, in case the input metadata is a metadata repository instead of specification, the number of variables in MDR is greater than the one in dataframe.

bms63 commented 1 year ago

It looks like @sophie-gem is going to take this on! Thanks!!

sophie-gem commented 1 year ago

Hi, apologies if I have misunderstood (or if this has already been addressed in a feature branch) but was wondering - should the description of domain be 'A character value to subset the metacore. ...' as opposed to 'A character value to subset the .df. ...'? (Seen in xportr_format and xportr_length, for example.)

cpiraux commented 1 year ago

The Roxygen header will be updated to ensure consistency across all functions. Please refer to the Style Guide for Roxygen Headers in the wiki for more details. https://github.com/atorus-research/xportr/wiki/Style-Guide-for-Roxygen-Headers

bms63 commented 1 year ago

Hi @sophie-gem I was wondering if you had an questions or need any support on this issue? We have been meeting Tuesday and Thursday 11:30 US EST time to discuss xportr. Would you like to come?

sophie-gem commented 1 year ago

Hi @bms63, yes please - that would be great. It'll be useful to hear where you are heading with this so I can make the function consistent.

bms63 commented 1 year ago

Hi, apologies if I have misunderstood (or if this has already been addressed in a feature branch) but was wondering - should the description of domain be 'A character value to subset the metacore. ...' as opposed to 'A character value to subset the .df. ...'? (Seen in xportr_format and xportr_length, for example.)

ACtually should be 'A character value to subset the metadata object as we can use either a spec file or a metacore object.

sophie-gem commented 1 year ago

What are your thoughts, in terms of listing the variables dropped?

image

or

image

or if anyone can think of something else, I'm up for suggestions!

sophie-gem commented 1 year ago

Also, are you wanting the tests for xportr_select to be included in the same PR as the function, or separated out into a different issue?

cpiraux commented 1 year ago

What are your thoughts, in terms of listing the variables dropped?

image

or

image

or if anyone can think of something else, I'm up for suggestions!

I have a preference for the first option, I find it more readible.

Would it be also possible to list the variables present in metadata but not in data. I often forget some variables and it would be very helpful to have a list of remaining variables to derive

sophie-gem commented 1 year ago

I have a preference for the first option, I find it more readible.

Would it be also possible to list the variables present in metadata but not in data. I often forget some variables and it would be very helpful to have a list of remaining variables to derive

Regarding your second point, I have implemented it so that depending on verbose, depends on whether you get an error, warning, message, or none in terms of what variables are present in metadata but missing from .df. Is this what you were wanting?

cpiraux commented 1 year ago

Great! It's good to hear that the implementation is already done. I took a look, and it seems to be ok.

elimillera commented 11 months ago

Update documentation in xportr to point at metatools::drop_unspec_vars for this functionality.

sophie-gem commented 11 months ago

Update documentation in xportr to point at metatools::drop_unspec_vars for this functionality.

Has the plan changed with this then? The function has pretty much been completed and just needed tests writing...but am happy to delete all that if the plan is to use metatools going forward.

bms63 commented 11 months ago

Hi @sophie-gem yes the planned changed - apologies. There was some discussion on writing a function for keys and sorting, but metatools also has this function available and we decided to not duplicate in xportr. The same goes for this select function.

We just want to update the vignette now to have information and an example on using this function

sophie-gem commented 11 months ago

Hi @sophie-gem yes the planned changed - apologies. There was some discussion on writing a function for keys and sorting, but metatools also has this function available and we decided to not duplicate in xportr. The same goes for this select function.

We just want to update the vignette now to have information and an example on using this function

Awesome, no problem - I'll probably drop the current branch and start a new one. Unless anyone can think of a better way of approaching this?

bms63 commented 11 months ago

@sophie-gem that seems reasonable to me! fresh branch!

sophie-gem commented 10 months ago

What do you guys reckon? In the 'Getting Started Vignette' would it be worth:

(a) applying metatools::drop_unspec_vars() at the beginning [but then you wouldn't see any of the warnings from non-spec variables in the output]. (b) applying metatools::drop_unspec_vars() at the end [so that we maintain the warnings throughout the vignette]. (c) not showing this at all and just having a passing statement which says "Variables not in the spec can be removed from the final dataset using `metatools::drop_unspec_vars()".

Let me know your thoughts.

bms63 commented 10 months ago

Could we try out option b and see how it looks??

sophie-gem commented 8 months ago

I'm sorry, I don't think I'm going to be able to fit in the creation of an article explaining the potential use of metatools with xportr before the planned release date...

I anticipate being able to finish the formatting issue I'm also on - but in case anyone else has some capacity currently, I'll unassign myself from this issue (can always pick it back up at a later stage once the number of tasks I have decreases).

bms63 commented 8 months ago

All good @sophie-gem !! @atorus-research/xportr-development-team anyone looking for documentation updates!! Showing how the packages can potentially interact I think is a big win!!