Open avallecam opened 6 days ago
In direct response to the issue title: There is no tags_df() because the naming of tags has been dropped throughout the package (pending the rename of the package). All functionality that remains is indeed labels_df(), and good to hear the feedback around how it does or does not work for you 😊 We will not be reintroducing the tags_df() as the naming does not fit, but I am happy to consider your second suggested change for integration ("get only the labelled columns"). It may make sense to only have the labelled and validated ones in there. In order to make that comparison, could you add a direct comparison between linelist and datatagr, for the same data?
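To make the "get only the labelled columns" idea concrete, here is a minimal base R sketch. It assumes labels are stored as a "label" attribute on each column (the convention {labelled} uses); keep_labelled() is a hypothetical helper for illustration, not a datatagr function, and it does no validation.

```r
# Hypothetical helper, not part of datatagr: keep only the columns that
# carry a non-empty "label" attribute.
keep_labelled <- function(x) {
  has_label <- vapply(
    x,
    function(col) {
      lab <- attr(col, "label", exact = TRUE)
      is.character(lab) && length(lab) == 1L && !is.na(lab) && nzchar(lab)
    },
    logical(1)
  )
  # List-style subsetting keeps the column attributes (and so the labels)
  x[has_label]
}

dat <- cars
attr(dat$speed, "label") <- "Miles per hour"  # label one column only

out <- keep_labelled(dat)
names(out)
#> [1] "speed"
```

The unlabelled dist column is dropped, while the label on speed is retained in the output.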
Your third proposed change ("get standardised column names"), I am not sure about. The package scope is not to wrangle variable names into a prettier format. In your example, the renaming of speed into miles_per_hour does not necessarily make the output of labels_df more usable, if we also retain the labels. It may make sense if we drop the label attribute when using labels_df, and put the label information in the variable name (snake_case formatted), but not both. Would you be okay with dropping the labels and interoperability with labelled in that scenario?
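The "label in the name, but no label attribute" scenario could look roughly like the sketch below. It is base R only and again assumes labels live in a "label" attribute; to_snake_case() and labels_to_names() are hypothetical names for illustration, not existing datatagr functions, and the conversion only handles ASCII letters and digits.

```r
# Hypothetical helper: turn a label into a snake_case name.
to_snake_case <- function(label) {
  name <- tolower(label)
  name <- gsub("[^a-z0-9]+", "_", name)  # runs of other characters become "_"
  gsub("^_+|_+$", "", name)              # trim leading/trailing underscores
}

# Hypothetical helper: rename each labelled column after its label and
# drop the label attribute, so the information lives in one place only.
labels_to_names <- function(x) {
  for (i in seq_along(x)) {
    lab <- attr(x[[i]], "label", exact = TRUE)
    if (is.character(lab) && length(lab) == 1L && !is.na(lab) && nzchar(lab)) {
      attr(x[[i]], "label") <- NULL
      names(x)[i] <- to_snake_case(lab)
    }
  }
  x
}

dat <- cars
attr(dat$speed, "label") <- "Miles per hour"

out <- labels_to_names(dat)
names(out)
#> [1] "miles_per_hour" "dist"
```

After this step the output no longer carries labels, so anything downstream that reads them through {labelled} would see none.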
Is your feature request related to a problem? Please describe.
linelist::tags_df() is not comparable to datatagr::labels_df()
linelist::tags_df() generates an output that secures downstream analysis in the outbreak analytics pipeline
datatagr::labels_df() generates an output that helps me to showcase the dataset with labels
Can we have a function in datatagr that still inherits the power of tagging columns to get a validated set of them for secure downstream analysis? Is datatagr::make_datatagr() able to create a tagged data frame? If this has been discussed elsewhere, I am happy to read it.
In the reprex below I compare package features.
Created on 2024-10-08 with reprex v2.1.1
Describe the solution you'd like
datatagr::tags_df() function to get tagged-only and validated-only columns for downstream analysis
datatagr::labels_df() to get only the labelled columns (motivating downstream analysis restricted to labelled and validated columns only)
datatagr::labels_df() to get standardised column names (to avoid using cleanepi downstream) with labels interoperable with {labelled} (possibly)
Additional context