IntegralEnvision / integral

Package for Integral functions
https://integralenvision.github.io/integral/

create duplicate identification code #85

Open atyrell3 opened 1 year ago

atyrell3 commented 1 year ago

One common water quality data check is to test whether there are possible duplicate samples in the dataset. If there are duplicate samples, they should be identified and marked as A, B, C, etc. A generic function to apply this check and update the data with the duplicate information if needed would help streamline water quality data processing. I can develop this functionality if people agree that it is a priority.
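A minimal base R sketch of such a helper follows. It assumes that "possible duplicates" means rows that share the same values in a chosen set of key columns (for example, site and sample date). The function name `label_duplicates()`, its arguments, and the column names `sample_id`, `site_id`, and `sample_date` are hypothetical and not part of the integral package.

```r
# Hypothetical helper (not part of the integral package): flag rows that share
# the same values in `key_cols` and append A, B, C, ... to their IDs.
label_duplicates <- function(data, key_cols, id_col = "sample_id") {
  # Collapse the key columns into one string per row ("|" is an assumed separator)
  key <- do.call(paste, c(data[key_cols], sep = "|"))
  # Position of each row within its key group, in original row order
  pos <- stats::ave(seq_along(key), key, FUN = seq_along)
  # Size of the key group each row belongs to
  n <- stats::ave(seq_along(key), key, FUN = length)
  if (any(pos > length(LETTERS))) {
    stop("A duplicate group has more than 26 rows; letters A-Z are not enough.")
  }
  data$is_duplicate <- n > 1
  ids <- as.character(data[[id_col]])
  # Only members of groups with more than one row get a letter suffix
  data[[id_col]] <- ifelse(data$is_duplicate, paste0(ids, LETTERS[pos]), ids)
  data
}

samples <- data.frame(
  sample_id   = c("S1", "S2", "S2", "S3", "S3", "S3"),
  site_id     = c("W1", "W2", "W2", "W3", "W3", "W3"),
  sample_date = as.Date(c("2023-05-01", "2023-05-01", "2023-05-01",
                          "2023-05-02", "2023-05-02", "2023-05-02"))
)

labeled <- label_duplicates(samples, key_cols = c("site_id", "sample_date"))
labeled$sample_id
# c("S1", "S2A", "S2B", "S3A", "S3B", "S3C")
```

Rows that are not part of a duplicate group keep their original ID, and the added `is_duplicate` column records which rows were relabeled. The sketch stops with an error rather than producing `NA` suffixes if a group has more than 26 members.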

jzadra commented 1 year ago

@mparslow-integral is this data quality check something that will be in the integrity tool we are developing as part of TWG?

jzadra commented 1 year ago

@atyrell3 What is the reason for the duplicates in this case? Are the duplicates true duplicates (data errors) or are they from repeated samples?

mparslow-integral commented 1 year ago

@jzadra @atyrell3 We would check for it in the integrity checker, but we would not blindly add the A, B, C labels unless we knew they were true duplicates. Adding the A, B, C labels could be a user-configurable option, though.
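A check-only mode, which reports possible duplicates but leaves the data unchanged so a user can decide whether they are true duplicates, could be sketched as follows. `find_duplicate_samples()` and the column names are hypothetical; the sketch again assumes duplicates are defined by matching key columns.

```r
# Hypothetical check-only helper: report possible duplicates without
# changing any sample IDs.
find_duplicate_samples <- function(data, key_cols) {
  keys <- data[key_cols]
  # duplicated() flags 2nd and later occurrences; fromLast = TRUE flags all but
  # the last one, so OR-ing them flags every member of a duplicate group
  dup <- duplicated(keys) | duplicated(keys, fromLast = TRUE)
  data[dup, , drop = FALSE]
}

samples <- data.frame(
  sample_id   = c("S1", "S2", "S3"),
  site_id     = c("W1", "W2", "W2"),
  sample_date = c("2023-05-01", "2023-05-01", "2023-05-01")
)

find_duplicate_samples(samples, key_cols = c("site_id", "sample_date"))
# returns the rows for S2 and S3, with their IDs unchanged
```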

kheal commented 1 year ago

Regardless of whether it is in the integrity checker, this would be super helpful as an R function; this is something that several of us do often.

jzadra commented 1 year ago

Sounds good. Feel free to open a PR, @atyrell3!

jzadra commented 1 year ago

There is a base R function, make.unique(), that does this (except it uses numerals instead of letters).
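For reference, `make.unique()` leaves the first occurrence of a value unchanged and appends numeric suffixes only to later occurrences, so it does not by itself produce an A, B, C label on every member of a duplicate group. The sample IDs below are made up for illustration.

```r
ids <- c("S1", "S2", "S2", "S3", "S3", "S3")

# Default separator is "."; the first occurrence keeps its original value
make.unique(ids)
# c("S1", "S2", "S2.1", "S3", "S3.1", "S3.2")

# The separator can be changed with the `sep` argument
make.unique(ids, sep = "_")
# c("S1", "S2", "S2_1", "S3", "S3_1", "S3_2")
```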