Open atyrell3 opened 1 year ago
@mparslow-integral is this data quality check something that will be in the integrity tool we are developing as part of TWG?
@atyrell3 What is the reason for the duplicates in this case? Are the duplicates true duplicates (data errors) or are they from repeated samples?
@jzadra @atyrell3 We would check for it in the integrity checker, but we would not blindly add the a, b, c labels unless we knew they were true duplicates. It could be made configurable so the user can opt in to adding the a, b, c labels, though.
Regardless of whether it is in the integrity checker, this would be super helpful as an R function. It is something that several of us do often.
Sounds good, feel free to do a PR @atyrell3 !
There is a base R function, make.unique(), that does this (except it uses numerals instead of letters).
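For reference, a quick illustration of what make.unique() does with repeated values. The sample IDs here are made up for the example; only make.unique() itself is base R.

```r
# Hypothetical sample IDs containing repeats
ids <- c("S1", "S2", "S1", "S3", "S1")

# make.unique() leaves the first occurrence alone and appends
# a separator plus a running number to each later repeat
unique_ids <- make.unique(ids)
print(unique_ids)
# "S1" "S2" "S1.1" "S3" "S1.2"
```

Note that the first occurrence is left unlabelled, so it would not be marked as "A" the way the issue describes.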
One common water quality data check is to test whether there are possible duplicate samples in the dataset. If there are duplicate samples, they should be identified and marked as A, B, C, etc. A generic function to apply this check and update the data with the duplicate information if needed would help streamline water quality data processing. I can develop this functionality if people agree that it is a priority.
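A minimal sketch of what such a function might look like in base R, not a proposed final implementation. The function name label_duplicates(), its arguments, and the example columns (station, date, value) are all hypothetical. It assumes rows that share the same key columns are duplicates; whether they are true duplicates or repeated samples still has to be decided by the user.

```r
# Hypothetical helper: label rows that share the same key columns as A, B, C, ...
# Rows whose key is unique get NA in the label column.
label_duplicates <- function(data, key_cols, label_col = "dup_label") {
  stopifnot(is.data.frame(data), all(key_cols %in% names(data)))

  # Build one grouping key per row by pasting the key columns together
  key <- do.call(paste, c(unname(as.list(data[key_cols])), sep = "\r"))

  # Position of each row within its group, and the size of that group
  idx <- ave(seq_along(key), key, FUN = seq_along)
  n <- ave(seq_along(key), key, FUN = length)

  # Only 26 letters are available, so fail loudly rather than emit NA labels
  if (any(n > length(LETTERS))) {
    stop("more than 26 rows share one key; letters are not enough")
  }

  data[[label_col]] <- ifelse(n > 1, LETTERS[idx], NA_character_)
  data
}

# Made-up example data: two rows share the same station and date
samples <- data.frame(
  station = c("A1", "A1", "B2", "A1"),
  date = c("2020-01-01", "2020-01-01", "2020-01-01", "2020-01-02"),
  value = c(1.2, 1.3, 0.8, 1.1)
)

labelled <- label_duplicates(samples, key_cols = c("station", "date"))
print(labelled$dup_label)
# "A" "B" NA NA
```

Keeping the labels in a separate column, rather than appending them to a sample ID, leaves the original data untouched and makes the step easy to switch off if the duplicates turn out to be data errors.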