This isn't a big issue for most people, since we have squirrel_ids in all the LTD on the cloud. However, in case anyone needs to match records by something other than squirrel_id, I wanted to share some code that helps make the colour records consistent and error-free.
(I am working with the very old data and matching it to our behaviour LTD in the cloud. The old data doesn't include squirrel_ids, so I am using a combination of tag colours and numbers to match individuals.)
library(dplyr)
library(stringr)

# Patterns are regular expressions and are applied in the order listed.
colour_fixes <- c(
  " " = "", # remove all spaces
  "BK" = "Bk", "PK" = "Pk", # consistent upper/lower case was important for matching records: lower-case k for black and pink, first letter of every colour upper case (like green/yellow below)
  "Gy" = "GY",
  "--" = "-", "rip" = "-", "RIP" = "-", "RI" = "-", "Ri" = "-",
  "r" = "R", "y" = "Y", "b" = "B", "w" = "W", "p" = "P",
  "0" = "O", # this also covers "0!" and "P0"
  "^(UCS|uts|Uts|UTS[Jj!?]|UTJ|UTM)$" = "UTS", # spaces are already gone, so "UTS J" is "UTSJ" by this point
  # whole-value codes only (^...$); otherwise "TS" would turn "UTS" into "UNA"
  "^(MALE|COLOU|DARK|NONE|TAG|TS|UNCLR|Unk|UNKN|UNK)$" = "NA",
  "^(H\\.4\\.|T6|J\\.2\\.|I\\.7\\.)$" = "NA", # "." is escaped so it matches a literal dot
  "^\\?[!*]?$" = "NA" # "?", "?!" and "?*"; a bare "?" is a regex symbol and must be escaped (or wrapped in fixed()). I used filter() to remove these afterwards
)

clean_colour <- function(x) {
  x <- ifelse(x %in% "", "NA", x) # blank becomes the string "NA"
  str_replace_all(x, colour_fixes)
}

# `behaviour` is a placeholder name for the behaviour table
behaviour <- behaviour %>%
  mutate(across(c(color_left, color_right), clean_colour))
I'm likely missing some anyway, but this is the code I used to fix up some of the colour variations that seem to pop up here and there in the behaviour LTD.
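On the note about `?` in the code above: here is a small sketch of why symbols trip up str_replace_all() and two ways around it. The vector `x` is made up for illustration. The point is that patterns are regular expressions by default, so `?` and `.` have special meanings unless they are escaped or wrapped in fixed().

```r
library(stringr)

x <- c("?", "?!", "B?", "H.4.", "HX4Y")

# A bare "?" is a regex quantifier with nothing to repeat, so this errors.
bare <- tryCatch(str_replace_all(x, "?", "NA"), error = function(e) "error")

# fixed() matches the character literally, but it still replaces substrings,
# so "B?" becomes "BNA".
literal <- str_replace_all(x, fixed("?"), "NA")

# An unescaped "." matches any character, so "H.4." also matches "HX4Y".
dot_any <- str_replace_all("HX4Y", "H.4.", "NA")

# Escaping the symbols and anchoring with ^...$ replaces whole values only.
whole <- str_replace_all(x, c("^\\?[!*]?$" = "NA", "^H\\.4\\.$" = "NA"))
```

Anchoring matters here because codes like `?` or `TS` are meant to stand for the whole field; without the anchors they would also rewrite part of an otherwise valid value.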
Also, I just excluded all of these - but they'd be worth taking a closer look at:
color_left:
24508: -0
182829: B?
69: Bk!?
120778: BK!(?
122696, 123285, 124523: GRBK (this might be both left and right together?)
186484: RBI #is I a colour?
8919, 8920, 8924, 8925, 8949: S* #is S a colour?
124175: H
color_right:
18169: -Y
121951, 121952: (P
117080, 117081: B?
5538: D*
89696, 93782: GL
183123: I!
8646: O-
183977: O?
323616, 341927-36, 347638, 347639, 353888-90: OI!
117367: OL
19769: P-
95222: PL
122182: R!?
118413, 124247: R?
88614: RL
S #is S a colour? - I didn't write the IDs down for this one because there are a solid 61 records with it
S #is S a colour? - same thing here, there are 436 records with S
122434: W (
122327: W?
94596: Y!?
120812: Y(?)
89078, 121665, 121666: YL
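If it helps anyone re-check lists like the two above, this is one way such values could be surfaced: count how often each colour value occurs and pull the record ids behind the rare ones. It is only a sketch. The toy table, the column name `id` for the record id, the ids 1 to 3, and the cut-off of 3 records are all assumptions for illustration; the other ids and values are taken from the color_right list.

```r
library(dplyr)

# Toy stand-in for the behaviour table (`id` is a hypothetical column name;
# ids 1 to 3 are placeholders, the rest come from the list above).
behaviour <- tibble(
  id          = c(18169, 121951, 121952, 1, 2, 3),
  color_right = c("-Y", "(P", "(P", "R", "R", "R")
)

# How often each value occurs; rare values are the ones to check by hand.
value_counts <- behaviour %>% count(color_right, sort = TRUE)

# Record ids behind every value that occurs fewer than 3 times.
rare <- behaviour %>%
  add_count(color_right, name = "n_records") %>%
  filter(n_records < 3) %>%
  group_by(color_right) %>%
  summarise(ids = paste(id, collapse = ", "))
```

A frequency cut-off will not catch everything (S, for example, has hundreds of records), so the counts are a starting point for a manual look rather than a filter.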
Other changes:
Record 134028 should be O/R (not O/R!)
squirrel_id== 5097 should be B/- (not B/)
squirrel_id== 4314 and record id==21720 should be B/R (not B/OR)
squirrel_id== 5579 has juvenile colours two years in a row?
squirrel_id== 4394 should be B/R (not B/RW)
squirrel_id== 19501 should be Y/R (not Y/D)
squirrel_id== 2921 should be left=G and right=B (not left=GB)
squirrel_id== 5579 and record id==32096 is either -/right, or the right colour should be split between left and right (G/Y). The main issue is that left shouldn't be blank
squirrel_id== 10136 in 2005 should be Y/O
squirrel_id== 10183 in 2005 should be R/O*
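A few of the one-off corrections above could be applied with case_when(), roughly as sketched here. The toy table is made up: `id` is a hypothetical name for the record id column, the record id 500 and the missing squirrel_id are placeholders, and I am assuming that "O/R" means left/right.

```r
library(dplyr)

# Toy rows built from three of the corrections listed above.
behaviour <- tibble(
  id          = c(134028, 21720, 500),   # 500 is a placeholder record id
  squirrel_id = c(NA, 4314, 2921),       # NA: squirrel_id not given in the list
  color_left  = c("O", "B", "GB"),
  color_right = c("R!", "OR", NA)
)

fixed_rows <- behaviour %>%
  mutate(
    # color_right is fixed first, while color_left still holds the old "GB".
    color_right = case_when(
      id == 134028                             ~ "R",  # O/R, not O/R!
      squirrel_id == 4314 & id == 21720        ~ "R",  # B/R, not B/OR
      squirrel_id == 2921 & color_left == "GB" ~ "B",  # GB split into G and B
      TRUE                                     ~ color_right
    ),
    color_left = case_when(
      squirrel_id == 2921 & color_left == "GB" ~ "G",
      TRUE                                     ~ color_left
    )
  )
```

Keeping each correction on its own line with the reason in a comment makes it easy to drop a fix once the underlying record has been corrected in the LTD itself.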