KluaneRedSquirrelProject / KRSP-Database-Errors

Tag colour variations #97

Open martinig opened 2 years ago

martinig commented 2 years ago

This isn't a big issue for others, given that we have squirrel_ids in all the LTD on the cloud. However, in case anyone needs to match records by something other than squirrel_id, I wanted to share some code that helps make the colour records consistent and error-free.

(I am working with the old old old data and matching it with our behaviour LTD (in the cloud), but the old data doesn't include squirrel_ids, so I am using a combination of tag colours and numbers to match individuals.)


library(dplyr)
library(stringr)

# Substring fixes, applied in order to both colour columns. The names are
# regular expressions, so "?" is escaped to match literally.
colour_fixes <- c(
  " " = "", # remove all spaces
  # consistency in upper/lower case was important for matching records - I made
  # the k for black and pink lower case, while I kept all the "first" letters of
  # colours uppercase (like green yellow below)
  "BK" = "Bk", "PK" = "Pk",
  "Gy" = "GY",
  "--" = "-", "rip" = "-", "RIP" = "-", "RI" = "-", "Ri" = "-",
  "r" = "R", "y" = "Y", "b" = "B", "w" = "W", "p" = "P",
  "0" = "O", # zero typed for the letter O; also covers "0!" and "P0"
  "UCS" = "UTS", "uts" = "UTS", "Uts" = "UTS", "UTSJ" = "UTS", "UTSj" = "UTS",
  "UTJ" = "UTS", "UTM" = "UTS", "UTS!" = "UTS", "UTS\\?" = "UTS"
)

# Values that carry no colour information. These are matched as whole values,
# so "TS" cannot turn "UTS" into "UNA" and "?" cannot turn "B?" into "BNA".
# I used filter() to remove the "?" values; with %in% they are compared
# literally, so no fixed() or escaping is needed.
not_colours <- c(
  "", "MALE", "COLOU", "DARK", "NONE", "TAG", "TS", "UNCLR", "Unk", "UNKN", "UNK",
  "H.4.", "T6", "J.2.", "I.7.", "?", "?!", "?*"
)

clean_colour <- function(x) {
  x <- str_replace_all(x, colour_fixes)
  if_else(x %in% not_colours, "NA", x) # "NA" is a string here, not a missing value
}

# `behaviour` stands in for whichever data frame is being cleaned
behaviour <- behaviour %>%
  mutate(across(c(color_left, color_right), clean_colour))

I'm likely missing some anyway, but this is the code I used to fix up some of the colour variations that seem to pop up here and there in the behaviour LTD.
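
On matching `?` with str_replace_all(): patterns are treated as regular expressions by default, and `?` is a quantifier there, so it has to be wrapped in fixed() or escaped to be matched as a plain character. A minimal sketch on made-up values (not taken from the database) that also shows a whole-value comparison, which may be closer to what is wanted when only a lone `?` should be treated as missing:

```r
library(stringr)

# Toy values for illustration only
x <- c("UTS?", "R?", "?", "Bk")

# fixed() treats the pattern as a literal string rather than a regex
literal <- str_replace_all(x, fixed("?"), "")

# Equivalent regex form with the metacharacter escaped
escaped <- str_replace_all(x, "\\?", "")

# Whole-value comparison: only a lone "?" becomes missing, "R?" is left alone
whole <- ifelse(x == "?", NA_character_, x)
```

The first two forms strip every `?` wherever it occurs, so "R?" becomes "R"; the third leaves uncertain records such as "R?" untouched so they can still be reviewed by hand.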

Also, I just excluded all of these - but they'd be worth taking a closer look at:

color_left:

  1. 24508: -0
  2. 182829: B?
  3. 69: Bk!?
  4. 120778: BK!(?
  5. 122696, 123285, 124523: GRBK (this might be both left and right together?)
  6. 186484: RBI #is I a colour?
  7. 8919, 8920, 8924, 8925, 8949: S* #is S a colour?
  8. 124175: H

color_right:

  1. 18169: -Y
  2. 121951, 121952: (P
  3. 117080, 117081: B?
  4. 5538: D*
  5. 89696, 93782: GL
  6. 183123: I!
  7. 8646: O-
  8. 183977: O?
  9. 323616, 341927-36, 347638, 347639, 353888-90: OI!
  10. 117367: OL
  11. 19769: P-
  12. 95222: PL
  13. 122182: R!?
  14. 118413, 124247: R?
  15. 88614: RL
  16. S #is S a colour? - I didn't write the IDs down for this one because there are a solid 61 records with it
  17. S #is S a colour? - same thing here, there are 436 records with S
  18. 122434: W (
  19. 122327: W?
  20. 94596: Y!?
  21. 120812: Y(?)
  22. 89078, 121665, 121666: YL
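
One way to surface records like the ones listed above is to compare each colour column against a list of accepted codes and keep whatever falls outside it. This is only a sketch: the `valid_colours` vector is my guess at the vocabulary and should be replaced with the project's actual list, and the `behaviour` tibble and its `id` column are toy stand-ins.

```r
library(dplyr)

# Hypothetical list of accepted codes; replace with the real vocabulary
valid_colours <- c("B", "Bk", "G", "GY", "O", "P", "Pk", "R", "W", "Y", "-")

# Toy stand-in for the behaviour table
behaviour <- tibble(
  id          = 1:6,
  color_left  = c("B", "Bk", "GRBK", "S*", "R", NA),
  color_right = c("R", "OI!", "Y", "W", "YL", "G")
)

# Keep rows where either colour is present but not an accepted code
flagged <- behaviour %>%
  filter(
    (!is.na(color_left) & !color_left %in% valid_colours) |
      (!is.na(color_right) & !color_right %in% valid_colours)
  )
```

Because `%in%` compares whole values literally, symbols such as `?`, `(` and `*` need no escaping here.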

Other changes:

  1. Record 134028 should be O/R (not O/R!)
  2. squirrel_id == 5097 should be B/- (not B/)
  3. squirrel_id == 4314 and record id == 21720 should be B/R (not B/OR)
  4. squirrel_id == 5579 has juvenile colours two years in a row?
  5. squirrel_id == 4394 should be B/R (not B/RW)
  6. squirrel_id == 19501 should be Y/R (not Y/D)
  7. squirrel_id == 2921 should be left=G and right=B (not left=GB)
  8. squirrel_id == 5579, record id == 32096 is either -/right or the right colour should be split between left and right (G/Y); the main issue is that left shouldn't be blank
  9. squirrel_id == 10136 in 2005 should be Y/O
  10. squirrel_id == 10183 in 2005 should be R/O*
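
Record-level fixes like these could be kept in a small corrections table and applied with dplyr's rows_update(), which overwrites only the listed rows and columns. A sketch under assumptions: `id` as the record key is my placeholder, the pairs are read as left/right, the table is a toy stand-in, and row 99999 is invented to show that unlisted rows are untouched.

```r
library(dplyr)

# Toy stand-in for the behaviour table; `id` as the key column is an assumption
behaviour <- tibble(
  id          = c(134028L, 21720L, 99999L),
  color_left  = c("O", "B", "G"),
  color_right = c("R!", "OR", "Y")
)

# One row per correction, keyed by record id
corrections <- tibble(
  id          = c(134028L, 21720L),
  color_right = c("R", "R")
)

# Overwrite color_right for the matching ids; everything else is kept as-is
behaviour_corrected <- rows_update(behaviour, corrections, by = "id")
```

Keeping the corrections in their own table makes them easy to review and re-apply whenever the data are downloaded again.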