globalbioticinteractions / usnm-ixodes

GloBI configuration to index Ixodes Records from US National Museum of Natural History

question about name cases (uppercase/ lower case) #3

Open jhpoelen opened 3 years ago

jhpoelen commented 3 years ago

It appears that names in the dataset use mixed casing.

While many names are all upper case:

$ elton names | cut -f2 | grep -E "^[A-Z]{2,}" | sort | uniq -c | sort -nr | head
listing taxa [local]... done.
   3330 IXODES SP.
    950 IXODES PACIFICUS
    933 IXODES SCAPULARIS
    810 IXODES RICINUS
    776 IXODES GRANULATUS
    731 IXODES ANGUSTUS
    664 IXODES SCULPTUS
    611 DRAG
    557 IXODES OVATUS
    490 HOMO SAPIENS

other names are not:

$ elton names | cut -f2 | grep -E "^[A-Z]{0,1}[a-z]" | sort | uniq -c | sort -nr | head
listing taxa [local]... done.
     46 Plestiodon laticeps
     44 Peromyscus gossypinus
     30 Ixodes scapularis
     20 Ixodes ALLUAUDI
     19 Praomys sp.
     17 Ixodes sp.
     10 Plestiodon inexpectatus
      8 Plestiodon fasciatus
      8 Ixodes SCAPULARIS
      7 Plestiodon LATICEPS
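
The two listings above can be combined into a single tally per casing category. The following is a minimal sketch, not part of elton: the function name `classify_case` and the three category labels are hypothetical, and it assumes one name per line and ASCII letters only.

```shell
# classify_case (hypothetical helper): reads one name per line on stdin and
# prints a casing category for each line:
#   upper      no lowercase letters at all (e.g., IXODES SP.)
#   canonical  leading capital, no other capitals (e.g., Ixodes sp.)
#   other      anything else (e.g., Ixodes ALLUAUDI, ixodes sp.)
# LC_ALL=C keeps the [A-Z] and [a-z] ranges limited to ASCII letters.
classify_case() {
  LC_ALL=C awk '{
    rest = substr($0, 2)
    if ($0 ~ /[A-Z]/ && $0 !~ /[a-z]/) print "upper"
    else if ($0 ~ /^[A-Z]/ && rest !~ /[A-Z]/) print "canonical"
    else print "other"
  }'
}
```

Assuming elton is installed and the dataset is available locally, `elton names | cut -f2 | classify_case | sort | uniq -c` would then print one count per category.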

Many taxonomic name resolvers are case sensitive and expect conventional casing (e.g., Homo sapiens instead of HOMO SAPIENS or homo sapiens).

How should this variety of name spellings in the USNM Ixodes dataset be handled?
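
One possible stopgap, sketched here only as an illustration, is to normalize casing before names reach a resolver: lowercase the whole string, then capitalize the first letter. The function name `normalize_case` is hypothetical and this is not something elton or the dataset currently does. The sketch assumes plain binomials in ASCII. It would mangle names that legitimately carry other capitals (a subgenus in parentheses, authorship), and it cannot tell taxa from non-taxon values such as DRAG, which would simply become Drag.

```shell
# normalize_case (hypothetical helper): reads one name per line on stdin,
# lowercases the whole line, then uppercases its first character.
# LC_ALL=C restricts the case conversion to ASCII letters.
normalize_case() {
  LC_ALL=C awk '{
    s = tolower($0)
    print toupper(substr(s, 1, 1)) substr(s, 2)
  }'
}
```

For example, `elton names | cut -f2 | normalize_case | sort | uniq -c | sort -nr | head` would merge IXODES SCAPULARIS, Ixodes SCAPULARIS and Ixodes scapularis into a single count. Fixing the casing at the source remains preferable to patching it downstream.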

jhpoelen commented 3 years ago

According to Jessica Bird / David Pecor, the capitalization is unusual and is likely to change back to the regular spelling.

jhpoelen commented 3 years ago

Either Nick Dowdy or Vijay is working on the dataset (?)

njdowdy commented 3 years ago

Either Nick Dowdy or Vijay is working on the dataset (?)

Yes, we have this mixed case issue mostly resolved.

jhpoelen commented 3 years ago

@njdowdy great to hear that you are working on resolving the mixed case names. Is it still correct to assume that you've extracted only the names from the wealth of information in the Ixodes dataset and will curate them elsewhere?

njdowdy commented 3 years ago

@njdowdy great to hear that you are working on resolving the mixed case names. Is it still correct to assume that you've extracted only the names from the wealth of information in the Ixodes dataset and will curate them elsewhere?

Yes, that is the case. Our plan is to feed new names / changes back to aggregators like GlobalNames and GBIF. We are hoping to have a web-based tool for maintaining changes via domain experts on each major taxon covered by TPT.