Clean up the following columns:
Strain needs to be removed
Isolate identifiers needs to be removed
Serovar: keep only the serotype information and remove everything else ("E. coli", ":", "Undetermined", double blank spaces). Replace pending or ongoing identifications with "ongoing", for example "Isolate to CDC:Isolate to CDC" becomes "ongoing".
Create date: rename to "date" and remove the time information, for example "2022-05-10T13:37:17Z" becomes "2022-05-10".
Location: replace state names with their abbreviations (for example, "California" becomes "CA") and split into 3 columns: Country, State and City.
Isolation source: rename to "Isolation" and homogenize the values. Create a new column "Source" that classifies each record as Clinical, Food, Animal or Environmental based on the Isolation column, for example "coyote feces" is classified as "Animal".
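
The clean-up steps listed here could be sketched with pandas roughly as follows. This is only a minimal sketch under several assumptions that are not stated in the request: the data is already loaded in a pandas DataFrame, Location values look like "USA: California, Davis", a pending identification can be recognized by the text "to CDC", "pending" or "ongoing", and "homogenized" means trimmed, lower-cased text. The function names and the two lookup tables (`STATE_ABBREVIATIONS`, `SOURCE_KEYWORDS`) are hypothetical and deliberately incomplete; they would need to be extended to match the real values.

```python
import re
import pandas as pd

# Hypothetical lookup tables (assumptions): extend them to cover the real data.
STATE_ABBREVIATIONS = {"California": "CA", "Texas": "TX", "New York": "NY"}
SOURCE_KEYWORDS = {
    "Clinical": ["human", "patient", "stool", "blood", "urine"],
    "Food": ["beef", "lettuce", "milk", "cheese", "spinach"],
    "Animal": ["coyote", "cattle", "bovine", "swine", "poultry"],
    "Environmental": ["water", "soil", "sediment"],
}


def clean_serovar(value):
    """Keep only the serotype text; map pending identifications to 'ongoing'."""
    if pd.isna(value):
        return value
    text = str(value)
    # Assumption: these words mark a pending/ongoing identification.
    if re.search(r"to CDC|pending|ongoing", text, flags=re.IGNORECASE):
        return "ongoing"
    # Remove the unwanted tokens, then collapse repeated blank spaces.
    text = re.sub(r"E\. coli|Undetermined|:", " ", text)
    return re.sub(r"\s+", " ", text).strip()


def split_location(value):
    """Split 'Country: State, City' (assumed format) into three fields."""
    if pd.isna(value):
        return {"Country": None, "State": None, "City": None}
    country, _, rest = str(value).partition(":")
    state, _, city = rest.partition(",")
    state = state.strip()
    return {
        "Country": country.strip() or None,
        # Abbreviate known state names; leave unknown values unchanged.
        "State": STATE_ABBREVIATIONS.get(state, state) or None,
        "City": city.strip() or None,
    }


def classify_source(isolation):
    """Return the first category whose keyword occurs in the isolation text."""
    if pd.isna(isolation):
        return None
    text = str(isolation).lower()
    for label, keywords in SOURCE_KEYWORDS.items():
        if any(keyword in text for keyword in keywords):
            return label
    return None  # left unclassified for manual review


def clean_metadata(df):
    """Apply all clean-up steps and return a new DataFrame."""
    df = df.drop(columns=["Strain", "Isolate identifiers"], errors="ignore")
    df = df.rename(columns={"Create date": "date", "Isolation source": "Isolation"})
    df["Serovar"] = df["Serovar"].map(clean_serovar)
    # Assumption: ISO 8601 timestamps, so the date is the part before "T".
    df["date"] = df["date"].str.split("T").str[0]
    location = pd.DataFrame(
        [split_location(v) for v in df["Location"]], index=df.index
    )
    df = pd.concat([df.drop(columns="Location"), location], axis=1)
    # Minimal homogenization: trim, collapse spaces, lower-case.
    df["Isolation"] = (
        df["Isolation"].str.strip().str.replace(r"\s+", " ", regex=True).str.lower()
    )
    df["Source"] = df["Isolation"].map(classify_source)
    return df
```

Calling `clean_metadata(raw)` on the raw table returns a cleaned copy and leaves `raw` itself unchanged. Rows whose Isolation text matches none of the keywords get an empty Source, which makes it easy to spot values that still need a rule.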