The first-draft conversion of the archive is ready to roll. The strategy is to convert each keyword value to lower case (removing any italic tags) and then look for a match in the following list. If there is a match, the matched XML replaces the existing tag; if there is no match, the original XML is kept.
Python match dict below:
match_list['arabidopsis'] = '<kwd><italic>Arabidopsis</italic></kwd>'
match_list['b. subtilis'] = '<kwd><italic>B. subtilis</italic></kwd>'
match_list['c. elegans'] = '<kwd><italic>C. elegans</italic></kwd>'
match_list['c. intestinalis'] = '<kwd><italic>C. intestinalis</italic></kwd>'
match_list['chicken'] = '<kwd>Chicken</kwd>'
match_list['ciona intestinalis'] = '<kwd><italic>C. intestinalis</italic></kwd>'
match_list['d. melanogaster'] = '<kwd><italic>D. melanogaster</italic></kwd>'
match_list['dictyostelium'] = '<kwd><italic>Dictyostelium</italic></kwd>'
match_list['drosophila melanogaster'] = '<kwd><italic>D. melanogaster</italic></kwd>'
match_list['e. coli'] = '<kwd><italic>E. coli</italic></kwd>'
match_list['frog'] = '<kwd>Frog</kwd>'
match_list['fruit fly'] = '<kwd><italic>D. melanogaster</italic></kwd>'
match_list['human'] = '<kwd>Human</kwd>'
match_list['macaca mulatta'] = '<kwd><italic>M. mulatta</italic></kwd>'
match_list['maize'] = '<kwd>Maize</kwd>'
match_list['mouse'] = '<kwd>Mouse</kwd>'
match_list['myceliophthora thermophila'] = '<kwd><italic>M. thermophila</italic></kwd>'
match_list['n. crassa'] = '<kwd><italic>N. crassa</italic></kwd>'
match_list['neurospora'] = '<kwd><italic>Neurospora</italic></kwd>'
match_list['none'] = '<kwd>None</kwd>'
match_list['oncopeltus fasciatus'] = '<kwd><italic>O. fasciatus</italic></kwd>'
match_list['other'] = '<kwd>Other</kwd>'
match_list['plasmodium falciparum'] = '<kwd><italic>P. falciparum</italic></kwd>'
match_list['platynereis dumerilii'] = '<kwd><italic>P. dumerilii</italic></kwd>'
match_list['rat'] = '<kwd>Rat</kwd>'
match_list['s. cerevisiae'] = '<kwd><italic>S. cerevisiae</italic></kwd>'
match_list['s. pombe'] = '<kwd><italic>S. pombe</italic></kwd>'
match_list['salmonella enterica serovar typhi'] = '<kwd><italic>S. enterica serovar</italic> Typhi</kwd>'
match_list['streptococcus pyogenes'] = '<kwd><italic>S. pyogenes</italic></kwd>'
match_list['viruses'] = '<kwd>Virus</kwd>'
match_list['volvox'] = '<kwd><italic>Volvox</italic></kwd>'
match_list['xenopus'] = '<kwd><italic>Xenopus</italic></kwd>'
match_list['yellow baboon (papio cynocephalus)'] = '<kwd><italic>P. cynocephalus</italic></kwd>'
match_list['zebrafish'] = '<kwd>Zebrafish</kwd>'
Updated list: added 'bat'; changed 'arabidopsis', 'macaca mulatta' and 'salmonella enterica serovar typhi'.
match_list['arabidopsis'] = '<kwd><italic>A. thaliana</italic></kwd>'
match_list['bat'] = '<kwd>Bat</kwd>'
match_list['b. subtilis'] = '<kwd><italic>B. subtilis</italic></kwd>'
match_list['c. elegans'] = '<kwd><italic>C. elegans</italic></kwd>'
match_list['c. intestinalis'] = '<kwd><italic>C. intestinalis</italic></kwd>'
match_list['chicken'] = '<kwd>Chicken</kwd>'
match_list['ciona intestinalis'] = '<kwd><italic>C. intestinalis</italic></kwd>'
match_list['d. melanogaster'] = '<kwd><italic>D. melanogaster</italic></kwd>'
match_list['dictyostelium'] = '<kwd><italic>Dictyostelium</italic></kwd>'
match_list['drosophila melanogaster'] = '<kwd><italic>D. melanogaster</italic></kwd>'
match_list['e. coli'] = '<kwd><italic>E. coli</italic></kwd>'
match_list['frog'] = '<kwd>Frog</kwd>'
match_list['fruit fly'] = '<kwd><italic>D. melanogaster</italic></kwd>'
match_list['human'] = '<kwd>Human</kwd>'
match_list['macaca mulatta'] = '<kwd>Rhesus macaque</kwd>'
match_list['maize'] = '<kwd>Maize</kwd>'
match_list['mouse'] = '<kwd>Mouse</kwd>'
match_list['myceliophthora thermophila'] = '<kwd><italic>M. thermophila</italic></kwd>'
match_list['n. crassa'] = '<kwd><italic>N. crassa</italic></kwd>'
match_list['neurospora'] = '<kwd><italic>Neurospora</italic></kwd>'
match_list['none'] = '<kwd>None</kwd>'
match_list['oncopeltus fasciatus'] = '<kwd><italic>O. fasciatus</italic></kwd>'
match_list['other'] = '<kwd>Other</kwd>'
match_list['plasmodium falciparum'] = '<kwd><italic>P. falciparum</italic></kwd>'
match_list['platynereis dumerilii'] = '<kwd><italic>P. dumerilii</italic></kwd>'
match_list['rat'] = '<kwd>Rat</kwd>'
match_list['s. cerevisiae'] = '<kwd><italic>S. cerevisiae</italic></kwd>'
match_list['s. pombe'] = '<kwd><italic>S. pombe</italic></kwd>'
match_list['salmonella enterica serovar typhi'] = '<kwd><italic>S. enterica</italic> serovar Typhi</kwd>'
match_list['streptococcus pyogenes'] = '<kwd><italic>S. pyogenes</italic></kwd>'
match_list['viruses'] = '<kwd>Virus</kwd>'
match_list['volvox'] = '<kwd><italic>Volvox</italic></kwd>'
match_list['xenopus'] = '<kwd><italic>Xenopus</italic></kwd>'
match_list['yellow baboon (papio cynocephalus)'] = '<kwd><italic>P. cynocephalus</italic></kwd>'
match_list['zebrafish'] = '<kwd>Zebrafish</kwd>'
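For anyone reading along, here is a minimal sketch of how the lookup described at the top of this issue might work. It is not the actual conversion script: the helper name `convert_kwd`, the regexes, and the choice to treat both `<italic>` and `<italics>` as italic tags are assumptions for illustration, and only a three-entry excerpt of the match list is included.

```python
import re

# Excerpt of the match list from this issue (the full list is above).
match_list = {}
match_list['e. coli'] = '<kwd><italic>E. coli</italic></kwd>'
match_list['fruit fly'] = '<kwd><italic>D. melanogaster</italic></kwd>'
match_list['human'] = '<kwd>Human</kwd>'

# Assumption: italic markup appears as <italic> or <italics>, in any case.
ITALIC_TAG = re.compile(r'</?italics?>', re.IGNORECASE)
KWD = re.compile(r'<kwd>(.*?)</kwd>', re.DOTALL)


def convert_kwd(kwd_xml):
    """Return the replacement XML for one <kwd> element (hypothetical helper).

    The value is stripped of italic tags and lower-cased, then looked up
    in match_list. If there is no match, the original XML is returned.
    """
    m = KWD.fullmatch(kwd_xml.strip())
    if m is None:
        # Not a simple <kwd>...</kwd> element: leave it untouched.
        return kwd_xml
    key = ITALIC_TAG.sub('', m.group(1)).strip().lower()
    return match_list.get(key, kwd_xml)


print(convert_kwd('<kwd>E. Coli</kwd>'))
print(convert_kwd('<kwd><italic>Fruit fly</italic></kwd>'))
print(convert_kwd('<kwd>e.coli</kwd>'))  # no space, so no match: unchanged
```

Note that under this sketch a variant such as `e.coli` (no space) falls through unchanged, since the lookup is an exact match on the lower-cased value.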
Ok to close this @Melissa37 ? I'm just going through looking at issues assigned to me and not closed.
Yup, thanks
E.g. differences: e.coli, E.coli, E. coli, e. coli, E. Coli, <italics>E. coli</italics>, etc.
There are others too. I'll ask Sian/Nathan for the database info so we in editorial and production can start working out the extent of the problem!
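Once the database export is available, one rough way to gauge the extent of the problem would be to group the raw keyword values by a crude normalised key and list every group with more than one spelling. This is only a sketch: the function names and the grouping rule (drop tags, lower-case, remove all whitespace) are assumptions, and the sample values are just the variants quoted above plus one unrelated keyword.

```python
import re
from collections import defaultdict

TAG = re.compile(r'<[^>]+>')


def variant_key(raw):
    """Crude grouping key (an assumption): drop tags, remove whitespace, lower-case."""
    return re.sub(r'\s+', '', TAG.sub('', raw)).lower()


def group_variants(values):
    """Map each key to its distinct raw spellings, keeping only keys with more than one."""
    groups = defaultdict(set)
    for value in values:
        groups[variant_key(value)].add(value)
    return {key: sorted(spellings)
            for key, spellings in groups.items()
            if len(spellings) > 1}


sample = ['e.coli', 'E.coli', 'E. coli', 'e. coli', 'E. Coli',
          '<italics>E. coli</italics>', 'Human']
for key, spellings in group_variants(sample).items():
    print(key, len(spellings), spellings)
```

Removing all whitespace is deliberately aggressive so that `e.coli` and `e. coli` land in the same group; it could also merge keywords that really are different, so the output would need a manual check.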