elifesciences / elife-vendor-workflow-config

capturing requirements and niggles for setting up the eLife production workflow

Clean up of archive: keywords #82

Closed by Melissa37 5 years ago

Melissa37 commented 9 years ago

E.g. differences: e.coli, E.coli, E. coli, e. coli, E. Coli, <italics>E. coli</italics>, etc.

There are others too. I'll ask Sian/Nathan for the database info so we in editorial and production can start working out the extent of the problem!

Melissa37 commented 9 years ago

https://docs.google.com/spreadsheets/d/1LlVtROqJZ2s_sgtCMDco725aeGf5LXIQrtHflVptpbE/edit?pli=1#gid=2066109275

gnott commented 9 years ago

A first-draft conversion of the archive is ready to roll. The strategy is to convert each keyword value to lower case (and remove any italic tags), then look for a match in the following list. If there is a match, the matched XML replaces the existing tag; if there's no match, the original XML is kept.

Python match dict below:

        match_list['arabidopsis'] = '<kwd><italic>Arabidopsis</italic></kwd>'
        match_list['b. subtilis'] = '<kwd><italic>B. subtilis</italic></kwd>'
        match_list['c. elegans'] = '<kwd><italic>C. elegans</italic></kwd>'
        match_list['c. intestinalis'] = '<kwd><italic>C. intestinalis</italic></kwd>'
        match_list['chicken'] = '<kwd>Chicken</kwd>'
        match_list['ciona intestinalis'] = '<kwd><italic>C. intestinalis</italic></kwd>'
        match_list['d. melanogaster'] = '<kwd><italic>D. melanogaster</italic></kwd>'
        match_list['dictyostelium'] = '<kwd><italic>Dictyostelium</italic></kwd>'
        match_list['drosophila melanogaster'] = '<kwd><italic>D. melanogaster</italic></kwd>'
        match_list['e. coli'] = '<kwd><italic>E. coli</italic></kwd>'
        match_list['frog'] = '<kwd>Frog</kwd>'
        match_list['fruit fly'] = '<kwd><italic>D. melanogaster</italic></kwd>'
        match_list['human'] = '<kwd>Human</kwd>'
        match_list['macaca mulatta'] = '<kwd><italic>M. mulatta</italic></kwd>'
        match_list['maize'] = '<kwd>Maize</kwd>'
        match_list['mouse'] = '<kwd>Mouse</kwd>'
        match_list['myceliophthora thermophila'] = '<kwd><italic>M. thermophila</italic></kwd>'
        match_list['n. crassa'] = '<kwd><italic>N. crassa</italic></kwd>'
        match_list['neurospora'] = '<kwd><italic>Neurospora</italic></kwd>'
        match_list['none'] = '<kwd>None</kwd>'
        match_list['oncopeltus fasciatus'] = '<kwd><italic>O. fasciatus</italic></kwd>'
        match_list['other'] = '<kwd>Other</kwd>'
        match_list['plasmodium falciparum'] = '<kwd><italic>P. falciparum</italic></kwd>'
        match_list['platynereis dumerilii'] = '<kwd><italic>P. dumerilii</italic></kwd>'
        match_list['rat'] = '<kwd>Rat</kwd>'
        match_list['s. cerevisiae'] = '<kwd><italic>S. cerevisiae</italic></kwd>'
        match_list['s. pombe'] = '<kwd><italic>S. pombe</italic></kwd>'
        match_list['salmonella enterica serovar typhi'] = '<kwd><italic>S. enterica serovar</italic> Typhi</kwd>'
        match_list['streptococcus pyogenes'] = '<kwd><italic>S. pyogenes</italic></kwd>'
        match_list['viruses'] = '<kwd>Virus</kwd>'
        match_list['volvox'] = '<kwd><italic>Volvox</italic></kwd>'
        match_list['xenopus'] = '<kwd><italic>Xenopus</italic></kwd>'
        match_list['yellow baboon (papio cynocephalus)'] = '<kwd><italic>P. cynocephalus</italic></kwd>'
        match_list['zebrafish'] = '<kwd>Zebrafish</kwd>'
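
For illustration only, the lookup strategy described above could be sketched roughly as follows. This is not the actual conversion script: the function name `normalise_kwd`, the regular expressions, and the three-entry `MATCH_LIST` subset are assumptions made for the example.

```python
import re

# Hypothetical subset of the match list from this thread (illustration only).
MATCH_LIST = {
    'e. coli': '<kwd><italic>E. coli</italic></kwd>',
    'fruit fly': '<kwd><italic>D. melanogaster</italic></kwd>',
    'mouse': '<kwd>Mouse</kwd>',
}

# Captures the inner content of a single <kwd> element.
KWD_RE = re.compile(r'<kwd>(.*?)</kwd>', re.DOTALL)
# Matches opening or closing <italic> / <italics> tags.
ITALIC_TAG_RE = re.compile(r'</?italics?>')


def normalise_kwd(kwd_xml, match_list=MATCH_LIST):
    """Return the canonical <kwd> XML, or the original XML if no key matches."""
    m = KWD_RE.fullmatch(kwd_xml.strip())
    if m is None:
        # Not a single <kwd> element: leave it untouched.
        return kwd_xml
    # Remove italic tags, trim whitespace, and lower-case to build the lookup key.
    key = ITALIC_TAG_RE.sub('', m.group(1)).strip().lower()
    return match_list.get(key, kwd_xml)


if __name__ == '__main__':
    for raw in ('<kwd>E. Coli</kwd>',
                '<kwd><italic>E. coli</italic></kwd>',
                '<kwd>e.coli</kwd>'):
        print(raw, '->', normalise_kwd(raw))
```

Under these assumptions, a variant such as `e.coli` (no space) has no key in the list and so falls through unchanged; it would need its own entry, or an extra normalisation step, to be cleaned up.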

gnott commented 9 years ago

Updated list: added 'bat'; changed the values for 'arabidopsis', 'macaca mulatta' and 'salmonella enterica serovar typhi'.

        match_list['arabidopsis'] = '<kwd><italic>A. thaliana</italic></kwd>'
        match_list['bat'] = '<kwd>Bat</kwd>'
        match_list['b. subtilis'] = '<kwd><italic>B. subtilis</italic></kwd>'
        match_list['c. elegans'] = '<kwd><italic>C. elegans</italic></kwd>'
        match_list['c. intestinalis'] = '<kwd><italic>C. intestinalis</italic></kwd>'
        match_list['chicken'] = '<kwd>Chicken</kwd>'
        match_list['ciona intestinalis'] = '<kwd><italic>C. intestinalis</italic></kwd>'
        match_list['d. melanogaster'] = '<kwd><italic>D. melanogaster</italic></kwd>'
        match_list['dictyostelium'] = '<kwd><italic>Dictyostelium</italic></kwd>'
        match_list['drosophila melanogaster'] = '<kwd><italic>D. melanogaster</italic></kwd>'
        match_list['e. coli'] = '<kwd><italic>E. coli</italic></kwd>'
        match_list['frog'] = '<kwd>Frog</kwd>'
        match_list['fruit fly'] = '<kwd><italic>D. melanogaster</italic></kwd>'
        match_list['human'] = '<kwd>Human</kwd>'
        match_list['macaca mulatta'] = '<kwd>Rhesus macaque</kwd>'
        match_list['maize'] = '<kwd>Maize</kwd>'
        match_list['mouse'] = '<kwd>Mouse</kwd>'
        match_list['myceliophthora thermophila'] = '<kwd><italic>M. thermophila</italic></kwd>'
        match_list['n. crassa'] = '<kwd><italic>N. crassa</italic></kwd>'
        match_list['neurospora'] = '<kwd><italic>Neurospora</italic></kwd>'
        match_list['none'] = '<kwd>None</kwd>'
        match_list['oncopeltus fasciatus'] = '<kwd><italic>O. fasciatus</italic></kwd>'
        match_list['other'] = '<kwd>Other</kwd>'
        match_list['plasmodium falciparum'] = '<kwd><italic>P. falciparum</italic></kwd>'
        match_list['platynereis dumerilii'] = '<kwd><italic>P. dumerilii</italic></kwd>'
        match_list['rat'] = '<kwd>Rat</kwd>'
        match_list['s. cerevisiae'] = '<kwd><italic>S. cerevisiae</italic></kwd>'
        match_list['s. pombe'] = '<kwd><italic>S. pombe</italic></kwd>'
        match_list['salmonella enterica serovar typhi'] = '<kwd><italic>S. enterica</italic> serovar Typhi</kwd>'
        match_list['streptococcus pyogenes'] = '<kwd><italic>S. pyogenes</italic></kwd>'
        match_list['viruses'] = '<kwd>Virus</kwd>'
        match_list['volvox'] = '<kwd><italic>Volvox</italic></kwd>'
        match_list['xenopus'] = '<kwd><italic>Xenopus</italic></kwd>'
        match_list['yellow baboon (papio cynocephalus)'] = '<kwd><italic>P. cynocephalus</italic></kwd>'
        match_list['zebrafish'] = '<kwd>Zebrafish</kwd>'

gnott commented 5 years ago

OK to close this, @Melissa37? I'm just going through the issues assigned to me that are not yet closed.

Melissa37 commented 5 years ago

Yup, thanks