IDR / idr-utils

Utility scripts for managing IDR submissions
BSD 2-Clause "Simplified" License

Yeast genes cleanup #22

Closed: sbesson closed this 4 years ago

sbesson commented 4 years ago

Originally coming from https://github.com/IDR/idr0040-aymoz-singlecell/pull/11, this turned into a wider cleanup of all yeast gene map annotations in the IDR. I am opening this as a utility script against idr-utils in case we need to run the update again in the future.

This script:

sbesson commented 4 years ago

Applied against pilot-idr0072, so it should be possible to use this pilot resource to check that the URLs have been updated and the duplicate URLs removed.
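The two changes checked on the pilot (URL update and duplicate removal) can be illustrated on a single gene's key/value pairs. This is a minimal sketch, not the code of update_yeast_genes.py: the key name "Gene Identifier URL" and the new https prefix are assumptions (the log only shows the old http://www.yeastgenome.org/locus/ prefix and refers to a "Gene URL" key), and `clean_gene_pairs` is a hypothetical helper.

```python
# Minimal sketch of the per-gene cleanup; key name and new prefix are assumptions.
OLD_PREFIX = "http://www.yeastgenome.org/locus/"   # prefix seen in the logs
NEW_PREFIX = "https://www.yeastgenome.org/locus/"  # assumed new SGD prefix
URL_KEY = "Gene Identifier URL"                    # hypothetical key name


def clean_gene_pairs(pairs):
    """Return a cleaned copy of a map annotation's (key, value) pairs.

    Old yeastgenome URLs are rewritten to NEW_PREFIX, then repeated URL
    pairs are dropped, keeping the first occurrence and the original order.
    """
    cleaned = []
    seen_urls = set()
    for key, value in pairs:
        if key == URL_KEY:
            if value.startswith(OLD_PREFIX):
                value = NEW_PREFIX + value[len(OLD_PREFIX):]
            if value in seen_urls:
                continue  # duplicate URL entry: skip it
            seen_urls.add(value)
        cleaned.append((key, value))
    return cleaned


# Example: a gene carrying the same old URL twice, as in the log lines below.
pairs = [
    ("Gene Identifier", "YMR232W"),
    (URL_KEY, "http://www.yeastgenome.org/locus/YMR232W"),
    (URL_KEY, "http://www.yeastgenome.org/locus/YMR232W"),
]
cleaned = clean_gene_pairs(pairs)
```

Deduplicating after the rewrite means an old http URL and an already-updated URL for the same locus also collapse to one entry.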

[sbesson@pilot-idr0072-omeroreadwrite ~]$ /opt/omero/server/venv3/bin/omero login
Previously logged in to localhost:4064 as public
Server: [localhost:4064]
Username: [public]demo
Password:
Created session for demo@localhost:4064. Idle timeout: 10 min. Current group: Public
[sbesson@pilot-idr0072-omeroreadwrite ~]$ /opt/omero/server/venv3/bin/python update_yeast_genes.py --dry-run
INFO:omero.util.Resources:Starting
INFO:omero.util.Resources:Starting
INFO:omero.util.Resources:Halted
INFO:root:Found 7992 yeast genes
INFO:root:Removing duplicate Gene URL key: http://www.yeastgenome.org/locus/YMR232W
INFO:root:Removing duplicate Gene URL key: http://www.yeastgenome.org/locus/YIL015W
INFO:root:Removing duplicate Gene URL key: http://www.yeastgenome.org/locus/YLR452C
INFO:root:Removing duplicate Gene URL key: http://www.yeastgenome.org/locus/YCL027W
INFO:root:Removing duplicate Gene URL key: http://www.yeastgenome.org/locus/YJL157C
INFO:root:Removing duplicate Gene URL key: http://www.yeastgenome.org/locus/YNL279W
INFO:root:Removing duplicate Gene URL key: http://www.yeastgenome.org/locus/YIL117C
INFO:root:Removing duplicate Gene URL key: http://www.yeastgenome.org/locus/YCL055W
INFO:root:Removing duplicate Gene URL key: http://www.yeastgenome.org/locus/YBR040W
INFO:root:Removing duplicate Gene URL key: http://www.yeastgenome.org/locus/YNR044W
INFO:root:Removing duplicate Gene URL key: http://www.yeastgenome.org/locus/YCR089W
INFO:root:Removing duplicate Gene URL key: http://www.yeastgenome.org/locus/YPL192C
INFO:root:Removing duplicate Gene URL key: http://www.yeastgenome.org/locus/YPR141C
INFO:root:Updating annotations for 7992 genes
INFO:omero.util.Resources:Halted
[sbesson@pilot-idr0072-omeroreadwrite ~]$ /opt/omero/server/venv3/bin/python update_yeast_genes.py
INFO:omero.util.Resources:Starting
INFO:omero.util.Resources:Starting
INFO:omero.util.Resources:Halted
INFO:root:Found 7992 yeast genes
INFO:root:Removing duplicate Gene URL key: http://www.yeastgenome.org/locus/YMR232W
INFO:root:Removing duplicate Gene URL key: http://www.yeastgenome.org/locus/YIL015W
INFO:root:Removing duplicate Gene URL key: http://www.yeastgenome.org/locus/YLR452C
INFO:root:Removing duplicate Gene URL key: http://www.yeastgenome.org/locus/YCL027W
INFO:root:Removing duplicate Gene URL key: http://www.yeastgenome.org/locus/YJL157C
INFO:root:Removing duplicate Gene URL key: http://www.yeastgenome.org/locus/YNL279W
INFO:root:Removing duplicate Gene URL key: http://www.yeastgenome.org/locus/YIL117C
INFO:root:Removing duplicate Gene URL key: http://www.yeastgenome.org/locus/YCL055W
INFO:root:Removing duplicate Gene URL key: http://www.yeastgenome.org/locus/YBR040W
INFO:root:Removing duplicate Gene URL key: http://www.yeastgenome.org/locus/YNR044W
INFO:root:Removing duplicate Gene URL key: http://www.yeastgenome.org/locus/YCR089W
INFO:root:Removing duplicate Gene URL key: http://www.yeastgenome.org/locus/YPL192C
INFO:root:Removing duplicate Gene URL key: http://www.yeastgenome.org/locus/YPR141C
INFO:root:Updating annotations for 7992 genes
INFO:root:Updating batch of 500 genes
INFO:root:Updating batch of 500 genes
INFO:root:Updating batch of 500 genes
INFO:root:Updating batch of 500 genes
INFO:root:Updating batch of 500 genes
INFO:root:Updating batch of 500 genes
INFO:root:Updating batch of 500 genes
INFO:root:Updating batch of 500 genes
INFO:root:Updating batch of 500 genes
INFO:root:Updating batch of 500 genes
INFO:root:Updating batch of 500 genes
INFO:root:Updating batch of 500 genes
INFO:root:Updating batch of 500 genes
INFO:root:Updating batch of 500 genes
INFO:root:Updating batch of 500 genes
INFO:root:Updating batch of 492 genes
INFO:omero.util.Resources:Halted
[sbesson@pilot-idr0072-omeroreadwrite ~]$ /opt/omero/server/venv3/bin/python update_yeast_genes.py --dry-run
INFO:omero.util.Resources:Starting
INFO:omero.util.Resources:Starting
INFO:omero.util.Resources:Halted
INFO:root:Found 7992 yeast genes
INFO:root:Updating annotations for 0 genes
INFO:omero.util.Resources:Halted
[sbesson@pilot-idr0072-omeroreadwrite ~]$ /opt/omero/server/venv3/bin/python update_yeast_genes.py --dry-run
INFO:omero.util.Resources:Starting
INFO:omero.util.Resources:Halted
INFO:omero.util.Resources:Starting
INFO:root:Found 7992 yeast genes
INFO:root:Updating annotations for 0 genes
INFO:omero.util.Resources:Halted
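The "Updating batch" lines above show the 7992 genes being saved as 15 batches of 500 followed by a final batch of 492 (15 × 500 + 492 = 7992). A minimal sketch of that batching, assuming a batch size of 500 as in the log; `save_batch` is a hypothetical placeholder for whatever OMERO call actually persists the annotations.

```python
# Sketch of batched updates; save_batch is a hypothetical stand-in for the OMERO save.
import logging

BATCH_SIZE = 500  # matches the "Updating batch of 500 genes" log lines


def batches(items, size=BATCH_SIZE):
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def update_all(genes, save_batch):
    """Log the total, then log and save each batch in turn."""
    logging.info("Updating annotations for %d genes", len(genes))
    for batch in batches(genes):
        logging.info("Updating batch of %d genes", len(batch))
        save_batch(batch)
```

The same arithmetic covers the later prod88 run: 8237 genes split into 16 batches of 500 plus a final batch of 237.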

The script was designed to be re-executable and idempotent, so it should also be possible to test the commands above before applying them on idr-next.
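Idempotency here means a second run finds nothing left to change, which is why the final dry runs report "Updating annotations for 0 genes". A minimal sketch of that pattern, assuming annotations can be modelled as an id-to-URL dictionary and that the new prefix is https (both assumptions); `pending_updates`, `run` and `make_parser` are hypothetical names, and only the `--dry-run` flag comes from the commands above.

```python
# Sketch of an idempotent, dry-run-aware update; names and new prefix are assumptions.
import argparse

OLD_PREFIX = "http://www.yeastgenome.org/locus/"   # prefix seen in the logs
NEW_PREFIX = "https://www.yeastgenome.org/locus/"  # assumed new prefix


def pending_updates(annotations):
    """Return the ids of annotations whose URL still uses the old prefix."""
    return [aid for aid, url in annotations.items() if url.startswith(OLD_PREFIX)]


def run(annotations, dry_run):
    """Report how many annotations need updating; modify them unless dry_run."""
    todo = pending_updates(annotations)
    print(f"Updating annotations for {len(todo)} genes")
    if not dry_run:
        for aid in todo:
            annotations[aid] = NEW_PREFIX + annotations[aid][len(OLD_PREFIX):]
    return len(todo)


def make_parser():
    """Command-line parser exposing the --dry-run flag used above."""
    parser = argparse.ArgumentParser(description="Update yeast gene URLs")
    parser.add_argument("--dry-run", action="store_true",
                        help="report the changes without saving them")
    return parser
```

Because only annotations still carrying the old prefix are selected, running the update twice changes nothing the second time.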

joshmoore commented 4 years ago

In general, code looks good. :+1: I could see this being the first step towards a background process which notifies sysadmins that registered ontologies have "issues" which need addressing.

sbesson commented 4 years ago

Now deployed on prod88. We seem to have introduced a few hundred new yeast genes as part of idr0078 (8237 found here versus 7992 on pilot-idr0072), and these have been updated as well.

[sbesson@prod88-omeroreadwrite idr-utils]$ /opt/omero/server/venv3/bin/python scripts/annotate/update_yeast_genes.py --dry-run
INFO:omero.util.Resources:Starting
INFO:omero.util.Resources:Halted
INFO:omero.util.Resources:Starting
INFO:root:Found 8237 yeast genes
INFO:root:Removing duplicate Gene URL key: http://www.yeastgenome.org/locus/YMR232W
INFO:root:Removing duplicate Gene URL key: http://www.yeastgenome.org/locus/YIL015W
INFO:root:Removing duplicate Gene URL key: http://www.yeastgenome.org/locus/YLR452C
INFO:root:Removing duplicate Gene URL key: http://www.yeastgenome.org/locus/YCL027W
INFO:root:Removing duplicate Gene URL key: http://www.yeastgenome.org/locus/YJL157C
INFO:root:Removing duplicate Gene URL key: http://www.yeastgenome.org/locus/YNL279W
INFO:root:Removing duplicate Gene URL key: http://www.yeastgenome.org/locus/YIL117C
INFO:root:Removing duplicate Gene URL key: http://www.yeastgenome.org/locus/YCL055W
INFO:root:Removing duplicate Gene URL key: http://www.yeastgenome.org/locus/YBR040W
INFO:root:Removing duplicate Gene URL key: http://www.yeastgenome.org/locus/YNR044W
INFO:root:Removing duplicate Gene URL key: http://www.yeastgenome.org/locus/YCR089W
INFO:root:Removing duplicate Gene URL key: http://www.yeastgenome.org/locus/YPL192C
INFO:root:Removing duplicate Gene URL key: http://www.yeastgenome.org/locus/YPR141C
INFO:root:Updating annotations for 8237 genes
INFO:omero.util.Resources:Halted
[sbesson@prod88-omeroreadwrite idr-utils]$ /opt/omero/server/venv3/bin/python scripts/annotate/update_yeast_genes.py 
INFO:omero.util.Resources:Starting
INFO:omero.util.Resources:Starting
INFO:omero.util.Resources:Halted
INFO:root:Found 8237 yeast genes
INFO:root:Removing duplicate Gene URL key: http://www.yeastgenome.org/locus/YMR232W
INFO:root:Removing duplicate Gene URL key: http://www.yeastgenome.org/locus/YIL015W
INFO:root:Removing duplicate Gene URL key: http://www.yeastgenome.org/locus/YLR452C
INFO:root:Removing duplicate Gene URL key: http://www.yeastgenome.org/locus/YCL027W
INFO:root:Removing duplicate Gene URL key: http://www.yeastgenome.org/locus/YJL157C
INFO:root:Removing duplicate Gene URL key: http://www.yeastgenome.org/locus/YNL279W
INFO:root:Removing duplicate Gene URL key: http://www.yeastgenome.org/locus/YIL117C
INFO:root:Removing duplicate Gene URL key: http://www.yeastgenome.org/locus/YCL055W
INFO:root:Removing duplicate Gene URL key: http://www.yeastgenome.org/locus/YBR040W
INFO:root:Removing duplicate Gene URL key: http://www.yeastgenome.org/locus/YNR044W
INFO:root:Removing duplicate Gene URL key: http://www.yeastgenome.org/locus/YCR089W
INFO:root:Removing duplicate Gene URL key: http://www.yeastgenome.org/locus/YPL192C
INFO:root:Removing duplicate Gene URL key: http://www.yeastgenome.org/locus/YPR141C
INFO:root:Updating annotations for 8237 genes
INFO:root:Updating batch of 500 genes
INFO:root:Updating batch of 500 genes
INFO:root:Updating batch of 500 genes
INFO:root:Updating batch of 500 genes
INFO:root:Updating batch of 500 genes
INFO:root:Updating batch of 500 genes
INFO:root:Updating batch of 500 genes
INFO:root:Updating batch of 500 genes
INFO:root:Updating batch of 500 genes
INFO:root:Updating batch of 500 genes
INFO:root:Updating batch of 500 genes
INFO:root:Updating batch of 500 genes
INFO:root:Updating batch of 500 genes
INFO:root:Updating batch of 500 genes
INFO:root:Updating batch of 500 genes
INFO:root:Updating batch of 500 genes
INFO:root:Updating batch of 237 genes
INFO:omero.util.Resources:Halted
[sbesson@prod88-omeroreadwrite idr-utils]$ /opt/omero/server/venv3/bin/python scripts/annotate/update_yeast_genes.py --dry-run
INFO:omero.util.Resources:Starting
INFO:omero.util.Resources:Halted
INFO:omero.util.Resources:Starting
INFO:root:Found 8237 yeast genes
INFO:root:Updating annotations for 0 genes
INFO:omero.util.Resources:Halted

[Screenshot 2020-09-17 at 11.48.06]

I'll update all bulkmap config files to use the new yeastgenome URL so that we don't reintroduce the old URLs in the future.
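Updating the bulkmap config files amounts to a prefix substitution across a set of files. A minimal sketch, assuming the configs match the glob `*bulkmap-config.yml` and that the new prefix is https; both the file pattern and `update_configs` are hypothetical, and a plain text replacement is used rather than parsing the YAML.

```python
# Sketch: rewrite the old yeastgenome URL prefix in bulkmap config files.
from pathlib import Path

OLD_PREFIX = "http://www.yeastgenome.org/locus/"   # prefix seen in the logs
NEW_PREFIX = "https://www.yeastgenome.org/locus/"  # assumed new prefix


def update_configs(root, pattern="*bulkmap-config.yml", dry_run=False):
    """Replace OLD_PREFIX with NEW_PREFIX in files under root matching pattern.

    Returns the files that still contained the old prefix; they are only
    rewritten when dry_run is False.
    """
    changed = []
    for path in sorted(Path(root).rglob(pattern)):
        text = path.read_text()
        if OLD_PREFIX in text:
            changed.append(path)
            if not dry_run:
                path.write_text(text.replace(OLD_PREFIX, NEW_PREFIX))
    return changed
```

Like the annotation script, this is safe to re-run: once a file has been rewritten it no longer matches and is left untouched.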