biothings / pending.api

Set of standalone APIs built with the BioThings SDK for the Translator Project
https://biothings.ncats.io
Apache License 2.0

Change bucket setting of snapshot repositories in NCATS server #61

Closed erikyao closed 2 years ago

erikyao commented 2 years ago

Currently, the Elasticsearch instance on the NCATS server (ubuntu@biothings.ncats.io) still uses the S3 bucket biothings-es6-snapshots for its snapshot repositories, as shown below:

pending@ip-172-31-1-254: ubuntu > curl -X GET "localhost:9200/_snapshot/pending*?pretty"
{
  "pending_repository" : {
    "type" : "s3",
    "settings" : {
      "bucket" : "biothings-es6-snapshots",
      "base_path" : "pending",
      "readonly" : "true",
      "region" : "us-west-2"
    }
  },
  "pending_umlschem_repository" : {
    "type" : "s3",
    "settings" : {
      "bucket" : "biothings-es6-snapshots",
      "base_path" : "pending/umlschem",
      "readonly" : "true",
      "region" : "us-west-2"
    }
  },
  ...
}

However, our hub has been switched to biothings-es7-snapshots, and it cannot automatically change the bucket setting in the NCATS ES.

It would be better to have a bash script to do it manually.
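To see which repositories still need the change, the response shown above can be filtered by bucket name. The following is a minimal sketch, not part of the original thread: the function name `repos_on_bucket` and the third sample entry (`pending_example_repository`) are hypothetical, and the input is assumed to be the parsed JSON returned by `GET _snapshot/pending*`.

```python
ES6_BUCKET = "biothings-es6-snapshots"


def repos_on_bucket(repositories, bucket=ES6_BUCKET):
    """Return the sorted names of repositories whose S3 bucket equals `bucket`.

    `repositories` is a dict shaped like the `GET _snapshot/<pattern>` response:
    {repo_name: {"type": ..., "settings": {"bucket": ..., ...}}}
    """
    return sorted(
        name
        for name, config in repositories.items()
        if config.get("settings", {}).get("bucket") == bucket
    )


# Sample data: the first two entries mirror the curl output above;
# the third is a made-up entry that already points at the ES7 bucket.
sample = {
    "pending_repository": {
        "type": "s3",
        "settings": {
            "bucket": "biothings-es6-snapshots",
            "base_path": "pending",
            "readonly": "true",
            "region": "us-west-2",
        },
    },
    "pending_umlschem_repository": {
        "type": "s3",
        "settings": {
            "bucket": "biothings-es6-snapshots",
            "base_path": "pending/umlschem",
            "readonly": "true",
            "region": "us-west-2",
        },
    },
    "pending_example_repository": {  # hypothetical
        "type": "s3",
        "settings": {"bucket": "biothings-es7-snapshots"},
    },
}

print(repos_on_bucket(sample))
```

With the sample above, only the two repositories from the curl output are reported; the hypothetical one is left out because it is already on the ES7 bucket.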

erikyao commented 2 years ago

The following repositories were deleted:

  # Data sources removed by https://github.com/biothings/pending.api/issues/34
  "pending_cord_disease_repository", 
  "pending_cord_cell_repository", 
  "pending_cord_chemical_repository", 
  "pending_cord_gene_repository", 
  "pending_cord_anatomy_repository", 
  "pending_cord_protein_repository", 
  "pending_cord_genomic_entity_repository", 
  "pending_cord_bp_repository", 
  "pending_cord_cc_repository", 
  "pending_cord_ma_repository", 

  "pending_covid19_repository", 

  "pending_diseases_test_repository",

  "pending_pathway_repository",

  # Data build "repoDB" renamed to "repodb"
  "pending_repoDB_repository",

  # Data source removed by https://github.com/biothings/pending.api/issues/35
  "pending_textminingkp_repository",

  # Data build "translator_clinical_risk_kp" renamed to "clinical_risk_kp"
  "pending_translator_clinical_risk_kp_repository"

erikyao commented 2 years ago

Repository settings were updated with the following Python script:

"""
Run on su06 with venv.

Tunnel from su06:9299 to biothings.ncats.io:9200 is needed. On su06, run:

    ssh -N -L 9299:localhost:9200 ubuntu@biothings.ncats.io -i <ssh_key>
"""

from elasticsearch import Elasticsearch
from elasticsearch.exceptions import NotFoundError

# Local end of the SSH tunnel; the explicit scheme is required by newer client versions
es = Elasticsearch("http://localhost:9299")

repository_names = [
    "pending_repository",
    "pending_agr_repository",
    "pending_biggim_repository",
    "pending_biomuta_repository",
    "pending_ccle_repository",
    "pending_cell_ontology_repository",
    "pending_clinical_risk_kp_repository",
    "pending_denovodb_repository",
    "pending_dgidb_repository",
    "pending_diseases_repository",
    "pending_drug_response_kp_repository",
    "pending_ebi_gene2phenotype_repository",
    "pending_geneset1_repository",
    "pending_go_cc_repository",
    "pending_go_bp_repository",
    "pending_go_mf_repository",
    "pending_gwascatalog_repository",
    "pending_hpo_repository",
    "pending_idisk_repository",
    "pending_kaviar_repository",
    "pending_mgi_gene2phenotype_repository",
    "pending_mrcoc_repository",
    "pending_multiomics_wellness_kp_repository",
    "pending_pfocr_repository",
    "pending_phenotype_repository",
    "pending_phewas_repository",
    "pending_pseudocap_go_repository",
    "pending_repodb_repository",
    "pending_semmed_anatomy_repository",
    "pending_semmed_bp_repository",
    "pending_semmed_chemical_repository",
    "pending_semmeddb_repository",
    "pending_semmed_disease_repository",
    "pending_semmed_gene_repository",
    "pending_semmed_phenotype_repository",
    "pending_tcga_mut_freq_kp_repository",
    "pending_text_mining_co_occurrence_kp_repository",
    "pending_text_mining_targeted_association_repository",
    "pending_umlschem_repository",
    "pending_uberon_repository",
    "pending_upheno_ontology_repository"
]

ES6_BUCKET = 'biothings-es6-snapshots'
ES7_BUCKET = 'biothings-es7-snapshots'

for repo_name in repository_names:
    try: 
        repo_config = es.snapshot.get_repository(repository=repo_name)[repo_name]
        repo_bucket = repo_config.get("settings", {}).get("bucket", "")
        if repo_bucket == ES6_BUCKET:
            # print("Found {repo_name} with {repo_config}.".format(repo_name=repo_name, repo_config=repo_config))

            repo_config["settings"]["bucket"] = ES7_BUCKET

            # although the method is named "create_repository", it updates the existing repo with given config
            response = es.snapshot.create_repository(repository=repo_name, body=repo_config)

            print("Updated {repo_name} to {repo_config}. Got {response}.".format(repo_name=repo_name, repo_config=repo_config, response=response))
        else:
            print("Skipped {repo_name}.".format(repo_name=repo_name))
    except NotFoundError:
        print("Cannot find {repo_name}.".format(repo_name=repo_name))

Some repositories were not found in the NCATS ES (possibly never initialized) and were simply skipped in this task:

pending_biomuta_repository
pending_denovodb_repository
pending_geneset1_repository
pending_gwascatalog_repository
pending_kaviar_repository
pending_phewas_repository
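The update step in the script sends the whole repository config back, not just the new bucket, so that base_path, readonly and region are kept. As far as I understand, the create/update call replaces the stored definition. The sketch below isolates that step as a pure function so it can be checked without a live cluster. The name `with_bucket` is hypothetical. Unlike the script, it works on a copy and does not mutate the fetched config.

```python
import copy


def with_bucket(repo_config, new_bucket):
    """Return a copy of `repo_config` with only the bucket setting changed."""
    updated = copy.deepcopy(repo_config)
    updated.setdefault("settings", {})["bucket"] = new_bucket
    return updated


# Config shaped like the "pending_repository" entry in the curl output above
old_config = {
    "type": "s3",
    "settings": {
        "bucket": "biothings-es6-snapshots",
        "base_path": "pending",
        "readonly": "true",
        "region": "us-west-2",
    },
}

new_config = with_bucket(old_config, "biothings-es7-snapshots")
print(new_config["settings"])
```

The returned dict could then be passed as the body of `es.snapshot.create_repository`, as the script above does with the config it modifies in place.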