EBISPOT / gwas-solr-slim

Slim Solr Project Readme Documentation

Overview

The Slim Solr project builds a specialized Solr index for GWAS Catalog data. It generates the JSON documents that the GWAS Slim Solr index requires, covering five resource types: Publication, Study, Variant, Trait and Gene.

Features

  1. JSON Document Generation: The generate_solr_docs.py script creates the JSON documents needed for the GWAS Slim Solr index. It generates documents for Publication, Study, Variant, Trait and Gene.

  2. Flexible Usage: Command-line options select the database and the data type to process, so a run can be tailored to specific project requirements.

  3. Automated Document Creation: The script automates the creation of Solr index documents, which streamlines the indexing of GWAS data.

  4. Integration with GWAS Solr Slim Deployment: The script fits into the existing deployment and management processes, including conda environment activation and the Bamboo automation pipelines used for Slim Solr release tasks.
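
The document generation described in item 1 can be pictured with a short sketch. This is not the code in generate_solr_docs.py: the function names, the field names ("id", "resource_type", "title") and the output file naming are hypothetical and chosen only to illustrate writing one JSON file per resource type.

```python
import json
from pathlib import Path

# The five resource types named in this README.
RESOURCE_TYPES = ("publication", "study", "variant", "trait", "gene")


def build_doc(resource_type, record_id, title):
    """Return one Solr-style document as a plain dict.

    Field names are hypothetical; the real index schema may differ.
    """
    if resource_type not in RESOURCE_TYPES:
        raise ValueError(f"unknown resource type: {resource_type}")
    return {
        "id": f"{resource_type}:{record_id}",
        "resource_type": resource_type,
        "title": title,
    }


def write_docs(docs, target_dir, resource_type):
    """Write a list of documents to <target_dir>/<resource_type>_data.json.

    The file naming scheme is an assumption made for this sketch.
    """
    out_path = Path(target_dir) / f"{resource_type}_data.json"
    out_path.parent.mkdir(parents=True, exist_ok=True)
    with out_path.open("w", encoding="utf-8") as handle:
        json.dump(docs, handle, indent=2)
    return out_path
```

Calling write_docs([build_doc("study", "example-1", "Example study")], "out", "study") would produce a single JSON array in out/study_data.json; the identifier and title here are made-up placeholders, not catalog data.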

Requirements

Deployment Steps

The GWAS Solr Slim project is deployed to High-Performance Computing (HPC) clusters through a CI/CD pipeline defined in .gitlab-ci.yml. The pipeline deploys the codebase, which is hosted on GitHub and mirrored internally on GitLab.

Continuous Integration/Deployment

Pipeline Stages

  1. Build Stage: Compiles the code and builds the Docker image.
  2. Deploy Stage: Deploys the built image to the HPC cluster.

Pipeline Configuration

Running the Project Locally

Prerequisites

Steps to Run Locally

  1. Clone the Repository: Start by cloning the Slim Solr project from GitHub.

    git clone https://github.com/EBISPOT/gwas-solr-slim.git
    cd gwas-solr-slim
  2. Create and Activate Virtual Environment:

    • Creation:

      python3 -m venv .venv
    • Activation (choose according to your platform):

      • POSIX systems (Linux, macOS, etc.):
      source .venv/bin/activate
      • Windows Command Prompt:
      .venv\Scripts\activate.bat
      • Windows PowerShell:
      .venv\Scripts\Activate.ps1
  3. Install Dependencies:

    • Install the GWAS database connection module from the private GitLab repository (the git+ssh URL requires SSH access to gitlab.ebi.ac.uk), then install the remaining dependencies:

      pip install git+ssh://git@gitlab.ebi.ac.uk/gwas/gwas_db_connect.git
      pip install -r requirements.txt
  4. Test the Script:

    • Verify the script setup:

      python scripts/generate_solr_docs.py --help
    • Try running the script with specified parameters (replace <some/dir/> with your target directory):

      python scripts/generate_solr_docs.py --database DEV3 --limit 1 --test --targetDir <some/dir/>
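
For readers unfamiliar with how such flags are usually handled, here is a minimal sketch of an argparse parser that accepts the four options used in the command above. Only the flag names come from this README; the types, defaults and help texts are assumptions, and the real script may define them differently (run it with --help to see the actual options).

```python
import argparse


def build_parser():
    """Build a parser for the flags shown in the README command.

    Types, defaults and help strings are assumptions for illustration.
    """
    parser = argparse.ArgumentParser(
        description="Sketch of a CLI accepting the generate_solr_docs.py flags."
    )
    # Name of the database to read from, e.g. DEV3 as in the README example.
    parser.add_argument("--database", default=None, help="database name")
    # Assumed to cap the number of records processed.
    parser.add_argument("--limit", type=int, default=None, help="maximum records")
    # Assumed to be a boolean switch, since the README passes it without a value.
    parser.add_argument("--test", action="store_true", help="run in test mode")
    # Directory that receives the generated JSON documents.
    parser.add_argument("--targetDir", default=".", help="output directory")
    return parser
```

Parsing the README's example arguments with build_parser().parse_args([...]) yields a namespace whose attributes match the flag names, for instance args.database and args.targetDir.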

How to Contribute

Contribution Process

Release Process