Luke-ebbis / neuralplexer-workflow

A workflow for predicting ligand-protein interactions with NeuralPlexer
Apache License 2.0

NeuralPlexer workflow

Predicting the interaction between ligands and proteins with NeuralPlexer, a deep learning tool by Qiao et al. (2024).

Installation

The following commands install the workflow's dependencies into the .pixi folder in the current directory. After installation, you cannot move the folder without re-installing all the dependencies.

curl -fsSL https://pixi.sh/install.sh | bash
# ... cd <this repo>
pixi install

Usage

Each job is described by a JSON descriptor in the data/ folder:

[
  {
    "name": "2024-05-13_20:16",
    "parameters": {
      "sampler":"langevin_simulated_annealing",
      "n-samples": 10,
      "chunk-size": 1,
      "num-steps": 40
    },
    "ligands" : [
       {
        "molecule": {
          "sdf": "data/lig_ref.sdf",
          "count": 2
        }
      }
    ],
    "sequences": [
      {
        "proteinChain": {
          "sequence": "FGGGFGGGGGSGSGSGG",
          "count": 2
        }
      },
      {
        "proteinChain": {
          "sequence": "FGGSGSGSGG",
          "count": 1
        }
      }
    ]
  }
]

Ligand files referenced by a descriptor's sdf field are placed in the data/ folder next to it:

data
├── lig_ref.sdf
└── test.json
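
Because each descriptor points to its ligand files by path, it can help to confirm that those files exist before starting a run. The following is a minimal sketch, not part of the workflow: the helper name missing_ligand_files is hypothetical, and it assumes that sdf paths are resolved relative to the repository root.

```python
from pathlib import Path


def missing_ligand_files(jobs, root="."):
    """Return (job name, sdf path) pairs whose SDF file is not found under root.

    Assumption: sdf paths in the descriptor are relative to the repository root.
    """
    missing = []
    for job in jobs:
        for entry in job.get("ligands", []):
            sdf = entry["molecule"]["sdf"]
            if not (Path(root) / sdf).is_file():
                missing.append((job["name"], sdf))
    return missing


# In-memory job that mirrors the descriptor layout shown in this README.
demo_jobs = [
    {
        "name": "demo",
        "ligands": [{"molecule": {"sdf": "data/lig_ref.sdf", "count": 2}}],
    }
]

# Prints an empty list when every referenced SDF file is present.
print(missing_ligand_files(demo_jobs))
```

To check a real descriptor, load it with json.load and pass the resulting list to the helper.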

Running the workflow creates a folder results/data/<jobname> for each job. Right now, the size limit seems to be about 600 amino acid residues; beyond that, the graphics card runs out of memory.
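
To estimate whether a job is likely to hit the memory limit, the residues in a descriptor can be counted up front. This is a rough sketch under two assumptions: that each chain contributes its sequence length multiplied by its count, and that the figure of 600 residues is only the approximate limit observed above. The function names are hypothetical and not part of the workflow.

```python
# Approximate limit reported in this README; a rough guide, not a hard constant.
RESIDUE_LIMIT = 600


def total_residues(job):
    """Sum sequence length times copy count over all protein chains of one job."""
    total = 0
    for entry in job.get("sequences", []):
        chain = entry["proteinChain"]
        # Assumption: a chain with count N contributes N copies of its sequence.
        total += len(chain["sequence"]) * chain.get("count", 1)
    return total


def oversized_jobs(jobs, limit=RESIDUE_LIMIT):
    """Return (name, residue count) for each job whose total exceeds the limit."""
    flagged = []
    for job in jobs:
        n = total_residues(job)
        if n > limit:
            flagged.append((job["name"], n))
    return flagged


# In-memory job using the two example chains from the descriptor above.
demo_jobs = [
    {
        "name": "demo",
        "sequences": [
            {"proteinChain": {"sequence": "FGGGFGGGGGSGSGSGG", "count": 2}},
            {"proteinChain": {"sequence": "FGGSGSGSGG", "count": 1}},
        ],
    }
]

print(total_residues(demo_jobs[0]))
print(oversized_jobs(demo_jobs))
```

Ligands are not counted here, so the real memory footprint of a job may be higher than this estimate suggests.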

On an HPC cluster, run the workflow with

pixi run slurm

This launches the hardware-intensive Snakemake tasks as SLURM jobs.

About

When pixi run make is executed, the Snakemake pipeline sets up NeuralPlexer and predicts each complex listed in data/. Summary statistics are also calculated and plotted as shown below; they are written to results/analysis. Statistics and analysis include: