Closed mezarque closed 1 year ago
I got trigger-happy and hit submit on my review, but wanted to give kudos to @mezarque! This is a good PR, and I think you did a good job with the individual Python files. Because of that, this would be fairly trivial to convert to Nextflow if we needed to. Good job!
This project focuses on building "maps" of similar proteins for exploratory analysis.
About this repo
This repo uses Snakemake to orchestrate BLAST and Foldseek queries for user-provided protein FASTA and PDB files of interest. The rule graph currently looks like this:
The goal of the repo is to build out a pipeline that looks like this flowchart.
Most of the rules call Python scripts that are designed to be modular and that accept input and output arguments using `argparse`. The core functions underlying each script will be made accessible via an importable package in the future, so that a user who wants to can interactively run each underlying command themselves. The first iteration of functionality for this repo will hopefully work something like gget, which allows calls from the command line and from within a Python script.
Some of these rules can probably be compressed to make the rule graph less complicated.
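As a rough illustration of the script layout described above, here is a minimal sketch of a module whose core function can be imported and called from Python, with a thin `argparse` wrapper for command-line use. The function name `count_fasta_records`, the `-i`/`-o` flags, and the record-counting task are hypothetical and are not taken from the repo's `utils/` scripts.

```python
import argparse
import sys
from pathlib import Path


def count_fasta_records(input_path, output_path):
    """Core function (hypothetical): count FASTA records and write the count."""
    n_records = 0
    with open(input_path) as handle:
        for line in handle:
            # Each FASTA record starts with a '>' header line.
            if line.startswith(">"):
                n_records += 1
    Path(output_path).write_text(f"{n_records}\n")
    return n_records


def build_parser():
    """Command-line interface; flag names are assumptions for this sketch."""
    parser = argparse.ArgumentParser(description="Count records in a FASTA file.")
    parser.add_argument("-i", "--input", required=True, help="input FASTA file")
    parser.add_argument("-o", "--output", required=True, help="output text file")
    return parser


def main(argv=None):
    argv = sys.argv[1:] if argv is None else argv
    parser = build_parser()
    if not argv:
        # No arguments given: show usage instead of failing.
        parser.print_help()
        return 0
    args = parser.parse_args(argv)
    count_fasta_records(args.input, args.output)
    return 0


if __name__ == "__main__":
    main()
```

With this layout, a Snakemake rule can invoke the script through its `-i` and `-o` flags, while a notebook user can import `count_fasta_records` and call it directly.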
PR Details
This PR sets up the first step of the workflow, which performs BLAST and Foldseek queries via API.
All of the important code is in the `Snakefile` and the `utils/` directory. Feedback on this code would be super appreciated!
There are a bunch of files in the `notebooks/` directory which show what the repo will ultimately do using Snakemake, except using a Jupyter notebook. This folder will probably be completely reworked in the future to hold tutorial notebooks for interactively running the scripts, so no need to review it. I've left it in the repo in case someone wants to check out what this repo will eventually do.
Known issues
`rule aggregate_lists` is disconnected from its dependencies. I think I need to use a `checkpoint` to make it wait until after the previous rules have been executed?
`blastp` into a Python script that uses `subprocess`
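For the `blastp` item, one possible shape for a `subprocess`-based wrapper is sketched below. This is an assumption about how such a script might look, not the repo's implementation: the function names and default values are hypothetical, and only the standard BLAST+ flags `-query`, `-db`, `-out`, `-outfmt`, and `-remote` are used.

```python
import subprocess


def build_blastp_command(query, out, db="nr", outfmt="6", remote=True):
    """Assemble the blastp argument list (defaults are assumptions)."""
    cmd = [
        "blastp",
        "-query", str(query),
        "-db", db,
        "-out", str(out),
        "-outfmt", outfmt,
    ]
    if remote:
        # Send the search to NCBI's servers instead of a local database.
        cmd.append("-remote")
    return cmd


def run_blastp(query, out, **kwargs):
    """Run blastp; raises CalledProcessError if it exits with a non-zero status."""
    cmd = build_blastp_command(query, out, **kwargs)
    # Passing a list (no shell=True) avoids shell quoting issues with file paths.
    return subprocess.run(cmd, check=True, capture_output=True, text=True)
```

Keeping command construction separate from execution means the argument list can be checked without BLAST+ installed, and `run_blastp` only needs the `blastp` binary on `PATH` when it is actually called.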