epigen / mr.pareto

MR.PARETO - Modules & Recipes for Pragmatic Augmentation of Research Efficiency Towards Optimum
MIT License

adapt modules for snakemake 8 #13

Open burtonjake opened 1 month ago

burtonjake commented 1 month ago

To bump all modules to Snakemake 8 (#12), we primarily need to document the libraries required to process the initial Snakefile. These should be added to envs/global.yaml and linked into the Snakefile with a global conda directive. See 'global workflow dependencies' in the Snakemake docs. We also need to review the documentation of each module so that it is accurate for Snakemake 8.

one-off (first in unsupervised_analysis)

for each module

Modules

burtonjake commented 1 month ago

All the libraries imported across the current MR.PARETO Snakefiles:

unsupervised_analysis

import os
import sys
import pandas as pd
import yaml
from snakemake.utils import min_version

---

spilterlize_integrate

import yaml
import pandas as pd
import os
from snakemake.utils import validate, min_version
import json
import csv
import sys
import subprocess

---

dea_limma

import os
import sys
import pandas as pd
import yaml
from snakemake.utils import min_version

---

enrichment_analysis

import yaml
import pandas as pd
import os
from snakemake.utils import validate, min_version
import json
import csv
import sys
import subprocess

---

genome_tracks

# libraries
import pandas as pd
import os
import gzip
import re # for regular expressions
import numpy as np
import json
from snakemake.utils import min_version
import hashlib # generating unique sample names for single-cell samples

---

atacseq_pipeline

import yaml
import pandas as pd
import os
import shutil
from snakemake.utils import validate, min_version
from string import Template

---

scrnaseq_processing_seurat

import os
import sys
import pandas as pd
import yaml
from snakemake.utils import min_version

---

dea_seurat

import os
import sys
import pandas as pd
import yaml
from snakemake.utils import min_version

---

mixscape_seurat

import os
import sys
import pandas as pd
import yaml
from snakemake.utils import min_version

It seems like the only ones that are not from the Python Standard Library are pandas, numpy, and pyyaml. We should at least document the numpy and pandas dependencies.

sreichl commented 1 month ago

Let's find out which packages actually come with the minimal Snakemake installation (the one that would cause problems because of missing packages), e.g., is yaml/pyyaml definitely in there? Then we could make one global.yaml and just copy-paste it into every module.
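One way to check this from inside a given environment is to ask Python whether each module can be found. This is only a sketch: `missing_modules` is a hypothetical helper, and it would have to be run with the interpreter of the minimal Snakemake environment to be meaningful.

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names that cannot be found in the current environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# These are import names, not conda package names: "yaml" is provided by pyyaml.
candidates = ["yaml", "pandas", "numpy"]
print(missing_modules(candidates))
```

An empty list would mean that none of the candidates needs to go into global.yaml for that environment.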

burtonjake commented 1 month ago

It is indeed:

(base) [jburton@d002 ~]$ mamba create --dry-run -c conda-forge -c bioconda -n snakemake8-mini snakemake-minimal

Looking for: ['snakemake-minimal']

conda-forge/noarch                                  15.9MB @  26.1MB/s  0.6s
conda-forge/linux-64                                36.7MB @  39.7MB/s  0.9s
bioconda/linux-64                                    5.6MB @   4.0MB/s  1.4s
bioconda/noarch                                      5.3MB @   3.7MB/s  1.4s
Transaction

  Prefix: /nobackup/lab_bock/users/jburton/miniconda3/envs/snakemake8-mini

  Updating specs:

   - snakemake-minimal

  Package                                   Version  Build                Channel           Size
──────────────────────────────────────────────────────────────────────────────────────────────────
  Install:
──────────────────────────────────────────────────────────────────────────────────────────────────

  + python_abi                                 3.12  4_cp312              conda-forge     Cached
  + _libgcc_mutex                               0.1  conda_forge          conda-forge     Cached
  + ld_impl_linux-64                           2.40  hf3520f5_7           conda-forge     Cached
  + ca-certificates                        2024.7.4  hbcca054_0           conda-forge     Cached
  + libgomp                                  14.1.0  h77fa898_0           conda-forge     Cached
  + _openmp_mutex                               4.5  2_gnu                conda-forge     Cached
  + libgcc-ng                                14.1.0  h77fa898_0           conda-forge     Cached
  + libgfortran5                             14.1.0  hc5f4f2c_0           conda-forge     Cached
  + libstdcxx-ng                             14.1.0  hc0a3c3a_0           conda-forge     Cached
  + openssl                                   3.3.1  h4bc722e_2           conda-forge     Cached
  + libzlib                                   1.3.1  h4ab18f5_1           conda-forge     Cached
  + libxcrypt                                4.4.36  hd590300_1           conda-forge     Cached
  + libffi                                    3.4.2  h7f98852_5           conda-forge     Cached
  + bzip2                                     1.0.8  h4bc722e_7           conda-forge     Cached
  + yaml                                      0.2.5  h7f98852_2           conda-forge     Cached
  + ncurses                                     6.5  h59595ed_0           conda-forge     Cached
  + libuuid                                  2.38.1  h0b41bf4_0           conda-forge     Cached
  + libnsl                                    2.0.1  hd590300_0           conda-forge     Cached
  + libexpat                                  2.6.2  h59595ed_0           conda-forge     Cached
  + xz                                        5.2.6  h166bdaf_0           conda-forge     Cached
  + libgfortran-ng                           14.1.0  h69a702a_0           conda-forge     Cached
  + zstd                                      1.5.6  ha6fb4c9_0           conda-forge     Cached
  + tk                                       8.6.13  noxft_h4845f30_101   conda-forge     Cached
  + libsqlite                                3.46.0  hde9e2c9_0           conda-forge     Cached
  + readline                                    8.2  h8228510_1           conda-forge     Cached
  + libopenblas                              0.3.27  pthreads_hac2b453_1  conda-forge     Cached
  + libblas                                   3.9.0  23_linux64_openblas  conda-forge     Cached
  + libcblas                                  3.9.0  23_linux64_openblas  conda-forge     Cached
  + liblapack                                 3.9.0  23_linux64_openblas  conda-forge     Cached
  + liblapacke                                3.9.0  23_linux64_openblas  conda-forge     Cached
  + coin-or-utils                           2.11.11  h8c65801_1           conda-forge     Cached
  + coin-or-osi                            0.108.10  haf5fa05_0           conda-forge     Cached
  + coin-or-clp                              1.17.8  h1ee7a9c_0           conda-forge     Cached
  + coin-or-cgl                              0.60.7  h516709c_0           conda-forge     Cached
  + coin-or-cbc                             2.10.11  h56f689f_0           conda-forge     Cached
  + tzdata                                    2024a  h0c530f3_0           conda-forge     Cached
  + coincbc                                 2.10.11  0_metapackage        conda-forge     Cached
  + python                                   3.12.4  h194c7f8_0_cpython   conda-forge     Cached
  + wheel                                    0.44.0  pyhd8ed1ab_0         conda-forge       59kB
  + setuptools                               72.1.0  pyhd8ed1ab_0         conda-forge        1MB
  + pip                                        24.2  pyhd8ed1ab_0         conda-forge     Cached
  + pyparsing                                 3.1.2  pyhd8ed1ab_0         conda-forge     Cached
  + pycparser                                  2.22  pyhd8ed1ab_0         conda-forge     Cached
  + platformdirs                              4.2.2  pyhd8ed1ab_0         conda-forge     Cached
  + hyperframe                                6.0.1  pyhd8ed1ab_0         conda-forge     Cached
  + smmap                                     5.0.0  pyhd8ed1ab_0         conda-forge     Cached
  + typing_extensions                        4.12.2  pyha770c72_0         conda-forge     Cached
  + zipp                                     3.19.2  pyhd8ed1ab_0         conda-forge     Cached
  + attrs                                    24.1.0  pyh71513ae_0         conda-forge       56kB
  + pkgutil-resolve-name                     1.3.10  pyhd8ed1ab_1         conda-forge     Cached
  + traitlets                                5.14.3  pyhd8ed1ab_0         conda-forge     Cached
  + python-fastjsonschema                    2.20.0  pyhd8ed1ab_0         conda-forge     Cached
  + charset-normalizer                        3.3.2  pyhd8ed1ab_0         conda-forge     Cached
  + hpack                                     4.0.0  pyh9f0ad1d_0         conda-forge     Cached
  + pysocks                                   1.7.1  pyha2e5f31_6         conda-forge     Cached
  + idna                                        3.7  pyhd8ed1ab_0         conda-forge     Cached
  + certifi                                2024.7.4  pyhd8ed1ab_0         conda-forge     Cached
  + plac                                      1.4.3  pyhd8ed1ab_0         conda-forge     Cached
  + argparse-dataclass                        2.0.0  pyhd8ed1ab_0         conda-forge     Cached
  + dpath                                     2.2.0  pyha770c72_0         conda-forge     Cached
  + throttler                                 1.2.2  pyhd8ed1ab_0         conda-forge     Cached
  + stopit                                    1.1.2  py_0                 conda-forge     Cached
  + reretry                                  0.11.8  pyhd8ed1ab_0         conda-forge     Cached
  + tabulate                                  0.9.0  pyhd8ed1ab_1         conda-forge     Cached
  + packaging                                  24.1  pyhd8ed1ab_0         conda-forge     Cached
  + humanfriendly                              10.0  pyhd8ed1ab_6         conda-forge     Cached
  + docutils                                 0.21.2  pyhd8ed1ab_0         conda-forge     Cached
  + configargparse                              1.7  pyhd8ed1ab_0         conda-forge     Cached
  + appdirs                                   1.4.4  pyh9f0ad1d_0         conda-forge     Cached
  + toposort                                   1.10  pyhd8ed1ab_0         conda-forge     Cached
  + connection_pool                           0.0.3  pyhd3deb0d_0         conda-forge     Cached
  + gitdb                                    4.0.11  pyhd8ed1ab_0         conda-forge     Cached
  + importlib_resources                       6.4.0  pyhd8ed1ab_0         conda-forge     Cached
  + h2                                        4.1.0  pyhd8ed1ab_0         conda-forge     Cached
  + amply                                     0.1.6  pyhd8ed1ab_0         conda-forge     Cached
  + gitpython                                3.1.43  pyhd8ed1ab_0         conda-forge     Cached
  + psutil                                    6.0.0  py312h9a8786e_0      conda-forge     Cached
  + markupsafe                                2.1.5  py312h98912ed_0      conda-forge     Cached
  + rpds-py                                  0.19.1  py312hf008fa9_0      conda-forge     Cached
  + brotli-python                             1.1.0  py312h30efb56_1      conda-forge     Cached
  + wrapt                                    1.16.0  py312h98912ed_0      conda-forge     Cached
  + pyyaml                                    6.0.1  py312h98912ed_1      conda-forge     Cached
  + datrie                                    0.8.2  py312h98912ed_7      conda-forge     Cached
  + immutables                                 0.20  py312h98912ed_1      conda-forge     Cached
  + cffi                                     1.16.0  py312hf06ca03_0      conda-forge     Cached
  + jupyter_core                              5.7.2  py312h7900ff3_0      conda-forge     Cached
  + pulp                                      2.8.0  py312h7900ff3_0      conda-forge     Cached
  + zstandard                                0.23.0  py312h3483029_0      conda-forge     Cached
  + snakemake-interface-common               1.17.2  pyhdfd78af_0         bioconda        Cached
  + snakemake-interface-storage-plugins       3.2.3  pyhdfd78af_0         bioconda        Cached
  + snakemake-interface-executor-plugins      9.2.0  pyhdfd78af_0         bioconda        Cached
  + snakemake-interface-report-plugins        1.0.0  pyhdfd78af_0         bioconda        Cached
  + jinja2                                    3.1.4  pyhd8ed1ab_0         conda-forge     Cached
  + referencing                              0.35.1  pyhd8ed1ab_0         conda-forge     Cached
  + smart_open                                7.0.4  pyhd8ed1ab_0         conda-forge     Cached
  + conda-inject                              1.3.2  pyhd8ed1ab_0         conda-forge     Cached
  + yte                                       1.5.4  pyha770c72_0         conda-forge     Cached
  + urllib3                                   2.2.2  pyhd8ed1ab_1         conda-forge     Cached
  + jsonschema-specifications             2023.12.1  pyhd8ed1ab_0         conda-forge     Cached
  + requests                                 2.32.3  pyhd8ed1ab_0         conda-forge     Cached
  + jsonschema                               4.23.0  pyhd8ed1ab_0         conda-forge     Cached
  + nbformat                                 5.10.4  pyhd8ed1ab_0         conda-forge     Cached
  + snakemake-minimal                        8.16.0  pyhdfd78af_0         bioconda        Cached

  Summary:

  Install: 103 packages

  Total download: 2MB
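
The listing above can also be checked programmatically rather than by eye. A minimal sketch, assuming the column layout shown above (`+ name version build channel size`); `planned_packages` is a hypothetical helper and the excerpt is three lines copied from the listing. Note that the `yaml` entry in the listing is a separate package from `pyyaml`, which is the one providing `import yaml`.

```python
def planned_packages(dry_run_output):
    """Map package name -> version for every '+ name version ...' line of the listing."""
    packages = {}
    for line in dry_run_output.splitlines():
        fields = line.split()
        if len(fields) >= 3 and fields[0] == "+":
            packages[fields[1]] = fields[2]
    return packages


# Three lines copied from the dry-run output above.
excerpt = """\
  + yaml                                      0.2.5  h7f98852_2           conda-forge     Cached
  + pyyaml                                    6.0.1  py312h98912ed_1      conda-forge     Cached
  + snakemake-minimal                        8.16.0  pyhdfd78af_0         bioconda        Cached
"""

planned = planned_packages(excerpt)
for name in ("pyyaml", "pandas", "numpy"):
    print(name, planned.get(name, "not in listing"))
```

Run against the full listing, this would report pyyaml 6.0.1 and no entry for pandas or numpy, which matches the conclusion that only those two need to be declared.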

burtonjake commented 1 month ago

Additionally, these are the numpy and pandas versions that come with the full snakemake 8 package:

numpy                     2.0.1           py312h1103770_0    conda-forge
pandas                    2.2.2           py312h1d6d2e6_1    conda-forge
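
For illustration, a shared envs/global.yaml could be generated from these two versions. This is only a sketch under assumptions: `render_global_env` is a hypothetical helper, exact pinning is one possible choice rather than something decided in this thread, and the channels are the ones used in the mamba command above. The Snakefile would then point to the file with a top-level `conda: "envs/global.yaml"` line, as described under 'global workflow dependencies' in the Snakemake docs.

```python
def render_global_env(pins, channels=("conda-forge", "bioconda")):
    """Render a conda environment file as text from a {package: version} mapping."""
    lines = ["channels:"]
    lines += [f"  - {channel}" for channel in channels]
    lines.append("dependencies:")
    lines += [f"  - {name}={version}" for name, version in pins.items()]
    return "\n".join(lines) + "\n"


# Versions reported above for the full snakemake 8 package; pinning them is an assumption.
env_text = render_global_env({"pandas": "2.2.2", "numpy": "2.0.1"})
print(env_text, end="")
```

The same text could then be written to envs/global.yaml in every module.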

sreichl commented 1 month ago

I checked the RNA pipeline again and, interestingly, they use the resources key only once, here: https://github.com/snakemake-workflows/rna-seq-star-deseq2/blob/993dcfcf3c1210f75f6bfb0ef765a4ddb77cadf7/workflow/rules/ref.smk#L51

In all other rules they only use the threads key.

Unsure what to do, i.e., how to meaningfully decouple resource specifications from workflow configuration without adding complexity for the end user.