kelly-sovacool opened 2 months ago
As of v2.6.1, this no longer occurs with --version, but it does with run:
renee run \
--input /data/CCBR_Pipeliner/Pipelines/RENEE/develop/.tests/*.R?.fastq.gz \
--output /data/$USER/renee_test_rel-7 \
--genome hg38_30 \
--mode slurm \
--sif-cache /data/CCBR_Pipeliner/SIFS
[+] Loading singularity 4.1.5 on cn4312
[+] Loading snakemake 7.32.4
Python version: 3.11.3
RENEE (v2.6.1)
Thank you for running RENEE on BIOWULF!
Generating config file in '/data/sovacoolkl/renee_test_rel-7/config.json'... Done!
/data/sovacoolkl/renee_test_rel-7/resources/runner slurm -j pl:renee -b /gpfs/gsfs10/users/CCBR_Pipeliner,/data/CCBR_Pipeliner,/gpfs/gsfs10/users/CCBR_Pipeliner/Pipelines/RENEE/develop/.tests,/data/sovacoolkl/renee_test_rel-7,/lscratch -o /data/sovacoolkl/renee_test_rel-7 -c /data/sovacoolkl/renee_test_rel-7/.singularity -t /lscratch/$SLURM_JOBID -n biowulf
Successfully submitted master job: 39979651
sys:1: ResourceWarning: unclosed <socket.socket fd=4, family=2, type=1, proto=0, laddr=('0.0.0.0', 0)>
ResourceWarning: Enable tracemalloc to get the object allocation traceback
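For reference, this kind of warning can be reproduced in isolation. On Linux, family=2 in the log is AF_INET and type=1 is SOCK_STREAM, so the leaked object is a TCP socket, and the repr shows no raddr, which suggests it was never connected to a peer. The minimal sketch below uses only the standard library (it is not RENEE code): it creates such a socket, drops it without calling close(), and records the resulting ResourceWarning. The function name leak_socket is hypothetical.

```python
import gc
import socket
import warnings


def leak_socket() -> None:
    # Hypothetical stand-in for the code that opens the socket:
    # AF_INET (family=2) + SOCK_STREAM (type=1), never closed.
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    del sock  # last reference dropped without calling sock.close()


with warnings.catch_warnings(record=True) as caught:
    # ResourceWarning is ignored by default, so make it visible here.
    warnings.simplefilter("always", ResourceWarning)
    leak_socket()
    gc.collect()  # ensure the socket has been finalized

resource_warnings = [
    str(w.message) for w in caught if issubclass(w.category, ResourceWarning)
]
for message in resource_warnings:
    print("ResourceWarning:", message)
```

The printed message should have the same shape as the one in the log ("unclosed <socket.socket fd=..., family=2, type=1, ...>"), although the fd number and exact fields depend on the platform and Python version.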
@kelly-sovacool It happens with help commands as well:
renee run --help
[+] Loading singularity 4.1.5 on cn4294
[+] Loading snakemake 7.32.4
Python version: 3.11.3
renee run: Runs the data-processing and quality-control pipeline.
Synopsis:
$ renee run [--help] \
[--small-rna] [--star-2-pass-basic] \
[--dry-run] [--mode {slurm, local}] \
[--shared-resources SHARED_RESOURCES] \
[--singularity-cache SINGULARITY_CACHE] \
[--sif-cache SIF_CACHE] \
[--tmp-dir TMP_DIR] \
[--wait] \
[--create-nidap-folder] \
[--threads THREADS] \
--input INPUT [INPUT ...] \
--output OUTPUT \
--genome {hg38_36, mm10_M21, custom.json, ...}
Description:
To run the pipeline with your data, please provide a space-separated
list of FastQs (globbing is supported), an output directory to store results,
and a reference genome.
Optional arguments are shown in square brackets above. Please visit our docs
at "https://CCBR.github.io/RENEE/" for more information, examples, and
guides.
Required arguments:
--input INPUT [INPUT ...]
Input FastQ file(s) to process. One or more FastQ files
can be provided. The pipeline supports single-end and
paired-end RNA-seq data.
Example: --input .tests/*.R?.fastq.gz
--output OUTPUT
Path to an output directory. This location is where
the pipeline will create all of its output files, also
known as the pipeline's working directory. If the user
provided working directory has not been initialized,
it will be created automatically.
Example: --output /data/$USER/RNA_hg38
--genome {hg38_36,mm10_M21,custom.json,...}
Reference genome. This option defines the reference
genome of the samples. The default is hg38_36 if not specified.
RENEE on Biowulf comes bundled with
pre-built reference files for human and mouse samples;
however, it is worth noting that the pipeline can accept
custom reference genomes created with the build sub
command. Run `renee --help` to view the current list of pre-built genomes.
A custom reference genome created with
the build sub command can also be provided. The name of
this custom reference JSON file is dependent on the
values provided to the following renee build args
'--ref-name REF_NAME --gtf-ver GTF_VER', where the name
of the output file uses the following naming convention:
'{REF_NAME}_{GTF_VER}.json'.
Example: --genome hg38_36
Analysis options:
--small-rna Uses ENCODE's recommendations for small RNA. This
option should be used with small RNA libraries. These
are rRNA-depleted libraries that have been size
selected to be shorter than 200bp. Size selection
enriches for small RNA species such as miRNAs, siRNAs,
or piRNAs. This option is only supported with single-
end data. This option should not be combined with the
star 2-pass basic option.
Example: --small-rna
--star-2-pass-basic Run STAR in per sample 2-pass mapping mode. It is
recommended to use this option when processing a set
of unrelated samples. It is not advised to use this
option for a study with multiple related samples. By
default, the pipeline utilizes a multi-sample 2-pass
mapping approach where the set of splice junctions
detected across all samples are provided to the second
pass of STAR. This option overrides the default
behavior so each sample will be processed in a per
sample two-pass basic mode. This option should not be
combined with the small RNA option.
Example: --star-2-pass-basic
Orchestration options:
--dry-run Does not execute anything. Only displays what steps in
the pipeline remain or will be run.
Example: --dry-run
--mode {slurm,local}
Method of execution. Defines the mode of execution.
Valid options for this mode include: local or slurm.
Additional modes of execution are coming soon, default:
slurm.
Here is a brief description of each mode:
• local: uses local method of execution. Local runs
will run serially on the compute instance. This is useful
for testing, debugging, or when a user does not have
access to a high performance computing environment.
If this option is not provided, it will default to a
slurm mode of execution.
• slurm: uses slurm execution backend. This method
will submit jobs to a cluster using sbatch. It is
recommended to run the pipeline in this mode as it
will be significantly faster.
Example: --mode slurm
--shared-resources SHARED_RESOURCES
Local path to shared resources. The pipeline uses a set
of shared reference files that can be reused across
reference genomes. These currently include reference files
for kraken and FQScreen. These reference files can be
downloaded with the build sub command's --shared-resources
option. These files only need to be downloaded once. If
you are running the pipeline on Biowulf, you do NOT need
to download these reference files! They already exist on
the filesystem in a location that anyone can access. If
you are running the pipeline on another cluster or target
system, you will need to download the shared resources
with the build sub command, and you will need to provide
this option to the run sub command every time. Please
provide the same path that was provided to the build sub
command's --shared-resources option.
Example: --shared-resources /data/shared/renee
--singularity-cache SINGULARITY_CACHE
Overrides the $SINGULARITY_CACHEDIR variable. Images
from remote registries are cached locally on the file
system. By default, the singularity cache is set to:
'/path/to/output/directory/.singularity/'. Please note
that this cache cannot be shared across users.
Example: --singularity-cache /data/$USER
--sif-cache SIF_CACHE
Path where a local cache of SIFs is stored. This cache
can be shared across users if permissions are properly
set up. If a SIF does not exist in the SIF cache, the
image will be pulled from Dockerhub. renee cache
sub command can be used to create a local SIF cache.
Please see renee cache for more information.
Example: --sif-cache /data/$USER/sifs/
--wait
Wait until master job completes. This is required if
the job is submitted using an HPC API. If not provided,
the API may interpret submission of master job as
completion of the pipeline!
--create-nidap-folder
Create a folder called "NIDAP" with files to be moved back to NIDAP.
This makes it convenient to move only this folder (called NIDAP)
and its content back to NIDAP, rather than the entire pipeline
output folder.
--tmp-dir TMP_DIR
Path on the file system for writing temporary output
files. By default, the temporary directory is set to
'/lscratch/$SLURM_JOBID' on NIH's Biowulf cluster and
'OUTPUT' on the FRCE cluster.
However, if you are running the pipeline on another cluster,
this option will need to be specified.
Ideally, this path should point to a dedicated location on
the filesystem for writing tmp files.
On many systems, this location is
set to somewhere in /scratch. If you need to inject a
variable into this string that should NOT be expanded,
please quote this option's value in single quotes.
Example: --tmp-dir '/cluster_scratch/$USER/'
--threads THREADS
Max number of threads for local processes. It is
recommended to set this value to the maximum number
of CPUs available on the host machine, default: 2.
Example: --threads 16
Misc Options:
-h, --help Show usage information, help message, and exit.
Example: --help
options:
--create-nidap-folder
Create a folder called "NIDAP" with files to be moved back to NIDAP. This makes it convenient to move only this folder (called NIDAP) and
its content back to NIDAP, rather than the entire pipeline output folder.
--wait Wait until master job completes. This is required if the job is submitted using an HPC API. If not provided, the API may interpret
submission of master job as completion of the pipeline!
Example:
# Step 1.) Grab an interactive node,
# do not run on head node and add
# required dependencies to $PATH
srun -N 1 -n 1 --time=1:00:00 --mem=8gb --cpus-per-task=2 --pty bash
module purge
module load singularity snakemake
# Step 2A.) Dry run pipeline with provided test data
./renee run --input .tests/*.R?.fastq.gz \
--output /data/$USER/RNA_hg38 \
--genome hg38_36 \
--mode slurm \
--dry-run
# Step 2B.) Run RENEE pipeline
# The slurm mode will submit jobs to the cluster.
# It is recommended to run renee in this mode.
./renee run --input .tests/*.R?.fastq.gz \
--output /data/$USER/RNA_hg38 \
--genome hg38_36 \
--mode slurm
Ver:
v2.6.2
Prebuilt genome+annotation combos:
['hg19_19', 'hg19_36', 'hg38_30', 'hg38_34', 'hg38_36', 'hg38_38', 'hg38_41', 'hg38_45', 'mm10_M21', 'mm10_M23', 'mm10_M25', 'mmul10_mmul10_108']
sys:1: ResourceWarning: unclosed <socket.socket fd=4, family=2, type=1, proto=0, laddr=('0.0.0.0', 0)>
ResourceWarning: Enable tracemalloc to get the object allocation traceback
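The fact that `renee run --help` also triggers the warning is a useful clue. Assuming the CLI is built on argparse (the "options:" section in the help output suggests it), --help prints usage and then calls sys.exit(0) during argument parsing. A socket created before that point and never closed is therefore only finalized at interpreter shutdown, and warnings raised then have no active Python frame, so they are reported as sys:1, which matches the log. The sketch below is a hypothetical reproduction of that scenario, not RENEE's code: a child interpreter opens a socket at module level (the name LEAKED is made up) and exits the same way --help does.

```python
import subprocess
import sys
import textwrap

# Hypothetical child script: a socket is opened at startup (module level)
# and never closed, then the program exits the way argparse does for --help.
CHILD = textwrap.dedent("""
    import socket
    import sys

    LEAKED = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sys.exit(0)
""")

result = subprocess.run(
    # -W always::ResourceWarning makes the normally ignored warning visible.
    [sys.executable, "-W", "always::ResourceWarning", "-c", CHILD],
    capture_output=True,
    text=True,
)
print("exit code:", result.returncode)
print(result.stderr)
```

If this matches what RENEE does, the leak would be in code that runs before argument parsing (for example at import time), which would also explain why it is independent of the subcommand.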
@kelly-sovacool can you try running the following in Python, near the top of the entry point and before the socket is created, to find the source of this issue:
import tracemalloc
import warnings
warnings.simplefilter("default", ResourceWarning)
tracemalloc.start()
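A slightly fuller sketch of the same idea: once tracemalloc is running, the object attached to a ResourceWarning (WarningMessage.source) can be passed to tracemalloc.get_object_traceback() to find the file and line where the leaked socket was allocated. The function open_and_forget is a hypothetical stand-in for the RENEE code path, which has not been identified yet. Equivalently, and without editing any code, the CLI could be run with PYTHONTRACEMALLOC=25 and PYTHONWARNINGS=always::ResourceWarning set in the environment, assuming the renee entry point is a Python process that inherits them.

```python
import socket
import tracemalloc
import warnings

# Record up to 25 frames per allocation so the traceback shows the caller chain.
tracemalloc.start(25)


def open_and_forget() -> None:
    # Hypothetical stand-in for the RENEE code path that leaks the socket.
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    del sock  # dropped without close()


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always", ResourceWarning)
    open_and_forget()

reports = []
for w in caught:
    if not issubclass(w.category, ResourceWarning) or w.source is None:
        continue
    # w.source is the leaked object; ask tracemalloc where it was allocated.
    tb = tracemalloc.get_object_traceback(w.source)
    lines = tb.format() if tb is not None else ["<allocation not traced>"]
    reports.append(f"{w.message}\n" + "\n".join(lines))

tracemalloc.stop()
print("\n\n".join(reports))
```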
The warning message appears with v2.6,
but it doesn't happen with v2.5.
It seems there's an opened resource (here, a socket) that never gets closed? https://stackoverflow.com/a/61373209
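If a leaked socket does turn out to be the cause, the usual remedy is to close it deterministically, for example with a with block (socket objects are context managers that call close() on exit) or with try/finally. The sketch below assumes nothing about where RENEE opens the socket; use_socket_safely is a hypothetical helper that just shows the pattern and checks that no ResourceWarning is emitted.

```python
import socket
import warnings


def use_socket_safely() -> bool:
    # Hypothetical example: the with-block calls sock.close() on exit,
    # even if an exception is raised inside it.
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(1.0)
        return sock.fileno() != -1  # still open inside the block


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always", ResourceWarning)
    was_open = use_socket_safely()

leaks = [w for w in caught if issubclass(w.category, ResourceWarning)]
print(f"open inside block: {was_open}, ResourceWarnings: {len(leaks)}")
```

For objects that have a close() method but are not context managers, contextlib.closing() gives the same guarantee.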