RobertsLab / resources

https://robertslab.github.io/resources/

Slurm script throwing error #2019

Closed sr320 closed 2 weeks ago

sr320 commented 2 weeks ago

Running the job below...
error files read out

[sr320@klone-login03 02-bismark-klone-array]$ cat bismark_array_21906545_12.err
find: ‘/gscratch/srlab/containers/.ssh’: Permission denied
find: ‘/gscratch/srlab/containers/.local/share/rstudio/sessions/active/session-a0a281fd/viewer-cache’: Permission denied
find: ‘/gscratch/srlab/containers/.local/share/nano’: Permission denied

job file

#!/bin/sh

#SBATCH --job-name=bismark_array          # Job name
#SBATCH --output=%x_%A_%a.out             # Standard output and error log
#SBATCH --error=%x_%A_%a.err              # Error log
#SBATCH --account=srlab
#SBATCH --partition=ckpt #update this line - use hyakalloc to find partitions you can use
#SBATCH --time=04-02:00:00
#SBATCH --ntasks=1                        # Run a single task
#SBATCH --cpus-per-task=30                # Number of CPU cores per task
#SBATCH --array=0-47                      # Array range (adjust based on the number of samples)
#SBATCH --mem=100G                         # Memory per node
#SBATCH --chdir=/gscratch/scrubbed/sr320/github/project-mytilus-methylation/output/02-bismark-klone-array/

# Exit script if any command fails
set -e

# Get most recent container git hash
git_commit_hash=$(find /gscratch/srlab/containers/ \
-name "srlab-bioinformatics-container*" \
-printf "%T+ %p\n" \
| sort -n \
| awk -F[-.] 'NR == 1 {print $7}')

# Load modules
module load apptainer

# Execute Roberts Lab bioinformatics container
# Binds home directory
# Binds /gscratch directory
# Directory bindings allow outputs to be written to the hard drive.
apptainer exec \
--home "$PWD" \
--bind /mmfs1/home/ \
--bind /mmfs1/gscratch/ \
/gscratch/srlab/containers/srlab-bioinformatics-container-"${git_commit_hash}$".sif \

# Set directories and files
reads_dir="/gscratch/scrubbed/sr320/github/project-mytilus-methylation/data/raw-wgbs/"
#bismark_dir="/path/to/bismark/"
#bowtie2_dir="/path/to/bowtie2/"
genome_folder="/gscratch/scrubbed/sr320/github/project-mytilus-methylation/output/01-bismark-init/"
output_dir="/gscratch/scrubbed/sr320/github/project-mytilus-methylation/output/02-bismark-klone-array/"
checkpoint_file="/gscratch/scrubbed/sr320/github/project-mytilus-methylation/output/02-bismark-klone-array/completed_samples.log"

# Create the checkpoint file if it doesn't exist
touch ${checkpoint_file}

# Get the list of sample files and corresponding sample names
files=(${reads_dir}*_1.fastq.gz)
file=${files[$SLURM_ARRAY_TASK_ID]}
sample_name=$(basename "$file" _1.fastq.gz)

# Check if the sample has already been processed
if grep -q "^${sample_name}$" ${checkpoint_file}; then
    echo "Sample ${sample_name} already processed. Skipping..."
    exit 0
fi

# Define log files for stdout and stderr
stdout_log="${output_dir}${sample_name}_stdout.log"
stderr_log="${output_dir}${sample_name}_stderr.log"

# Run Bismark for this sample
bismark \
    -genome ${genome_folder} \
    -p 30 \
    -score_min L,0,-0.6 \
    --non_directional \
    -1 ${reads_dir}${sample_name}_1.fastq.gz \
    -2 ${reads_dir}${sample_name}_2.fastq.gz \
    -o ${output_dir} \
    > ${stdout_log} 2> ${stderr_log}

# Check if the command was successful
if [ $? -eq 0 ]; then
    # Append the sample name to the checkpoint file
    echo ${sample_name} >> ${checkpoint_file}
    echo "Sample ${sample_name} processed successfully."
else
    echo "Sample ${sample_name} failed. Check ${stderr_log} for details."
fi

Permission issue, but not sure how to resolve it.

kubu4 commented 2 weeks ago

I believe this is likely related to multiple processes trying to use a container in the same directory simultaneously.

Permissions for those files show me as the owner, and nogroup as the group. Additionally, they show as being last updated this afternoon. To me this indicates they're currently "in use," so the permissions likely get set to prevent the container environment from getting screwed up.

I'd try copying whichever container you're using to a different directory and then try to launch from that new location.

kubu4 commented 2 weeks ago

Although, I just noticed this line:

/gscratch/srlab/containers/srlab-bioinformatics-container-"${git_commit_hash}$".sif \

Two things I notice:

  1. Possibly errant $: ${git_commit_hash}$ - I don't think there should be a dollar sign at the end.
  2. Errant continuation slash: .sif \ - There isn't another line/command after this, so this will lead to problems. (A corrected sketch of the line follows below.)
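
For reference, a sketch of that line with both fixes applied. Note that, as later comments in this thread make clear, apptainer exec also needs a command or script after the image path; /path/to/your-script.sh here is a placeholder:

/gscratch/srlab/containers/srlab-bioinformatics-container-"${git_commit_hash}".sif \
/path/to/your-script.sh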
sr320 commented 2 weeks ago

Probably the .sif \ - I was using examples where it pointed to another script, as opposed to having code in the job. 🤞 Will try again.

sr320 commented 2 weeks ago

now getting

/var/spool/slurmd/job21906836/slurm_script: line 68: bismark: command not found

kubu4 commented 2 weeks ago

Hmm, I get this when looking for Bismark:

[Screenshot: output of which bismark inside the container]

Which container commit are you using? There was a version where the $PATH was broken. Are you possibly using that one?

sr320 commented 2 weeks ago

using /gscratch/srlab/sr320/srlab-bioinformatics-container-586bf21.sif

sr320 commented 2 weeks ago

Also, is /srlab a location? See the line after which bismark.

kubu4 commented 2 weeks ago

Also, is /srlab a location? See the line after which bismark.

Yes. That's where programs are installed in the container.

Regarding the bismark issue, I think I finally realized what's happening.

If you're going to use the apptainer exec command, then you virtually have to pass it a script (which would contain all of your bismark commands). It will not handle multiple commands passed to it.

So, your options are:

  1. Run RStudio Server and operate in that environment.
  2. Put all of your bismark commands into a bash script (make the file executable) and then update your apptainer exec command to call that script.

E.g.

apptainer exec \
--home "$PWD" \
--bind /mmfs1/home/ \
--bind /mmfs1/gscratch/ \
/gscratch/srlab/containers/srlab-bioinformatics-container-"${git_commit_hash}".sif \
mybismark-script.sh

So, you're getting the bismark error because it's looking for bismark on Klone: your apptainer exec command finished (even though you didn't provide it with a command to run), and then the SLURM script moved on to the next part, which was your bismark command. Thus, the error.

EDITED: Fixed typo.

sr320 commented 2 weeks ago

so now with


# Load modules
module load apptainer

# Execute Roberts Lab bioinformatics container
# Binds home directory
# Binds /gscratch directory
# Directory bindings allow outputs to be written to the hard drive.
apptainer exec \
--home "$PWD" \
--bind /mmfs1/home/ \
--bind /mmfs1/gscratch/ \
/gscratch/srlab/sr320/srlab-bioinformatics-container-586bf21.sif \
/gscratch/scrubbed/sr320/github/project-mytilus-methylation/code/02.1.sh

my error:

[sr320@klone-login03 02-bismark-klone-array]$ cat *err
/var/spool/slurmd/job21909895/slurm_script: line 17: module: command not found
FATAL:   stat /gscratch/scrubbed/sr320/github/project-mytilus-methylation/code/02.1.sh: no such file or directory
[sr320@klone-login03 02-bismark-klone-array]$ ls  /gscratch/scrubbed/sr320/github/project-mytilus-methylation/code/02.1.sh
/gscratch/scrubbed/sr320/github/project-mytilus-methylation/code/02.1.sh

Note: line 17 in the SLURM script is module load apptainer.

I think I am having a conceptual issue understanding where things are...

kubu4 commented 2 weeks ago

Out of curiosity, why don't you just fire up an RStudio instance?

sr320 commented 2 weeks ago

The array - I presumed I could not call 40 nodes from within RStudio.

Bismark does run inside RStudio - I am trying to run samples across ckpt nodes.


kubu4 commented 2 weeks ago

I see.

kubu4 commented 2 weeks ago

Okay, I did some testing and things seem to be working okay for me.

As the error message indicates (/var/spool/slurmd/job21909895/slurm_script: line 17: module: command not found), it can't find the module command.

SOLUTION: Remove that line. It seems apptainer is already in the system $PATH, so the module load is unnecessary.
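
As a quick sanity check (my suggestion, not from the thread), this should print apptainer's path on a compute node if it is already available without the module system:

command -v apptainer

If that prints a path, the module load line is redundant.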

Regarding this error, I'm not sure. It's odd:

FATAL: stat /gscratch/scrubbed/sr320/github/project-mytilus-methylation/code/02.1.sh: no such file or directory

Otherwise, here's what I'm running and seems to work.


Bismark script (bismark.sh):

#!/bin/bash
echo "Going to try to exectute bismark..."
echo ""
bismark --help

SLURM Job script (job.sh):

#!/bin/sh
#SBATCH --job-name=bismark_array          # Job name
#SBATCH --output=%x_%A_%a.out             # Standard output and error log
#SBATCH --error=%x_%A_%a.err              # Error log
#SBATCH --account=srlab
#SBATCH --partition=ckpt #update this line - use hyakalloc to find partitions you can use
#SBATCH --time=04-02:00:00
#SBATCH --ntasks=1                        # Run a single task
#SBATCH --cpus-per-task=30                # Number of CPU cores per task
#SBATCH --array=0-47                      # Array range (adjust based on the number of samples)
#SBATCH --mem=100G   
#SBATCH --chdir=/mmfs1/home/samwhite/container_test

apptainer exec \
--home $PWD \
--bind /mmfs1/home/ \
--bind /mmfs1/gscratch/ \
/gscratch/srlab/containers/srlab-bioinformatics-container-f4142f4.sif \
~/container_test/bismark.sh

When I launch the job script, this is what happens:

[Screenshot: terminal output after launching the job script]

Exciting!!

Then, here's the output directory:

[Screenshot: output directory contents]

And, finally, we see that the bismark.sh script executed correctly:

[Screenshot: bismark.sh output, showing bismark --help ran successfully]

sr320 commented 2 weeks ago

Getting closer!

Part of the issue was a conflict between the working directory and the absolute paths set in the script; also, my script needed to be #!/bin/bash, not sh. Now just dealing with code issues.
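
For anyone following along, that shebang fix means the first line of the called script (code/02.1.sh in this thread) should be:

#!/bin/bash

and the file should be made executable, e.g.:

chmod +x code/02.1.sh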


sr320 commented 2 weeks ago

running!

[sr320@klone-login03 02-bismark-klone-array]$ squeue | grep sr320
        21942225_0      ckpt bismark_    sr320  R    3:59:51      1 n3090
        21942225_1      ckpt bismark_    sr320  R    3:59:51      1 n3175
        21942225_2      ckpt bismark_    sr320  R    3:59:51      1 n3177
        21942225_3      ckpt bismark_    sr320  R    3:59:51      1 n3182
        21942225_4      ckpt bismark_    sr320  R    3:59:51      1 n3199
        21942225_5      ckpt bismark_    sr320  R    3:59:51      1 n3204
        21942225_6      ckpt bismark_    sr320  R    3:59:51      1 n3205
        21942225_7      ckpt bismark_    sr320  R    3:59:51      1 n3206
        21942225_8      ckpt bismark_    sr320  R    3:59:51      1 n3208
        21942225_9      ckpt bismark_    sr320  R    3:59:51      1 n3209
       21942225_10      ckpt bismark_    sr320  R    3:59:51      1 n3211
       21942225_11      ckpt bismark_    sr320  R    3:59:51      1 n3213
       21942225_12      ckpt bismark_    sr320  R    3:59:51      1 n3214
       21942225_13      ckpt bismark_    sr320  R    3:59:51      1 n3215
       21942225_14      ckpt bismark_    sr320  R    3:59:51      1 n3216
       21942225_15      ckpt bismark_    sr320  R    3:59:51      1 n3217
       21942225_16      ckpt bismark_    sr320  R    3:59:51      1 n3218
       21942225_17      ckpt bismark_    sr320  R    3:59:51      1 n3219
       21942225_18      ckpt bismark_    sr320  R    3:59:51      1 n3220
       21942225_19      ckpt bismark_    sr320  R    3:59:51      1 n3221
       21942225_20      ckpt bismark_    sr320  R    3:59:51      1 n3222
       21942225_21      ckpt bismark_    sr320  R    3:59:51      1 n3223
       21942225_22      ckpt bismark_    sr320  R    3:59:51      1 n3224
       21942225_23      ckpt bismark_    sr320  R    3:59:51      1 n3226
        21941915_6      ckpt bismark_    sr320  R    4:01:22      1 n3073
        21941915_7      ckpt bismark_    sr320  R    4:01:22      1 n3080
        21941915_9      ckpt bismark_    sr320  R    4:01:22      1 n3144
       21941915_10      ckpt bismark_    sr320  R    4:01:22      1 n3174
       21941915_12      ckpt bismark_    sr320  R    4:01:22      1 n3176
       21941915_14      ckpt bismark_    sr320  R    4:01:22      1 n3178
       21941915_16      ckpt bismark_    sr320  R    4:01:22      1 n3184
       21941915_17      ckpt bismark_    sr320  R    4:01:22      1 n3191
       21941915_18      ckpt bismark_    sr320  R    4:01:22      1 n3194
       21941915_19      ckpt bismark_    sr320  R    4:01:22      1 n3195
       21941915_20      ckpt bismark_    sr320  R    4:01:22      1 n3196
       21941915_21      ckpt bismark_    sr320  R    4:01:22      1 n3198
       21941915_23      ckpt bismark_    sr320  R    4:01:22      1 n3202
          21901892 cpu-g2-me rstudio-    sr320  R 1-03:20:00      1 n3441

Will post in my notebook to try to pick it apart, but in the meantime, the job script:

#!/bin/sh

#SBATCH --job-name=bismark_array          # Job name
#SBATCH --output=%x_%A_%a.out             # Standard output and error log
#SBATCH --error=%x_%A_%a.err              # Error log
#SBATCH --account=srlab
#SBATCH --partition=ckpt #update this line - use hyakalloc to find partitions you can use
#SBATCH --time=04-02:00:00
#SBATCH --ntasks=1                        # Run a single task
#SBATCH --cpus-per-task=30                # Number of CPU cores per task
#SBATCH --array=0-47                      # Array range (adjust based on the number of samples)
#SBATCH --mem=100G                         # Memory per node
#SBATCH --chdir=/gscratch/scrubbed/sr320/github/project-mytilus-methylation

# Execute Roberts Lab bioinformatics container
# Binds home directory
# Binds /gscratch directory
# Directory bindings allow outputs to be written to the hard drive.
apptainer exec \
--home "$PWD" \
--bind /mmfs1/home/ \
--bind /mmfs1/gscratch/ \
/gscratch/srlab/sr320/srlab-bioinformatics-container-586bf21.sif \
code/02.1.sh

script:

#!/bin/bash
# Set directories and files
reads_dir="data/raw-wgbs/"
#bismark_dir="/path/to/bismark/"
#bowtie2_dir="/path/to/bowtie2/"
genome_folder="output/01-bismark-init/"
output_dir="output/02-bismark-klone-array/"
checkpoint_file="output/02-bismark-klone-array/completed_samples.log"

# Create the checkpoint file if it doesn't exist
touch ${checkpoint_file}

# Get the list of sample files and corresponding sample names
files=(${reads_dir}*_1.fastq.gz)
file="${files[$SLURM_ARRAY_TASK_ID]}"
sample_name=$(basename "$file" "_1.fastq.gz")

# Check if the sample has already been processed
if grep -q "^${sample_name}$" ${checkpoint_file}; then
    echo "Sample ${sample_name} already processed. Skipping..."
    exit 0
fi

# Define log files for stdout and stderr
stdout_log="${output_dir}${sample_name}_stdout.log"
stderr_log="${output_dir}${sample_name}_stderr.log"

# Run Bismark for this sample
bismark \
    -genome ${genome_folder} \
    -p 30 \
    -score_min L,0,-0.6 \
    --non_directional \
    -1 ${reads_dir}${sample_name}_1.fastq.gz \
    -2 ${reads_dir}${sample_name}_2.fastq.gz \
    -o ${output_dir} \
    > ${stdout_log} 2> ${stderr_log}

# Check if the command was successful
if [ $? -eq 0 ]; then
    # Append the sample name to the checkpoint file
    echo ${sample_name} >> ${checkpoint_file}
    echo "Sample ${sample_name} processed successfully."
else
    echo "Sample ${sample_name} failed. Check ${stderr_log} for details."
fi

Note that the location of 02.1.sh, and the directory locations within it, are relative to #SBATCH --chdir=/gscratch/scrubbed/sr320/github/project-mytilus-methylation.
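
A small defensive check (my sketch, not part of the posted script) that could go near the top of 02.1.sh to fail fast if --chdir didn't land in the expected project root:

# Guard: the relative paths below assume we start in the project root.
if [ ! -d "data/raw-wgbs" ]; then
    echo "Unexpected working directory: $PWD" >&2
    exit 1
fi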

sr320 commented 2 weeks ago

Maybe related to this?

--home "$PWD" \ - will have to investigate

sr320 commented 2 weeks ago

This code is a command to run a script within an Apptainer (formerly Singularity) container. Here’s a breakdown of each part:

  1. apptainer exec: This part of the command tells Apptainer to execute a command within a specified container.
  2. --home "$PWD": The --home flag is setting the current working directory ($PWD) as the home directory for the Apptainer container during execution. $PWD is a shell variable that holds the current directory path.
  3. --bind /mmfs1/home/ and --bind /mmfs1/gscratch/: The --bind flags are used to mount directories from the host system into the container. Here, /mmfs1/home/ and /mmfs1/gscratch/ are mounted to make them accessible within the container.
  4. /gscratch/srlab/sr320/srlab-bioinformatics-container-586bf21.sif: This is the path to the container file (srlab-bioinformatics-container-586bf21.sif) that will be used to run the command. Apptainer uses .sif files as container images.
  5. code/02.1.sh: This is the script that will be executed inside the container. The script (02.1.sh) is located in the code/ directory, relative to the current working directory.

Summary

The command runs the 02.1.sh script within a specific Apptainer container (srlab-bioinformatics-container-586bf21.sif), with the current directory set as the home and specific directories bound to the container for access. This setup allows the script to interact with files and directories on the host system while running in a controlled container environment.
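
A quick way to sanity-check that setup (a sketch reusing the container path from this thread; the bash -c command is purely illustrative):

apptainer exec \
--home "$PWD" \
--bind /mmfs1/home/ \
--bind /mmfs1/gscratch/ \
/gscratch/srlab/sr320/srlab-bioinformatics-container-586bf21.sif \
bash -c 'echo "HOME inside container: $HOME"; ls /gscratch'

If the bind mounts worked, the ls output should match the contents of /gscratch on the host.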