CCBR / TRANQUIL

TRANQUIL (TRna AbundaNce QUantification pIpeLine)
MIT License

mimseq KeyError: 'Ζ' not in modification table #1

Closed kopardev closed 11 months ago

kopardev commented 1 year ago
image

kopardev commented 1 year ago

This is related to ccbr1200

kelly-sovacool commented 1 year ago

Contents of slurm-27820288.out:

[+] Unloading singularity 3.7.2
[+] Loading singularity 3.7.2
Building DAG of jobs...
Using shell: /usr/bin/bash
Provided cluster nodes: 500
Job counts:
    count   jobs
    1   all
    1   mimseq
    2

[Thu Oct 19 15:40:34 2023]
rule mimseq:
    input: /mnt/gridftp/wuc11/Root/ThumpD1_SH5Y_tRNAseq/results/fastqs/SH5Y_WT_1.trim.R1.fastq.gz, /mnt/gridftp/wuc11/Root/ThumpD1_SH5Y_tRNAseq/results/fastqs/SH5Y_WT_2.trim.R1.fastq.gz, /mnt/gridftp/wuc11/Root/ThumpD1_SH5Y_tRNAseq/results/fastqs/SH5Y_KO_1.trim.R1.fastq.gz, /mnt/gridftp/wuc11/Root/ThumpD1_SH5Y_tRNAseq/results/fastqs/SH5Y_KO_2.trim.R1.fastq.gz
    output: /mnt/gridftp/wuc11/Root/ThumpD1_SH5Y_tRNAseq/resources/SH5Y_KO_vs_SH5Y_WT/mimseq/CCAanalysis/CCAcounts.csv
    jobid: 6
    wildcards: contrast=SH5Y_KO_vs_SH5Y_WT
    threads: 16

set -e -x -o pipefail
# set tmpdir
if [ -w "/lscratch/${SLURM_JOB_ID}" ];then 
    # if running on BIOWULF
    tmpdir="/lscratch/${SLURM_JOB_ID}"
    cleanup=0
elif [ -w "/scratch/cluster_scratch/${USER}" ];then
    # if running on FRCE
    tmp="/scratch/cluster_scratch/${USER}"
    tmpdir=(mktemp -d -p $tmp)
    cleanup=1
else
    # Catchall for "other" HPCs
    tmpdir=$(mktemp -d -p /dev/shm)
    cleanup=1
fi

g2=$(echo SH5Y_KO_vs_SH5Y_WT | awk -F"_vs_" '{print $2}')
mimseq  \
--species Hsap  \
--cluster-id 0.95  \
--threads 16  \
--min-cov 0.0005  \
--max-mismatches 0.1  \
--control-condition $g2  \
-n SH5Y_KO_vs_SH5Y_WT  \
--out-dir /mnt/gridftp/wuc11/Root/ThumpD1_SH5Y_tRNAseq/results/SH5Y_KO_vs_SH5Y_WT/mimseq \
--max-multi 4 \
--remap  --remap-mismatches 0.075 \
/mnt/gridftp/wuc11/Root/ThumpD1_SH5Y_tRNAseq/results/SH5Y_KO_vs_SH5Y_WT/sampleinfo.txt

# cleanup tmpdir
if [ "$cleanup" == "1" ];then
    rm -rf $tmpdir
fi

Submitted job 6 with external jobid 'Submitted batch job 27820289'.
[Thu Oct 19 15:41:04 2023]
Error in rule mimseq:
    jobid: 6
    output: /mnt/gridftp/wuc11/Root/ThumpD1_SH5Y_tRNAseq/resources/SH5Y_KO_vs_SH5Y_WT/mimseq/CCAanalysis/CCAcounts.csv
    shell:

set -e -x -o pipefail
# set tmpdir
if [ -w "/lscratch/${SLURM_JOB_ID}" ];then 
    # if running on BIOWULF
    tmpdir="/lscratch/${SLURM_JOB_ID}"
    cleanup=0
elif [ -w "/scratch/cluster_scratch/${USER}" ];then
    # if running on FRCE
    tmp="/scratch/cluster_scratch/${USER}"
    tmpdir=(mktemp -d -p $tmp)
    cleanup=1
else
    # Catchall for "other" HPCs
    tmpdir=$(mktemp -d -p /dev/shm)
    cleanup=1
fi

g2=$(echo SH5Y_KO_vs_SH5Y_WT | awk -F"_vs_" '{print $2}')
mimseq  \
--species Hsap  \
--cluster-id 0.95  \
--threads 16  \
--min-cov 0.0005  \
--max-mismatches 0.1  \
--control-condition $g2  \
-n SH5Y_KO_vs_SH5Y_WT  \
--out-dir /mnt/gridftp/wuc11/Root/ThumpD1_SH5Y_tRNAseq/results/SH5Y_KO_vs_SH5Y_WT/mimseq \
--max-multi 4 \
--remap  --remap-mismatches 0.075 \
/mnt/gridftp/wuc11/Root/ThumpD1_SH5Y_tRNAseq/results/SH5Y_KO_vs_SH5Y_WT/sampleinfo.txt

# cleanup tmpdir
if [ "$cleanup" == "1" ];then
    rm -rf $tmpdir
fi

        (one of the commands exited with non-zero exit code; note that snakemake uses bash strict mode!)
    cluster_jobid: Submitted batch job 27820289

Error executing rule mimseq on cluster (jobid: 6, external: Submitted batch job 27820289, jobscript: /mnt/gridftp/wuc11/Root/ThumpD1_SH5Y_tRNAseq/.snakemake/tmp.87g90a7d/snakejob.mimseq.6.sh). For error details see the cluster log and the log files of the involved rule(s).
Job failed, going on with independent jobs.
Exiting because a job execution failed. Look above for error message
Complete log: /mnt/gridftp/wuc11/Root/ThumpD1_SH5Y_tRNAseq/.snakemake/log/2023-10-19T154031.853574.snakemake.log
####################################################################################################
# Checking Sample Manifest...
#   Total Replicates in manifest : 4
#   Total Samples in manifest : 2
# Checking read access to raw fastqs...
# Read access to all raw fastqs is confirmed!
# Symlinks to all raw fastqs is created!
####################################################################################################
SH5Y_KO
2
SH5Y_WT
2
# Pipeline Parameters:
####################################################################################################
# Working dir : /mnt/gridftp/wuc11/Root/ThumpD1_SH5Y_tRNAseq
# Results dir : /mnt/gridftp/wuc11/Root/ThumpD1_SH5Y_tRNAseq/results
# Scripts dir : /mnt/gridftp/wuc11/Root/ThumpD1_SH5Y_tRNAseq/scripts
# Resources dir : /mnt/gridftp/wuc11/Root/ThumpD1_SH5Y_tRNAseq/resources
# Cluster JSON : /mnt/gridftp/wuc11/Root/ThumpD1_SH5Y_tRNAseq/resources/cluster.json
Building DAG of jobs...
Creating report...
Missing metadata for file /mnt/gridftp/wuc11/Root/ThumpD1_SH5Y_tRNAseq/resources/SH5Y_KO_vs_SH5Y_WT/mimseq/CCAanalysis/CCAcounts.csv. Maybe metadata was deleted or it was created using an older version of Snakemake. This is a non critical warning.
Downloading resources and rendering HTML.
Report created: /mnt/gridftp/wuc11/Root/ThumpD1_SH5Y_tRNAseq/runslurm_snakemake_report.html.
####################################################################################################
# Checking Sample Manifest...
#   Total Replicates in manifest : 4
#   Total Samples in manifest : 2
# Checking read access to raw fastqs...
# Read access to all raw fastqs is confirmed!
# Symlinks to all raw fastqs is created!
####################################################################################################
SH5Y_KO
2
SH5Y_WT
2
# Pipeline Parameters:
####################################################################################################
# Working dir : /mnt/gridftp/wuc11/Root/ThumpD1_SH5Y_tRNAseq
# Results dir : /mnt/gridftp/wuc11/Root/ThumpD1_SH5Y_tRNAseq/results
# Scripts dir : /mnt/gridftp/wuc11/Root/ThumpD1_SH5Y_tRNAseq/scripts
# Resources dir : /mnt/gridftp/wuc11/Root/ThumpD1_SH5Y_tRNAseq/resources
# Cluster JSON : /mnt/gridftp/wuc11/Root/ThumpD1_SH5Y_tRNAseq/resources/cluster.json

kelly-sovacool commented 1 year ago

Error from mimseq rule:

2023-10-23 16:16:46,248 [INFO ] mim-tRNAseq v1.1.7 run with command:
2023-10-23 16:16:46,249 [INFO ] /opt2/conda/envs/mimseq/bin/mimseq --species Hsap --cluster-id 0.95 --threads 16 --min-cov 0.0005 --max-mismatches 0.1 --control-condition WTtRNA -n Tet1n2DKOalkBtRNA_vs_WTtRNA --out-dir /mnt/gridftp/wuc11/Root/test/results/Tet1n2DKOalkBtRNA_vs_WTtRNA/mimseq --max-multi 4 --remap --remap-mismatches 0.075 /mnt/gridftp/wuc11/Root/test/results/Tet1n2DKOalkBtRNA_vs_WTtRNA/sampleinfo.txt
2023-10-23 16:16:46,253 [INFO ]
+----------------------------------------------+
| Starting analysis for hg38-tRNAs-filtered.fa |
+----------------------------------------------+
2023-10-23 16:16:46,253 [INFO ] Processing tRNA sequences...
2023-10-23 16:16:46,262 [INFO ] 34 introns registered...
2023-10-23 16:16:46,266 [INFO ] 22 mito tRNA sequences imported
2023-10-23 16:16:46,266 [INFO ] 600 cytosolic tRNA sequences imported
2023-10-23 16:16:46,266 [INFO ] Processing modomics database...
2023-10-23 16:16:49,411 [INFO ] Modomics retrieved...
2023-10-23 16:16:49,412 [INFO ] Parsing Modomics JSON data...
Traceback (most recent call last):
  File "/opt2/conda/envs/mimseq/bin/mimseq", line 10, in <module>
    sys.exit(main())
  File "/opt2/conda/envs/mimseq/lib/python3.7/site-packages/mimseq/mimseq.py", line 410, in main
    args.misinc_thresh, args.mito, args.pretrnas, args.local_mod, args.p_adj, args.sampledata)
  File "/opt2/conda/envs/mimseq/lib/python3.7/site-packages/mimseq/mimseq.py", line 101, in mimseq
    = modsToSNPIndex(trnas, trnaout, mito_trnas, modifications, name, out, double_cca, threads, snp_tolerance, cluster, cluster_id, posttrans, pretrnas, local_mod)
  File "/opt2/conda/envs/mimseq/lib/python3.7/site-packages/mimseq/tRNAtools.py", line 251, in modsToSNPIndex
    tRNA_dict, modomics_dict, species = tRNAparser(gtRNAdb, tRNAscan_out, mitotRNAs, modifications_table, posttrans_mod_off, double_cca, pretrnas, local_mod)
  File "/opt2/conda/envs/mimseq/lib/python3.7/site-packages/mimseq/tRNAtools.py", line 101, in tRNAparser
    modomics_dict, perSpecies_count = processModomics(modomics_file, fetch, species, modifications)
  File "/opt2/conda/envs/mimseq/lib/python3.7/site-packages/mimseq/tRNAtools.py", line 145, in processModomics
    unmod_sequence = getUnmodSeq(sequence, modifications)
  File "/opt2/conda/envs/mimseq/lib/python3.7/site-packages/mimseq/tRNAtools.py", line 928, in getUnmodSeq
    char = modification_table[char]['ref']
KeyError: 'Ζ'

kelly-sovacool commented 1 year ago

I was able to replicate this exact error with the test dataset:

bash /mnt/projects/CCBR-Pipelines/pipelines/TRANQUIL/tranquil -m=run -w=/home/sovacoolkl/scratch/tranquil_test

kelly-sovacool commented 1 year ago

Will try using mimseq v1.3.7 and see if the error persists.

https://github.com/CCBR/Dockers/pull/23

Update: the error persists with v1.3.7 installed from bioconda and run outside tranquil.

mamba create -n mimseq -c bioconda python=3.7 mimseq=1.3.7
mamba activate mimseq
mimseq  --species Hsap  --cluster-id 0.95  --threads 16  --min-cov 0.0005  --max-mismatches 0.1  --control-condition WTtRNA  -n Tet1n2DKOalkBtRNA_vs_WTtRNA  --out-dir /scratch/cluster_scratch/sovacoolkl/tranquil_test/results/Tet1n2DKOalkBtRNA_vs_WTtRNA/mimseq --max-multi 4 --remap  --remap-mismatches 0.075 /scratch/cluster_scratch/sovacoolkl/tranquil_test/results/Tet1n2DKOalkBtRNA_vs_WTtRNA/sampleinfo.txt

kelly-sovacool commented 1 year ago

This is strange. The modifications table does have a Z key.

def modificationParser(modifications_table):
    # Read in modifications and build a dictionary keyed by the modified-base symbol
    modifications = {}
    with open(modifications_table, 'r', encoding='utf-8') as mods:
        for line in mods:
            if not line.startswith("#"):
                name, abbr, ref, mod = line.split('\t')
                # replace unknown modifications with reference of N
                if not ref or ref.isspace():
                    ref = 'N'
                if mod and not mod.isspace():
                    modifications[mod.strip()] = {'name': name.strip(), 'abbr': abbr.strip(), 'ref': ref.strip()}
    return modifications

modifications_table = 'mimseq/modifications'
mods = modificationParser(modifications_table)
mods['Z']
{'name': '2′-O-methylpseudouridine', 'abbr': 'Ym', 'ref': 'U'}
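One possible explanation, not confirmed in this thread, is that the character in the traceback is a look-alike of ASCII `Z` rather than `Z` itself (for example, Greek capital zeta). The sketch below shows how to print the code point of a key so the two can be told apart. The `describe` helper and the one-entry toy table are assumptions for illustration, not part of mimseq.

```python
import unicodedata


def describe(char):
    """Return the code point and Unicode name of a single character."""
    return f"U+{ord(char):04X} {unicodedata.name(char, 'UNNAMED')}"


# Toy table holding only the entry shown above (not the full mimseq table).
mods = {"Z": {"name": "2′-O-methylpseudouridine", "abbr": "Ym", "ref": "U"}}

latin_z = "Z"          # ASCII letter, as typed on a keyboard
greek_zeta = "\u0396"  # visually similar character, assumed here for illustration

for key in (latin_z, greek_zeta):
    print(describe(key), "| in table:", key in mods)
```

Running the same `describe` call on the exact character copied from the `KeyError` message would show whether it matches the key that the table lookup found.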
kelly-sovacool commented 1 year ago

Update: this also happens on Biowulf with mimseq installed via miniconda.

kelly-sovacool commented 1 year ago

It turns out this is a known issue: https://github.com/nedialkova-lab/mim-tRNAseq/issues/45

kelly-sovacool commented 1 year ago

The local Modomics flag can't be used in Singularity containers because mimseq opens the file in read+write mode, which fails when the container file system is read-only. Proposed a fix here: https://www.github.com/nedialkova-lab/mim-tRNAseq/pull/50
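In Python, opening a file with mode `r+` requires write permission even if nothing is ever written, so it can fail on a file system that is mounted read-only. The sketch below is one way to avoid that, requesting write access only when it is available. The helper name `load_json_prefer_readonly` and the demo file are hypothetical; this is not mimseq's code and not necessarily what the linked pull request does.

```python
import json
import os
import tempfile


def load_json_prefer_readonly(path):
    """Load JSON from `path`, requesting write access only when available.

    Hypothetical helper for illustration; not part of mimseq.
    """
    mode = "r+" if os.access(path, os.W_OK) else "r"
    with open(path, mode, encoding="utf-8") as handle:
        return json.load(handle), mode


# Demo on a throwaway file. On a read-only file system the helper
# would be expected to fall back to mode "r".
with tempfile.TemporaryDirectory() as workdir:
    demo_path = os.path.join(workdir, "modomics_demo.json")
    with open(demo_path, "w", encoding="utf-8") as handle:
        json.dump({"example": 1}, handle)
    data, mode = load_json_prefer_readonly(demo_path)
    print(data, mode)
```

If the file is only ever read, opening it with plain `r` is the simpler choice.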

kelly-sovacool commented 1 year ago

Latest error:

Traceback (most recent call last):
  File "/usr/local/bin/mimseq", line 8, in <module>
    sys.exit(main())
  File "/usr/local/lib/python3.7/site-packages/mimseq/mimseq.py", line 411, in main
    args.misinc_thresh, args.mito, args.plastid, args.pretrnas, args.local_mod, args.p_adj, args.sampledata)
  File "/usr/local/lib/python3.7/site-packages/mimseq/mimseq.py", line 101, in mimseq
    = modsToSNPIndex(trnas, trnaout, mito_trnas, plastid_trnas, modifications, name, out, double_cca, threads, snp_tolerance, cluster, cluster_id, posttrans, pretrnas, local_mod)
  File "/usr/local/lib/python3.7/site-packages/mimseq/tRNAtools.py", line 272, in modsToSNPIndex
    aligntRNA(tempSeqs.name, out_dir, threads)
  File "/usr/local/lib/python3.7/site-packages/mimseq/ssAlign.py", line 27, in aligntRNA
    subprocess.check_call(cmcommand, stdout = open(out + 'cm.log', 'w'))
  File "/usr/local/lib/python3.7/subprocess.py", line 358, in check_call
    retcode = call(*popenargs, **kwargs)
  File "/usr/local/lib/python3.7/subprocess.py", line 339, in call
    with Popen(*popenargs, **kwargs) as p:
  File "/usr/local/lib/python3.7/subprocess.py", line 800, in __init__
    restore_signals, start_new_session)
  File "/usr/local/lib/python3.7/subprocess.py", line 1567, in _execute_child
    raise child_exception_type(errno_num, err_msg, err_filename)
FileNotFoundError: [Errno 2] No such file or directory: 'cmalign': 'cmalign'
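
The `FileNotFoundError` above means the `cmalign` executable (part of the Infernal package) was not found on `PATH` when mimseq tried to launch it. A minimal sketch of a check that could be run inside the container before starting the pipeline is shown below. The `missing_tools` helper is hypothetical, and the list of required tools contains only the one named in the traceback.

```python
import shutil


def missing_tools(tools):
    """Return the names from `tools` that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


# `cmalign` is the executable named in the traceback; extend the list as needed.
required = ["cmalign"]
missing = missing_tools(required)
if missing:
    print("Not found on PATH:", ", ".join(missing))
else:
    print("All required tools found.")
```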