bergmanlab / mcclintock

Meta-pipeline to identify transposable element insertions using next generation sequencing data

error after running testing dataset. #89

Closed yanhaidong1 closed 2 years ago

yanhaidong1 commented 2 years ago

Hi, when I ran the test data, I got the following error. After mapping, there is no 4322700.tmp.bam. Could you help me figure it out? Thanks!

PROCESSING       prepping reads for McClintock
PROCESSING       read setup complete
PROCESSING       making consensus fasta
PROCESSING       consensus fasta created
PROCESSING       making reference fasta
PROCESSING       reference fasta created
PROCESSING       making reference TE annotations
PROCESSING       no reference TEs provided... finding reference TEs with RepeatMasker &> /scratch/hy17471/Alex_282_TE_011922/test_running_mcclintock_021322/working_dir/logs/20220214.083547.4322700/processing.log
PROCESSING       reference TE annotations created
PROCESSING       making reference TE bed file
PROCESSING       reference TE bed file created
PROCESSING       making reference TE fasta &> /scratch/hy17471/Alex_282_TE_011922/test_running_mcclintock_021322/working_dir/logs/20220214.083547.4322700/processing.log
PROCESSING       reference TE fasta created
PROCESSING       making popoolationTE annotation files
PROCESSING       popoolationTE annotation files created
PROCESSING       masking reference fasta &> /scratch/hy17471/Alex_282_TE_011922/test_running_mcclintock_021322/working_dir/logs/20220214.083547.4322700/processing.log
PROCESSING       reference fasta masked
PROCESSING       making PopoolationTE reference fasta
PROCESSING       PopoolationTE reference fasta created
Failed to solve scheduling problem with ILP solver. Falling back to greedy solver.Run Snakemake with --verbose to see the full solver output for debugging the problem.
PROCESSING       making samtools and bwa index files for reference fasta &> /scratch/hy17471/Alex_282_TE_011922/test_running_mcclintock_021322/working_dir/logs/20220214.083547.4322700/processing.log
PROCESSING       samtools and bwa index files for reference fasta created
POPOOLATIONTE2   setting up for PopoolationTE2
POPOOLATIONTE2   indexing reference fasta &> /scratch/hy17471/Alex_282_TE_011922/test_running_mcclintock_021322/working_dir/logs/20220214.083547.4322700/popoolationTE2.log
POPOOLATIONTE2   formatting fastq read names &> /scratch/hy17471/Alex_282_TE_011922/test_running_mcclintock_021322/working_dir/logs/20220214.083547.4322700/popoolationTE2.log
POPOOLATIONTE2   formatting fastq read names &> /scratch/hy17471/Alex_282_TE_011922/test_running_mcclintock_021322/working_dir/logs/20220214.083547.4322700/popoolationTE2.log
POPOOLATIONTE2   mapping reads &> /scratch/hy17471/Alex_282_TE_011922/test_running_mcclintock_021322/working_dir/logs/20220214.083547.4322700/popoolationTE2.log
POPOOLATIONTE2   mapping reads &> /scratch/hy17471/Alex_282_TE_011922/test_running_mcclintock_021322/working_dir/logs/20220214.083547.4322700/popoolationTE2.log
POPOOLATIONTE2   converting SAM to BAM &> /scratch/hy17471/Alex_282_TE_011922/test_running_mcclintock_021322/working_dir/logs/20220214.083547.4322700/popoolationTE2.log
POPOOLATIONTE2   PopoolationTE2 preprocessing complete
PROCESSING       mapping reads to reference &> /scratch/hy17471/Alex_282_TE_011922/test_running_mcclintock_021322/working_dir/logs/20220214.083547.4322700/bwa.log
PROCESSING       read mapping complete
samtools sort -@ 4 /scratch/hy17471/Alex_282_TE_011922/test_running_mcclintock_021322/working_dir/sacCer2/4322700.tmp.bam /scratch/hy17471/Alex_282_TE_011922/test_running_mcclintock_021322/working_dir/sacCer2/4322700.tmp2
Traceback (most recent call last):
  File "/scratch/hy17471/Alex_282_TE_011922/test_running_mcclintock_021322/working_dir/snakemake/4322700/.snakemake/scripts/tmpplo93l6n.sam_to_bam.py", line 38, in main
    mccutils.check_file_exists(snakemake.output.tmp2_bam)
  File "/home/hy17471/software/mcclintock/scripts/mccutils.py", line 226, in check_file_exists
    raise FileNotFoundError(errno.ENOENT, os.strerror(errno.ENOENT), infile)
FileNotFoundError: [Errno 2] No such file or directory: '/scratch/hy17471/Alex_282_TE_011922/test_running_mcclintock_021322/working_dir/sacCer2/4322700.tmp2.bam'
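
The traceback shows that the failure is raised by an existence check on the sorted BAM, after `samtools sort` has already run, rather than by `samtools` itself. As a rough sketch of that kind of check: the `raise` line below is copied from the traceback, but the rest of the function body and the `missing_outputs` helper are assumptions for illustration, not McClintock's actual source.

```python
import errno
import os


def check_file_exists(infile):
    """Raise FileNotFoundError, as in the traceback, if infile is missing."""
    if not os.path.exists(infile):
        raise FileNotFoundError(errno.ENOENT, os.strerror(errno.ENOENT), infile)
    return True


def missing_outputs(paths):
    """Hypothetical helper: return the expected output paths that do not exist."""
    return [p for p in paths if not os.path.exists(p)]
```

Running `missing_outputs` over both `4322700.tmp.bam` and `4322700.tmp2.bam` in the `sacCer2` working directory would show whether the mapping step or the sorting step was the first one to leave no output.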
cbergman commented 2 years ago

Hi @yanhaidong1. Thanks for submitting this issue. Can you tell us if you ran the same exact command as in the getting started section of the readme (https://github.com/bergmanlab/mcclintock/#started), or did you invoke mcclintock.py just to run PopoolationTE2 on its own? If you ran a different command than in the getting started section of the readme, can you please post that command here? Also, can you tell us if you ran your mcclintock.py interactively or inside a script submitted to a cluster? Thanks!

yanhaidong1 commented 2 years ago

Dear Dr. Bergman,

Thank you so much for replying. I followed the tutorial in the readme but used '-m popoolationte2,retroseq'. Here is the Python code I used:

source activate /home/hy17471/.conda/envs/mcclintock021022

cmd = 'python3 ' + '/home/hy17471/software/mcclintock/mcclintock.py' + \
      ' -r ' + '/scratch/hy17471/Alex_282_TE_011922/test_running_mcclintock_021322/test/sacCer2.fasta' + \
      ' -c ' + '/scratch/hy17471/Alex_282_TE_011922/test_running_mcclintock_021322/test/sac_cer_TE_seqs.fasta' + \
      ' -1 ' + '/scratch/hy17471/Alex_282_TE_011922/test_running_mcclintock_021322/test/fastq_dir' + '/SRR800842_1.fastq.gz' + \
      ' -2 ' + '/scratch/hy17471/Alex_282_TE_011922/test_running_mcclintock_021322/test/fastq_dir' + '/SRR800842_2.fastq.gz' + \
      ' -p ' + input_core_num + \
      ' -o ' + working_dir + \
      ' -m ' + m_str

Also, I ran inside a script submitted to a cluster.

Best wishes, Haidong
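
For readers wrapping McClintock in Python as above, one possible way to assemble the same command is as an argument list rather than a concatenated string, which avoids quoting and spacing mistakes. This is only a sketch: the flags (`-r`, `-c`, `-1`, `-2`, `-p`, `-o`, `-m`) are the ones used in this thread, while the function names and the example paths are hypothetical, and it assumes the conda environment is already activated in the job script before Python starts.

```python
import shlex
import subprocess


def build_mcclintock_cmd(mcc_script, reference, consensus, fq1, fq2,
                         procs, out_dir, methods):
    """Build the mcclintock.py command as a list of arguments."""
    return [
        "python3", mcc_script,
        "-r", reference,
        "-c", consensus,
        "-1", fq1,
        "-2", fq2,
        "-p", str(procs),          # subprocess needs strings, not ints
        "-o", out_dir,
        "-m", ",".join(methods),   # e.g. "popoolationte2,retroseq"
    ]


def run_cmd(cmd):
    """Run the command and raise CalledProcessError on a non-zero exit."""
    subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Example paths are placeholders; adjust them to your own layout.
    cmd = build_mcclintock_cmd(
        "mcclintock.py",
        "test/sacCer2.fasta",
        "test/sac_cer_TE_seqs.fasta",
        "test/SRR800842_1.fastq.gz",
        "test/SRR800842_2.fastq.gz",
        4,
        "mcc_test_out",
        ["popoolationte2", "retroseq"],
    )
    # Print the command for inspection; call run_cmd(cmd) to execute it.
    print(shlex.join(cmd))
```

Passing a list to `subprocess.run` with `check=True` also means a failing step stops the wrapper script instead of being silently ignored.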

cbergman commented 2 years ago
yanhaidong1 commented 2 years ago

Dear Dr. Bergman,

Sounds good, thanks!

Best wishes, Haidong

cbergman commented 2 years ago

log out and log back in

conda update -y conda
conda install -y mamba=0.21.2 -c conda-forge


- install mcclintock

git clone git@github.com:bergmanlab/mcclintock.git
cd mcclintock
mamba env create -f install/envs/mcclintock.yml --name mcclintock
conda activate mcclintock
python3 mcclintock.py --install


- download test data

python3 test/download_test_data.py


- run mcclintock on test data using only popoolationte2 & retroseq. Note: this code was run inside a bash script submitted to a cluster running slurm 21.08.5. You'll need to run this directly on a worker node or modify the header appropriately.

#!/bin/bash
#SBATCH --job-name=mcc2_test
#SBATCH --partition=xxxxx
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=28
#SBATCH --mem=240gb
#SBATCH --time=10:00:00
#SBATCH --output=/scratch/xxxx/mcc2_test.log
#SBATCH --mail-user=xxxx@xxxx.edu
#SBATCH --mail-type=END,FAIL

CONDA_BASE=$(conda info --base)
source ${CONDA_BASE}/etc/profile.d/conda.sh
conda activate mcclintock

python3 mcclintock.py \
  -r test/sacCer2.fasta \
  -c test/sac_cer_TE_seqs.fasta \
  -g test/reference_TE_locations.gff \
  -t test/sac_cer_te_families.tsv \
  -1 test/SRR800842_1.fastq.gz \
  -2 test/SRR800842_2.fastq.gz \
  -p 28 \
  -o /scratch/cbergman/mcc_test_2 \
  -m popoolationte2,retroseq


- Could you please try a fresh install and run mcclintock directly using the commands above (i.e., submit your job outside of your Python script) to confirm your installation is correct? Thanks!
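
When a job behaves differently on a cluster node than in an interactive shell, it can help to confirm which conda environment and which tool binaries the job actually sees. The following is a small sketch of such a check, not part of McClintock: the function name and the default tool list are assumptions, and `CONDA_DEFAULT_ENV` is the variable conda sets on activation (for environments activated by path it may hold the full path rather than a name).

```python
import os
import shutil


def environment_report(tools=("samtools", "bwa"), expected_env="mcclintock",
                       environ=None):
    """Report the active conda environment and where each tool resolves."""
    environ = os.environ if environ is None else environ
    active = environ.get("CONDA_DEFAULT_ENV")
    return {
        "active_env": active,
        "env_matches": active == expected_env,
        # shutil.which returns None when the tool is not found on PATH
        "tools": {t: shutil.which(t, path=environ.get("PATH")) for t in tools},
    }


if __name__ == "__main__":
    report = environment_report()
    print("active conda env:", report["active_env"])
    for tool, location in report["tools"].items():
        print(f"{tool}: {location or 'NOT FOUND on PATH'}")
```

Calling this at the top of the submitted job, before McClintock starts, shows whether `samtools` and `bwa` resolve inside the intended environment or to some other installation on the node.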
yanhaidong1 commented 2 years ago

Dear Dr. Bergman,

Thank you so much for the detailed information. I will give it a try.

Best wishes, Haidong

cbergman commented 2 years ago

@yanhaidong1: Were you able to get the test data to run on a clean install? If so, I'd like to close this issue. Thanks!