bioinform / metasv

MetaSV: An accurate and integrative structural-variant caller for next generation sequencing
http://bioinform.github.io/metasv/
BSD 2-Clause "Simplified" License

Pip installation error #132

Open moldach opened 4 years ago

moldach commented 4 years ago

I'm trying to set up MetaSV on Compute Canada's Cedar cluster (a shared HPC system) and am running into an error with the pip installation.

Following the installation instructions, I set up the system requirements first.

Load the provided modules and set up the Python environment:

module load python/3.8
module load spades/3.13.1
module load samtools/0.1.20

virtualenv metaSV
source metaSV/bin/activate
pip install Cython # needs to be installed before the following 3 dependencies
pip install pysam
pip install pybedtools
pip install pyvcf

SPAdes was already available as a module, but I needed to download AGE and compile it with make OMP=no

Now I try to install metaSV with pip install https://github.com/bioinform/metasv/archive/0.5.2.tar.gz and get an error:

Ignoring pip: markers 'python_version < "3"' don't match your environment
Looking in links: /cvmfs/soft.computecanada.ca/custom/python/wheelhouse/avx2, /cvmfs/soft.computecanada.ca/custom/python/wheelhouse/generic
Collecting https://github.com/bioinform/metasv/archive/0.5.2.tar.gz
  Using cached https://github.com/bioinform/metasv/archive/0.5.2.tar.gz
    ERROR: Command errored out with exit status 1:
     command: /project/6013424/common/tools/CNV/bin/python -c 'import sys, setuptools, tokenize; sys.argv[0] = '"'"'/tmp/pip-req-build-pd08her8/setup.py'"'"'; __file__='"'"'/tmp/pip-req-build-pd08her8/setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(__file__);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' egg_info --egg-base /tmp/pip-req-build-pd08her8/pip-egg-info
         cwd: /tmp/pip-req-build-pd08her8/
    Complete output (6 lines):
    Traceback (most recent call last):
      File "<string>", line 1, in <module>
      File "/tmp/pip-req-build-pd08her8/setup.py", line 8
        print version
              ^
    SyntaxError: Missing parentheses in call to 'print'. Did you mean print(version)?
    ----------------------------------------
ERROR: Command errored out with exit status 1: python setup.py egg_info Check the logs for full command output.

My uname -a:

Linux cedar1.cedar.computecanada.ca 3.10.0-957.21.3.el7.x86_64 #1 SMP Tue Jun 18 16:35:19 UTC 2019 x86_64 GNU/Linux

msahraeian commented 4 years ago

@moldach you need to use Python 2. You are currently loading Python 3 (module load python/3.8).
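
The SyntaxError in the traceback above comes from `print version` in setup.py, which is Python 2 syntax. A minimal sketch of a guard that could be run inside the activated virtualenv before `pip install`; `is_python2` is a hypothetical helper name used only for illustration and is not part of MetaSV:

```python
import sys


def is_python2(version_info=None):
    """Return True if the given (or the running) interpreter is Python 2.x."""
    if version_info is None:
        version_info = sys.version_info
    return version_info[0] == 2


if __name__ == "__main__":
    major_minor = "%d.%d" % sys.version_info[:2]
    if is_python2():
        print("OK: running Python " + major_minor)
    else:
        # Assumption based on this thread: the 0.5.2 setup.py uses a
        # Python 2 print statement, so installing under Python 3 fails.
        print("Python " + major_minor + " found; switch to Python 2 first")
```

Running this with the interpreter from `python/3.8` should take the second branch, while the one from `python/2.7.14` should take the first.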

moldach commented 4 years ago

Hi @msahraeian

Switching from Python 3 to Python 2 fixed the pip install, but now I get an error when running MetaSV on my reference genome and BAM file.

module load python/2.7.14
module load nixpkgs/16.09
module load gcc/5.4.0  # can you use the more updated one with spades? I think not
module load spades/3.13.1
module load bedtools/2.27.1
module load cnvnator/0.3.3 
module load pindel/0.2.5b8

source metaSV/bin/activate

run_metasv.py --reference ~/projects/def-mtarailo/common/indexes/WS265_wormbase/c_elegans.PRJNA13758.WS265.genomic.fa --breakdancer_native breakdancer.out --breakseq_native breakseq.gff --cnvnator_native cnvnator.call --pindel_native pindel_D pindel_LI pindel_SI pindel_TD pindel_INV --sample HG005 --bam /scratch/moldach/TEST/BC1217_trim_bwaMEM_sort_dedupped.bam  --spades spades.py --age age_align --num_threads 1 --workdir work --outdir out --max_ins_intervals 500000 --isize_mean 500 --isize_sd 150

This is the error I get:

INFO 2020-03-12 12:34:54,906 metasv.main          Running MetaSV 0.5.2
INFO 2020-03-12 12:34:54,907 metasv.main          Command-line /scratch/moldach/2020-03-12/metaSV/bin/run_metasv.py --reference /home/moldach/projects/def-mtarailo/common/indexes/WS265_wormbase/c_elegans.PRJNA13758.WS265.genomic.fa --breakdancer_native breakdancer.out --breakseq_native breakseq.gff --cnvnator_native cnvnator.call --pindel_native pindel_D pindel_LI pindel_SI pindel_TD pindel_INV --sample HG005 --bam /scratch/moldach/TEST/BC1217_trim_bwaMEM_sort_dedupped.bam --spades spades.py --age age_align --num_threads 1 --workdir work --outdir out --max_ins_intervals 500000 --isize_mean 500 --isize_sd 150
INFO 2020-03-12 12:34:54,907 metasv.main          Arguments are Namespace(age='age_align', age_timeout=300, age_window=20, assembly_max_tools=1, assembly_pad=500, bams=['/scratch/moldach/TEST/BC1217_trim_bwaMEM_sort_dedupped.bam'], boost_sc=False, breakdancer_native=['breakdancer.out'], breakdancer_vcf=[], breakseq_native=['breakseq.gff'], breakseq_vcf=[], chromosomes=[], cnvkit_vcf=[], cnvnator_native=['cnvnator.call'], cnvnator_vcf=[], disable_assembly=False, enable_per_tool_output=False, extraction_max_read_pairs=10000, filter_gaps=False, gaps=None, gatk_vcf=[], gt_normal_frac=0.05, gt_window=100, inswiggle=100, isize_mean=500.0, isize_sd=150.0, keep_standard_contigs=False, lumpy_vcf=[], manta_vcf=[], max_ins_cov_frac=1.5, max_ins_intervals=500000, max_nm=10, maxsvlen=1000000, mean_read_coverage=50, mean_read_length=100, min_avg_base_qual=20, min_del_subalign_len=50, min_ins_cov_frac=0.5, min_inv_subalign_len=50, min_mapq=5, min_matches=50, min_soft_clip=20, min_support_frac_ins=0.05, min_support_ins=15, minsvlen=50, num_threads=1, outdir='out', overlap_ratio=0.5, pindel_native=['pindel_D', 'pindel_LI', 'pindel_SI', 'pindel_TD', 'pindel_INV'], pindel_vcf=[], reference='/home/moldach/projects/def-mtarailo/common/indexes/WS265_wormbase/c_elegans.PRJNA13758.WS265.genomic.fa', sample='HG005', sc_other_scale=5, spades='spades.py', spades_max_interval_size=50000, spades_options='', spades_timeout=300, stop_spades_on_fail=False, svs_to_assemble=set(['DUP', 'INV', 'DEL', 'INS']), svs_to_report=set(['INV', 'CTX', 'INS', 'DEL', 'ITX', 'DUP']), svs_to_softclip=set(['DUP', 'INV', 'DEL', 'INS']), wham_vcf=[], wiggle=100, workdir='work')
INFO 2020-03-12 12:34:55,056 metasv.main          Only SVs on the following contigs will be reported: ['I', 'II', 'III', 'IV', 'MtDNA', 'V', 'X']
INFO 2020-03-12 12:34:55,057 metasv.main          Load native files
INFO 2020-03-12 12:34:55,057 metasv.cnvnator_reader File is cnvnator.call
Traceback (most recent call last):
  File "/scratch/moldach/2020-03-12/metaSV/bin/run_metasv.py", line 143, in <module>
    sys.exit(run_metasv(args))
  File "/scratch/moldach/2020-03-12/metaSV/lib/python2.7/site-packages/metasv/main.py", line 106, in run_metasv
    for record in svReader(native_file, svs_to_report=args.svs_to_report):
  File "/scratch/moldach/2020-03-12/metaSV/lib/python2.7/site-packages/metasv/cnvnator_reader.py", line 110, in __init__
    self.file_fd = open(file_name)
IOError: [Errno 2] No such file or directory: 'cnvnator.call'

Removing the --cnvnator_native flag throws a new error:

INFO 2020-03-12 12:40:54,072 metasv.main          Running MetaSV 0.5.2
INFO 2020-03-12 12:40:54,073 metasv.main          Command-line /scratch/moldach/2020-03-12/metaSV/bin/run_metasv.py --reference /home/moldach/projects/def-mtarailo/common/indexes/WS265_wormbase/c_elegans.PRJNA13758.WS265.genomic.fa --breakdancer_native breakdancer.out --breakseq_native breakseq.gff --pindel_native pindel_D pindel_LI pindel_SI pindel_TD pindel_INV --sample HG005 --bam /scratch/moldach/TEST/BC1217_trim_bwaMEM_sort_dedupped.bam --spades spades.py --age age_align --num_threads 1 --workdir work --outdir out --max_ins_intervals 500000 --isize_mean 500 --isize_sd 150
INFO 2020-03-12 12:40:54,073 metasv.main          Arguments are Namespace(age='age_align', age_timeout=300, age_window=20, assembly_max_tools=1, assembly_pad=500, bams=['/scratch/moldach/TEST/BC1217_trim_bwaMEM_sort_dedupped.bam'], boost_sc=False, breakdancer_native=['breakdancer.out'], breakdancer_vcf=[], breakseq_native=['breakseq.gff'], breakseq_vcf=[], chromosomes=[], cnvkit_vcf=[], cnvnator_native=[], cnvnator_vcf=[], disable_assembly=False, enable_per_tool_output=False, extraction_max_read_pairs=10000, filter_gaps=False, gaps=None, gatk_vcf=[], gt_normal_frac=0.05, gt_window=100, inswiggle=100, isize_mean=500.0, isize_sd=150.0, keep_standard_contigs=False, lumpy_vcf=[], manta_vcf=[], max_ins_cov_frac=1.5, max_ins_intervals=500000, max_nm=10, maxsvlen=1000000, mean_read_coverage=50, mean_read_length=100, min_avg_base_qual=20, min_del_subalign_len=50, min_ins_cov_frac=0.5, min_inv_subalign_len=50, min_mapq=5, min_matches=50, min_soft_clip=20, min_support_frac_ins=0.05, min_support_ins=15, minsvlen=50, num_threads=1, outdir='out', overlap_ratio=0.5, pindel_native=['pindel_D', 'pindel_LI', 'pindel_SI', 'pindel_TD', 'pindel_INV'], pindel_vcf=[], reference='/home/moldach/projects/def-mtarailo/common/indexes/WS265_wormbase/c_elegans.PRJNA13758.WS265.genomic.fa', sample='HG005', sc_other_scale=5, spades='spades.py', spades_max_interval_size=50000, spades_options='', spades_timeout=300, stop_spades_on_fail=False, svs_to_assemble=set(['DUP', 'INV', 'DEL', 'INS']), svs_to_report=set(['INV', 'CTX', 'INS', 'DEL', 'ITX', 'DUP']), svs_to_softclip=set(['DUP', 'INV', 'DEL', 'INS']), wham_vcf=[], wiggle=100, workdir='work')
INFO 2020-03-12 12:40:54,089 metasv.main          Only SVs on the following contigs will be reported: ['I', 'II', 'III', 'IV', 'MtDNA', 'V', 'X']
INFO 2020-03-12 12:40:54,090 metasv.main          Load native files
INFO 2020-03-12 12:40:54,090 metasv.pindel_reader File is pindel_D
Traceback (most recent call last):
  File "/scratch/moldach/2020-03-12/metaSV/bin/run_metasv.py", line 143, in <module>
    sys.exit(run_metasv(args))
  File "/scratch/moldach/2020-03-12/metaSV/lib/python2.7/site-packages/metasv/main.py", line 106, in run_metasv
    for record in svReader(native_file, svs_to_report=args.svs_to_report):
  File "/scratch/moldach/2020-03-12/metaSV/lib/python2.7/site-packages/metasv/pindel_reader.py", line 282, in __init__
    self.file_fd = open(file_name) if file_name is not None else sys.stdin
IOError: [Errno 2] No such file or directory: 'pindel_D'

Removing the next flag (--pindel_native) just produces yet another error:

INFO 2020-03-12 12:42:18,263 metasv.main          Running MetaSV 0.5.2
INFO 2020-03-12 12:42:18,264 metasv.main          Command-line /scratch/moldach/2020-03-12/metaSV/bin/run_metasv.py --reference /home/moldach/projects/def-mtarailo/common/indexes/WS265_wormbase/c_elegans.PRJNA13758.WS265.genomic.fa --breakdancer_native breakdancer.out --breakseq_native breakseq.gff --sample HG005 --bam /scratch/moldach/TEST/BC1217_trim_bwaMEM_sort_dedupped.bam --spades spades.py --age age_align --num_threads 1 --workdir work --outdir out --max_ins_intervals 500000 --isize_mean 500 --isize_sd 150
INFO 2020-03-12 12:42:18,264 metasv.main          Arguments are Namespace(age='age_align', age_timeout=300, age_window=20, assembly_max_tools=1, assembly_pad=500, bams=['/scratch/moldach/TEST/BC1217_trim_bwaMEM_sort_dedupped.bam'], boost_sc=False, breakdancer_native=['breakdancer.out'], breakdancer_vcf=[], breakseq_native=['breakseq.gff'], breakseq_vcf=[], chromosomes=[], cnvkit_vcf=[], cnvnator_native=[], cnvnator_vcf=[], disable_assembly=False, enable_per_tool_output=False, extraction_max_read_pairs=10000, filter_gaps=False, gaps=None, gatk_vcf=[], gt_normal_frac=0.05, gt_window=100, inswiggle=100, isize_mean=500.0, isize_sd=150.0, keep_standard_contigs=False, lumpy_vcf=[], manta_vcf=[], max_ins_cov_frac=1.5, max_ins_intervals=500000, max_nm=10, maxsvlen=1000000, mean_read_coverage=50, mean_read_length=100, min_avg_base_qual=20, min_del_subalign_len=50, min_ins_cov_frac=0.5, min_inv_subalign_len=50, min_mapq=5, min_matches=50, min_soft_clip=20, min_support_frac_ins=0.05, min_support_ins=15, minsvlen=50, num_threads=1, outdir='out', overlap_ratio=0.5, pindel_native=[], pindel_vcf=[], reference='/home/moldach/projects/def-mtarailo/common/indexes/WS265_wormbase/c_elegans.PRJNA13758.WS265.genomic.fa', sample='HG005', sc_other_scale=5, spades='spades.py', spades_max_interval_size=50000, spades_options='', spades_timeout=300, stop_spades_on_fail=False, svs_to_assemble=set(['DUP', 'INV', 'DEL', 'INS']), svs_to_report=set(['INV', 'CTX', 'INS', 'DEL', 'ITX', 'DUP']), svs_to_softclip=set(['DUP', 'INV', 'DEL', 'INS']), wham_vcf=[], wiggle=100, workdir='work')
INFO 2020-03-12 12:42:18,331 metasv.main          Only SVs on the following contigs will be reported: ['I', 'II', 'III', 'IV', 'MtDNA', 'V', 'X']
INFO 2020-03-12 12:42:18,331 metasv.main          Load native files
INFO 2020-03-12 12:42:18,332 metasv.breakseq_reader File is breakseq.gff
Traceback (most recent call last):
  File "/scratch/moldach/2020-03-12/metaSV/bin/run_metasv.py", line 143, in <module>
    sys.exit(run_metasv(args))
  File "/scratch/moldach/2020-03-12/metaSV/lib/python2.7/site-packages/metasv/main.py", line 106, in run_metasv
    for record in svReader(native_file, svs_to_report=args.svs_to_report):
  File "/scratch/moldach/2020-03-12/metaSV/lib/python2.7/site-packages/metasv/breakseq_reader.py", line 85, in __init__
    self.file_fd = open(file_name) if file_name is not None else sys.stdin
IOError: [Errno 2] No such file or directory: 'breakseq.gff'

msahraeian commented 4 years ago

Hi @moldach, MetaSV is an integrative structural-variant caller, so to run it you first need to prepare the outputs of the individual callers, such as BreakSeq, BreakDancer, CNVnator, and Pindel. Then you can run MetaSV on the outputs from those callers.
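
Each IOError above is raised when a reader tries to open a caller output that does not exist yet. A minimal pre-flight sketch, assuming the file names from the command in this thread; `missing_inputs` is a hypothetical helper and not part of MetaSV:

```python
import os


def missing_inputs(paths):
    """Return the paths that do not exist as regular files, in input order."""
    return [p for p in paths if not os.path.isfile(p)]


if __name__ == "__main__":
    # File names copied from the run_metasv.py command in this thread;
    # replace them with the real outputs of your own caller runs.
    inputs = [
        "breakdancer.out",
        "breakseq.gff",
        "cnvnator.call",
        "pindel_D", "pindel_LI", "pindel_SI", "pindel_TD", "pindel_INV",
    ]
    missing = missing_inputs(inputs)
    if missing:
        print("Run the upstream callers first; missing: " + ", ".join(missing))
    else:
        print("All caller outputs found; run_metasv.py can be started.")
```

Checking all inputs up front reports every missing file at once, instead of surfacing one IOError per run as flags are removed.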