bioinform / breakseq2

BreakSeq2: Ultrafast and accurate nucleotide-resolution analysis of structural variants
BSD 2-Clause "Simplified" License

OSError: [Errno 7] Argument list too long #15

Open MaestSi opened 6 years ago

MaestSi commented 6 years ago

Dear Breakseq2 developers, I am trying to run the software through the SVE (https://github.com/TheJacksonLaboratory/SVE) wrapper on a BAM file obtained by aligning sequence reads to the hg38 reference (with alternative haplotypes). However, I get this error:

/mnt/cifs01/simone/software/miniconda2/bin/python /mnt/cifs01/simone/software/SVE/src/breakseq2/scripts/run_breakseq2.py --bwa /mnt/cifs01/simone/software/SVE/src/bwa/bwa --samtools /mnt/cifs01/simone/software/SVE/src/samtools/samtools --reference /home/simone/home_disk/Whole_genome/Homo_sapiens_assembly38.fasta --work /mnt/cifs01/simone/NA12878/Breakseq_output/start_sorted_S35/ --min_span 2 --window 500 --min_overlap 2 --junction_length 1000 --bams /mnt/cifs01/simone/NA12878/start_sorted.bam --nthreads 4 --bplib_gff /mnt/cifs01/simone/software/SVE/data/breakseq_bplib/breakseq2_bplib_20150129_hg38.gff
call error:
INFO 2018-04-14 09:45:59,648 /mnt/cifs01/simone/software/SVE/src/breakseq2/scripts/run_breakseq2.py Command-line: /mnt/cifs01/simone/software/SVE/src/breakseq2/scripts/run_breakseq2.py --bwa /mnt/cifs01/simone/software/SVE/src/bwa/bwa --samtools /mnt/cifs01/simone/software/SVE/src/samtools/samtools --reference /home/simone/home_disk/Whole_genome/Homo_sapiens_assembly38.fasta --work /mnt/cifs01/simone/NA12878/Breakseq_output/start_sorted_S35/ --min_span 2 --window 500 --min_overlap 2 --junction_length 1000 --bams /mnt/cifs01/simone/NA12878/start_sorted.bam --nthreads 4 --bplib_gff /mnt/cifs01/simone/software/SVE/data/breakseq_bplib/breakseq2_bplib_20150129_hg38.gff

[...] many INFO messages [...]

Traceback (most recent call last):
  File "/mnt/cifs01/simone/software/SVE/src/breakseq2/scripts/run_breakseq2.py", line 28, in <module>
    args.keep_temp, args.window, args.junction_length))
  File "/mnt/cifs01/simone/software/SVE/src/breakseq2/breakseq2/breakseq_top.py", line 115, in breakseq2_workflow
    nthreads, keep_temp)
  File "/mnt/cifs01/simone/software/SVE/src/breakseq2/breakseq2/preprocess_and_align.py", line 90, in parallel_preprocess_and_align
    subprocess.check_call(bash_cmd, shell=True)
  File "/mnt/cifs01/simone/software/miniconda2/lib/python2.7/subprocess.py", line 181, in check_call
    retcode = call(*popenargs, **kwargs)
  File "/mnt/cifs01/simone/software/miniconda2/lib/python2.7/subprocess.py", line 168, in call
    return Popen(*popenargs, **kwargs).wait()
  File "/mnt/cifs01/simone/software/miniconda2/lib/python2.7/subprocess.py", line 390, in __init__
    errread, errwrite)
  File "/mnt/cifs01/simone/software/miniconda2/lib/python2.7/subprocess.py", line 1025, in _execute_child
    raise child_exception
OSError: [Errno 7] Argument list too long

Is there an easy way to solve it? Thanks in advance.

marghoob commented 6 years ago

@MaestSi could you please provide the complete log from BreakSeq output?

MaestSi commented 6 years ago

Dear marghoob, I think I solved it by filtering out of breakseq2_bplib_20150129_hg38.gff all chromosomes other than chr1-chr22, chrX, chrY and chrM. I was using the breakseq2_bplib_20150129_hg38.gff file included in SVE, which was obtained by lifting over the original hg19-based library http://sv.gersteinlab.org/phase1bkpts/breakseq2_bplib_20150129.gff. Are there any plans for an updated hg38-based breakpoints library? Thanks
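
For readers who want to reproduce the filtering step described above, here is a minimal sketch. It assumes the library is a standard tab-separated GFF with the sequence name in the first column; the names `PRIMARY`, `filter_primary` and `filter_gff` are made up for this illustration and are not part of BreakSeq2 or SVE.

```python
import re

# Primary hg38 contigs only: chr1-chr22, chrX, chrY, chrM.
PRIMARY = re.compile(r"^chr([1-9]|1[0-9]|2[0-2]|X|Y|M)$")


def filter_primary(lines):
    """Yield GFF lines whose first column is a primary chromosome."""
    for line in lines:
        if line.startswith("#"):
            # Keep header and comment lines untouched.
            yield line
            continue
        # Assumption: columns are tab-separated, sequence name comes first.
        seqid = line.split("\t", 1)[0]
        if PRIMARY.match(seqid):
            yield line


def filter_gff(src_path, dst_path):
    """Write a copy of src_path that contains primary chromosomes only."""
    with open(src_path) as src, open(dst_path, "w") as dst:
        dst.writelines(filter_primary(src))
```

Calling `filter_gff("breakseq2_bplib_20150129_hg38.gff", "bplib_primary.gff")` would write the reduced library; the output filename here is only a placeholder.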

marghoob commented 6 years ago

The breakpoints library is maintained by Prof. Gerstein's lab so I am not sure if they have any plans to update the breakpoints library for hg38.

raydai commented 4 years ago

Hi Breakseq2 developers and @MaestSi, I have run into the same issue that MaestSi had. I also removed all chromosomes that are not in chr* format from breakseq2_bplib_20150129_hg38.gff, but I still get the same error: "OSError: [Errno 7] Argument list too long". I have been searching for potential solutions for a few days, but nothing has worked for me. Do you have any ideas about what else I could try?

Here is some information about the machine I am using in my project:

LSB Version: :base-4.0-amd64:base-4.0-noarch:core-4.0-amd64:core-4.0-noarch:graphics-4.0-amd64:graphics-4.0-noarch:printing-4.0-amd64:printing-4.0-noarch
Distributor ID: CentOS
Description: CentOS release 6.5 (Final)
Release: 6.5
Codename: Final

BreakSeq2 v2.2 was installed from Bioconda with the following command:
conda install -c bioconda breakseq2

Log information:

INFO 2020-08-24 10:58:42,693 /home/ksu2/bs2_env/bs2/bin/run_breakseq2.py Command-line: /home/ksu2/bs2_env/bs2/bin/run_breakseq2.py --bwa /home/ksu2/bs2_env/bs2/bin/bwa --samtools /home/ksu2/bs2_env/bs2/bin/samtools --bplib breakseq2_bplib_Hg38_20200519.gff --reference Homo_sapiens_assembly38.fasta --bams /lustre/project/hdeng2/WGS4000/BR16268/BR16268.bam --work workBR16268 --sample BR16268
INFO 2020-08-24 10:58:42,694 breakseq2_workflow   Created working directory /lustre/project/hdeng2/ksu2_SVE/data/BreakSeq/workBR16268
INFO 2020-08-24 10:58:42,714 breakseq2_workflow   Index of breakseq2_bplib_Hg38_20200519_chr.gff does not exist. Copying to /lustre/project/hdeng2/ksu2_SVE/data/BreakSeq/workBR16268 to index
INFO 2020-08-24 10:58:42,740 breakseq2_workflow   Indexing /lustre/project/hdeng2/ksu2_SVE/data/BreakSeq/workBR16268/bplib.fa using /home/ksu2/bs2_env/bs2/bin/bwa index /lustre/project/hdeng2/ksu2_SVE/data/BreakSeq/workBR16268/bplib.fa
INFO 2020-08-24 10:58:42,810 get_reference_contigs Extracting chromosome names from /lustre/project/hdeng2/ksu2_SVE/data/BreakSeq/Homo_sapiens_assembly38.fasta.fai
INFO 2020-08-24 10:58:42,843 preprocess_and_align-<Process(PoolWorker-1, started daemon)> Extracting candidate reads from /lustre/project/hdeng2/WGS4000/BR16268/BR16268.bam for chromosome chr1 and aligning against /lustre/project/hdeng2/ksu2_SVE/data/BreakSeq/workBR16268/bplib.fa
INFO 2020-08-24 11:04:59,213 print_candidate_reads-<Process(PoolWorker-1, started daemon)> Extracted 295970 reads from BAMs /lustre/project/hdeng2/WGS4000/BR16268/BR16268.bam for chromosome chr1 (376.37 s)
INFO 2020-08-24 11:04:59,214 preprocess_and_align-<Process(PoolWorker-1, started daemon)> Running bash -c "/home/ksu2/bs2_env/bs2/bin/bwa samse /lustre/project/hdeng2/ksu2_SVE/data/BreakSeq/workBR16268/bplib.fa <(/home/ksu2/bs2_env/bs2/bin/bwa aln /lustre/project/hdeng2/ksu2_SVE/data/BreakSeq/workBR16268/bplib.fa /lustre/project/hdeng2/ksu2_SVE/data/BreakSeq/workBR16268/0.fq) /lustre/project/hdeng2/ksu2_SVE/data/BreakSeq/workBR16268/0.fq | /home/ksu2/bs2_env/bs2/bin/samtools view -S - -1 -F 4 -bo /lustre/project/hdeng2/ksu2_SVE/data/BreakSeq/workBR16268/0.bam"
INFO 2020-08-24 11:05:00,733 preprocess_and_align-<Process(PoolWorker-1, started daemon)> Finished bash -c "/home/ksu2/bs2_env/bs2/bin/bwa samse /lustre/project/hdeng2/ksu2_SVE/data/BreakSeq/workBR16268/bplib.fa <(/home/ksu2/bs2_env/bs2/bin/bwa aln /lustre/project/hdeng2/ksu2_SVE/data/BreakSeq/workBR16268/bplib.fa /lustre/project/hdeng2/ksu2_SVE/data/BreakSeq/workBR16268/0.fq) /lustre/project/hdeng2/ksu2_SVE/data/BreakSeq/workBR16268/0.fq | /home/ksu2/bs2_env/bs2/bin/samtools view -S - -1 -F 4 -bo /lustre/project/hdeng2/ksu2_SVE/data/BreakSeq/workBR16268/0.bam" (1.518 s)
INFO 2020-08-24 11:05:00,734 preprocess_and_align-<Process(PoolWorker-1, started daemon)> Extracting candidate reads from /lustre/project/hdeng2/WGS4000/BR16268/BR16268.bam for chromosome chr2 and aligning against /lustre/project/hdeng2/ksu2_SVE/data/BreakSeq/workBR16268/bplib.fa
INFO 2020-08-24 11:11:11,186 print_candidate_reads-<Process(PoolWorker-1, started daemon)> Extracted 301564 reads from BAMs /lustre/project/hdeng2/WGS4000/BR16268/BR16268.bam for chromosome chr2 (370.452 s)
[.....]
INFO 2020-08-24 12:21:22,305 preprocess_and_align-<Process(PoolWorker-1, started daemon)> Finished bash -c "/home/ksu2/bs2_env/bs2/bin/bwa samse /lustre/project/hdeng2/ksu2_SVE/data/BreakSeq/workBR16268/bplib.fa <(/home/ksu2/bs2_env/bs2/bin/bwa aln /lustre/project/hdeng2/ksu2_SVE/data/BreakSeq/workBR16268/bplib.fa /lustre/project/hdeng2/ksu2_SVE/data/BreakSeq/workBR16268/3365.fq) /lustre/project/hdeng2/ksu2_SVE/data/BreakSeq/workBR16268/3365.fq | /home/ksu2/bs2_env/bs2/bin/samtools view -S - -1 -F 4 -bo /lustre/project/hdeng2/ksu2_SVE/data/BreakSeq/workBR16268/3365.bam" (0.012584 s)
INFO 2020-08-24 12:21:22,306 preprocess_and_align-<Process(PoolWorker-1, started daemon)> Extracting candidate reads from /lustre/project/hdeng2/WGS4000/BR16268/BR16268.bam for chromosome  and aligning against /lustre/project/hdeng2/ksu2_SVE/data/BreakSeq/workBR16268/bplib.fa
INFO 2020-08-24 12:21:22,967 print_candidate_reads-<Process(PoolWorker-1, started daemon)> Extracted 147550 reads from BAMs /lustre/project/hdeng2/WGS4000/BR16268/BR16268.bam for chromosome  (0.660362 s)
INFO 2020-08-24 12:21:22,967 preprocess_and_align-<Process(PoolWorker-1, started daemon)> Running bash -c "/home/ksu2/bs2_env/bs2/bin/bwa samse /lustre/project/hdeng2/ksu2_SVE/data/BreakSeq/workBR16268/bplib.fa <(/home/ksu2/bs2_env/bs2/bin/bwa aln /lustre/project/hdeng2/ksu2_SVE/data/BreakSeq/workBR16268/bplib.fa /lustre/project/hdeng2/ksu2_SVE/data/BreakSeq/workBR16268/3366.fq) /lustre/project/hdeng2/ksu2_SVE/data/BreakSeq/workBR16268/3366.fq | /home/ksu2/bs2_env/bs2/bin/samtools view -S - -1 -F 4 -bo /lustre/project/hdeng2/ksu2_SVE/data/BreakSeq/workBR16268/3366.bam"
INFO 2020-08-24 12:21:25,405 preprocess_and_align-<Process(PoolWorker-1, started daemon)> Finished bash -c "/home/ksu2/bs2_env/bs2/bin/bwa samse /lustre/project/hdeng2/ksu2_SVE/data/BreakSeq/workBR16268/bplib.fa <(/home/ksu2/bs2_env/bs2/bin/bwa aln /lustre/project/hdeng2/ksu2_SVE/data/BreakSeq/workBR16268/bplib.fa /lustre/project/hdeng2/ksu2_SVE/data/BreakSeq/workBR16268/3366.fq) /lustre/project/hdeng2/ksu2_SVE/data/BreakSeq/workBR16268/3366.fq | /home/ksu2/bs2_env/bs2/bin/samtools view -S - -1 -F 4 -bo /lustre/project/hdeng2/ksu2_SVE/data/BreakSeq/workBR16268/3366.bam" (2.43534 s)
Traceback (most recent call last):
  File "/home/ksu2/bs2_env/bs2/bin/run_breakseq2.py", line 28, in <module>
    args.keep_temp, args.window, args.junction_length))
  File "/home/ksu2/bs2_env/bs2/lib/python2.7/site-packages/breakseq2/breakseq_top.py", line 115, in breakseq2_workflow
    nthreads, keep_temp)
  File "/home/ksu2/bs2_env/bs2/lib/python2.7/site-packages/breakseq2/preprocess_and_align.py", line 90, in parallel_preprocess_and_align
    subprocess.check_call(bash_cmd, shell=True)
  File "/home/ksu2/bs2_env/bs2/lib/python2.7/subprocess.py", line 185, in check_call
    retcode = call(*popenargs, **kwargs)
  File "/home/ksu2/bs2_env/bs2/lib/python2.7/subprocess.py", line 172, in call
    return Popen(*popenargs, **kwargs).wait()
  File "/home/ksu2/bs2_env/bs2/lib/python2.7/subprocess.py", line 394, in __init__
    errread, errwrite)
  File "/home/ksu2/bs2_env/bs2/lib/python2.7/subprocess.py", line 1047, in _execute_child
    raise child_exception
OSError: [Errno 7] Argument list too long

Thanks for all your time and help.
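
The traceback ends in `subprocess.check_call(bash_cmd, shell=True)`, which passes the whole command to the shell as a single `-c` argument. The thread does not show the failing `bash_cmd`, so the following is only a sketch, assuming the command string itself has grown too long (for example because it lists thousands of per-chunk files). One general way around that limit is to write the command to a temporary script and run the script instead. `SINGLE_ARG_LIMIT`, `exceeds_single_arg_limit` and `run_via_script` are hypothetical names for this illustration, not BreakSeq2 functions, and the 128 KiB figure assumes Linux with 4 KiB pages.

```python
import os
import subprocess
import tempfile

# Assumption: Linux with 4 KiB pages caps one argv string at 32 pages.
SINGLE_ARG_LIMIT = 128 * 1024


def exceeds_single_arg_limit(command, limit=SINGLE_ARG_LIMIT):
    """Return True if command is too long to pass as one -c argument."""
    # +1 for the terminating NUL byte that is counted with the string.
    return len(command.encode("utf-8")) + 1 > limit


def run_via_script(command, shell="bash"):
    """Run command from a temporary script file and return its stdout."""
    with tempfile.NamedTemporaryFile("w", suffix=".sh", delete=False) as fh:
        fh.write(command + "\n")
        script = fh.name
    try:
        # Only the short script path travels on the argument list.
        return subprocess.check_output([shell, script]).decode()
    finally:
        os.remove(script)
```

This only helps when the oversized part is the shell command text. If an external program inside the command still receives thousands of arguments, those would need to be passed another way, such as through a file list if the tool supports one.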

MaestSi commented 4 years ago

Hi, I haven't worked with breakseq2 for a while. I don't remember if eventually I was able to run it using the docker container for SVE. You may want to check it out.

SVE_HOME=/path/to/SVE/dir
SVE=$SVE_HOME/bin/sve
MNT_INPUT_DIR=/tools/SVE/in
MNT_OUTPUT_DIR=/tools/SVE/out
REFERENCE=Homo_sapiens_assembly38.fasta
BAM=samplename.bam
GENOME_VER=hg38
WORKING_DIR=/path/to/working/dir
OUTPUT_DIR=$WORKING_DIR/SVE_output

docker run -v $WORKING_DIR:$MNT_INPUT_DIR -v $OUTPUT_DIR:$MNT_OUTPUT_DIR wanpinglee/sve:0.1.0 /tools/SVE/bin/sve call -r $MNT_INPUT_DIR"/"$REFERENCE -g $GENOME_VER \
-a breakseq  $MNT_INPUT_DIR"/"$BAM -o $MNT_OUTPUT_DIR

Simone

raydai commented 4 years ago

Hi Simone, thanks for the information. I haven't had a chance to use SVE or Docker yet, but I will check whether it works in the Docker environment.

MaestSi commented 4 years ago

Ok, please let me know if you have any updates. Simone