ARTbio / tools-artbio

Collection of galaxy tools developed by the artbio-platform at the IBPS (Institut de Biologie Paris-Seine)
MIT License

lumpy paired end input #100

Closed mvdbeek closed 7 years ago

mvdbeek commented 7 years ago

Hey @drosofff,

have you already tried lumpy with paired-end tumour-normal input? I'm getting:

Fatal error: Exit code 1 ()
[bam_sort_core] merging from 25 files...
[bam_sort_core] merging from 31 files...
Warning: only 0 elements in distribution (min: 1000)
Warning: only 0 elements in distribution (min: 1000)
missing pair end parameters:mean stdev 

Program: ********** (v 0.2.13)
Author:  Ryan Layer (rl6sf@virginia.edu)
Summary: Find structural variations in various signals.

Usage:   ********** [OPTIONS] 

Options: 
    -g  Genome file (defines chromosome order)
    -e  Show evidence for each call
    -w  File read windows size (default 1000000)
    -mw minimum weight for a call
    -msw    minimum per-sample weight for a call
    -tt trim threshold
    -x  exclude file bed file
    -t  temp file prefix, must be to a writeable directory
    -P  output probability curve for each variant
    -b  output BEDPE instead of VCF
    -sr bam_file:<file name>,
        id:<sample name>,
        back_distance:<distance>,
        min_mapping_threshold:<mapping quality>,
        weight:<sample weight>,
        min_clip:<minimum clip length>,
        read_group:<string>

    -pe bam_file:<file name>,
        id:<sample name>,
        histo_file:<file name>,
        mean:<value>,
        stdev:<value>,
        read_length:<length>,
        min_non_overlap:<length>,
        discordant_z:<z value>,
        back_distance:<distance>,
        min_mapping_threshold:<mapping quality>,
        weight:<sample weight>,
        read_group:<string>

    -bedpe  bedpe_file:<bedpe file>,
        id:<sample name>,
        weight:<sample weight>

These were my input parameters (mostly standard, except for read length):


Input parameter                                           Value
Input(s)                                                  two_sample
One BAM alignment file produced by BWA-mem                558: MarkDuplicates on data 513: MarkDuplicates BAM output
read length                                               101
One BAM alignment file produced by BWA-mem                560: MarkDuplicates on data 514: MarkDuplicates BAM output
read length                                               101
Sequencing method                                         paired-end
additional_params
number of reads to compute mean and stdev of read length  1000000
-mw                                                       4
-tt                                                       0
min_non_overlap                                           101
discordant_z                                              5
back_distance                                             10
weight                                                    1
min_mapping_threshold                                     20
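To see how these form values relate to the lumpy usage printed above, here is a minimal sketch that assembles a `-pe` argument string in the `key:value,key:value` format shown in the help text. The helper name `build_pe_arg` is hypothetical and not part of the wrapper. It refuses to build the string when `mean` or `stdev` is missing, since that is the situation the log reports ("missing pair end parameters:mean stdev").

```python
from typing import Optional


def build_pe_arg(bam_file: str, sample_id: str, histo_file: str,
                 mean: Optional[float], stdev: Optional[float],
                 read_length: int, min_non_overlap: int, discordant_z: float,
                 back_distance: int, min_mapping_threshold: int,
                 weight: int) -> str:
    """Assemble a lumpy -pe argument string (hypothetical helper, sketch only)."""
    # The log above shows lumpy stopping when mean/stdev are absent,
    # so fail early here instead of passing an incomplete argument.
    if mean is None or stdev is None:
        raise ValueError("mean and stdev of the insert size are required")
    # Order follows the -pe block of the usage text above.
    fields = [
        ("bam_file", bam_file),
        ("id", sample_id),
        ("histo_file", histo_file),
        ("mean", mean),
        ("stdev", stdev),
        ("read_length", read_length),
        ("min_non_overlap", min_non_overlap),
        ("discordant_z", discordant_z),
        ("back_distance", back_distance),
        ("min_mapping_threshold", min_mapping_threshold),
        ("weight", weight),
    ]
    return ",".join(f"{key}:{value}" for key, value in fields)
```

With the parameters from the table and placeholder insert-size values (mean 350, stdev 50, not taken from any real run), `build_pe_arg("t.bam", "t", "t.histo", 350.0, 50.0, 101, 101, 5, 10, 20, 1)` yields a single comma-separated string ready to follow `-pe`.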

I'll have a look to see if I can figure out what's wrong.

drosofff commented 7 years ago

Yes, indeed, I developed the wrapper to mine paired-end datasets, and it works. The current test data is a nasty trick to get around the large BAM size that would otherwise be needed to avoid the type of error you report!

Regarding your error, it sounds familiar: I got it when I was trying to downsample a paired-end dataset for the tool test. I would say the crash comes from the Python script that computes the mean and stdev of the fragment length in the paired-end library. It may also come from the library itself. You'll see it if you run the analysis from the command line. We could make this script optional in the wrapper and let users enter the mean and stdev manually in the tool form. However, I never had the error myself with my real datasets (7 human paired-end datasets). Keep me posted!
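A simplified sketch of the kind of computation that step performs: take the template lengths (SAM TLEN) of sampled pairs, keep one positive value per fragment, and report the mean and sample standard deviation, refusing when too few values are available. This is not the script the wrapper actually runs, which may sample reads or trim outliers differently. The function name is hypothetical and the 1000 threshold is simply copied from the "(min: 1000)" warning in the log.

```python
import statistics
from typing import Iterable, Tuple

# Threshold copied from the "(min: 1000)" warning in the log; an assumption,
# not necessarily the value hard-coded in the real script.
MIN_ELEMENTS = 1000


def insert_size_stats(template_lengths: Iterable[int],
                      min_elements: int = MIN_ELEMENTS) -> Tuple[float, float]:
    """Mean and sample stdev of insert sizes (hypothetical, simplified sketch)."""
    # Each pair reports +TLEN on one mate and -TLEN on the other;
    # keeping only positive values counts each fragment once.
    sizes = [tlen for tlen in template_lengths if tlen > 0]
    if len(sizes) < min_elements:
        # With zero reads in the input this mirrors the warning seen above.
        raise ValueError(
            f"only {len(sizes)} elements in distribution (min: {min_elements})")
    return statistics.mean(sizes), statistics.stdev(sizes)
```

If the upstream filtering step passes no reads at all, `insert_size_stats([])` raises with "only 0 elements in distribution (min: 1000)", which is consistent with mean and stdev then being missing downstream.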

mvdbeek commented 7 years ago

I got it: it's because you're filtering by read group with samtools view -r readgroup. This is not a problem if the read groups are not set, but if they are set, it just won't output any reads. I'm opening a PR in a minute.
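The read-group pitfall can be modelled in a few lines. The samtools view manual notes that with -r, records carrying no RG tag are also output (and that this behaviour may change between releases). That would explain why the filter is harmless on BAMs without read groups but drops everything once RG tags are set to a different value. Records are modelled here as plain dicts with an optional "RG" key, and `filter_by_read_group` is a hypothetical name. This is an illustration, not samtools code.

```python
from typing import Dict, Iterable, List


def filter_by_read_group(records: Iterable[Dict[str, str]],
                         read_group: str) -> List[Dict[str, str]]:
    """Keep records as `samtools view -r` is documented to (illustrative model)."""
    kept = []
    for record in records:
        rg = record.get("RG")
        # Untagged records pass; tagged records pass only on an exact match.
        if rg is None or rg == read_group:
            kept.append(record)
    return kept
```

With untagged reads, filtering on "readgroup" keeps everything. With reads tagged "tumour" or "normal", the same filter returns an empty list. An empty list is exactly what would starve the insert-size computation of input.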

mvdbeek commented 7 years ago

So the tumour-normal setup in the problem description was a red herring.

mvdbeek commented 7 years ago

Should be solved by #102