arq5x / lumpy-sv

lumpy: a general probabilistic framework for structural variant discovery
MIT License

lumpyexpress bedpe parameter #260

Open fpbarthel opened 6 years ago

fpbarthel commented 6 years ago

What are the usage instructions for the -bedpe parameter in lumpyexpress? Unfortunately, this is not clear from the examples. Is it used the same way as the -B, -S and -D parameters? E.g.:

lumpyexpress \
    -B tumor.bam,normal.bam \
    -S tumor.splitters.bam,normal.splitters.bam \
    -D tumor.discordants.bam,normal.discordants.bam \
    -bedpe tumor.cnvnator.bedpe,normal.cnvnator.bedpe \
    -o tumor_normal.vcf

The lumpyexpress --help output suggests that you also need to provide sample IDs, but this seems redundant given that they are not required for -B, -S and -D:

usage:   lumpyexpress [options]

options:
     -d FILE  bedpe file of depths (comma separated and prefixed by sample:)
              e.g sample_x:/path/to/sample_x.bedpe,sample_y:/path/to/sample_y.bedpe

fpbarthel commented 6 years ago

Another question on this: I am using cnvanator_to_bedpes.py (is the filename misspelled?) to generate the input BEDPE files for this parameter. However, the script generates two BEDPE files per sample, e.g. tumor.del.bedpe and tumor.dup.bedpe. Should the deletions and duplications from a single sample be merged into one BEDPE file here?

Also, for the --breakpoint_size parameter supplied to this Python script, should we use the same bin size that was used with CNVnator?

fpbarthel commented 6 years ago

Bumping this thread. @ryanlayer? (Hope you don't mind the tag.)

I get the message "Error: must specify depths as sample_id:bedpe" even when I specify the samples in the given format. I am using the same sample_id as in the SM tag of the BAM header.

E.g.:

barthf$ samtools view -H tumor.bam | grep '^@RG' | sed "s/.*SM:\([^\t]*\).*/\1/g" | uniq
TUMOR-SAMPLE-SM-TAG

barthf$ samtools view -H normal.bam | grep '^@RG' | sed "s/.*SM:\([^\t]*\).*/\1/g" | uniq
NORMAL-SAMPLE-SM-TAG

barthf$ lumpyexpress \
    -B tumor.bam,normal.bam \
    -S tumor.splitters.bam,normal.splitters.bam \
    -D tumor.discordants.bam,normal.discordants.bam \
    -bedpe TUMOR-SAMPLE-SM-TAG:tumor.cnvnator.bedpe,NORMAL-SAMPLE-SM-TAG:normal.cnvnator.bedpe \
    -o tumor_normal.vcf

....
Error: must specify depths as sample_id:bedpe

Here are my questions:

  1. Why is there an error message when I am supplying the parameter as suggested? Update: it turns out I was using lumpy's -bedpe instead of lumpyexpress's -d, which caused the problem. The tool gives an error related to bedpe rather than, e.g., "unknown parameter: -bedpe", so I did not realize it until now.
  2. Why do we need to supply a sample ID? Is this because BEDPE files do not have a header?
  3. Should the DUP and DEL BEDPE files output by cnvanator_to_bedpes.py be merged?
  4. What should be used as --breakpoint_size for cnvanator_to_bedpes.py?
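
For reference, the resolution noted under question 1 (pass the depth files with -d rather than -bedpe) can be sketched as follows. This is a minimal, unverified sketch: it relies only on the `sample:path` format shown in the --help text quoted earlier and on the placeholder file names and sample IDs used in this thread. The helper `build_depth_arg` is hypothetical and not part of lumpy. The command is printed rather than executed, so the snippet runs without lumpy installed.

```shell
#!/usr/bin/env bash
# Sketch only: build the value for lumpyexpress's -d option in the
# "sample:path,sample:path" form described by the --help text.
# build_depth_arg is a hypothetical helper, not part of lumpy.

# Join all arguments ("sample_id:bedpe_path" pairs) with commas.
build_depth_arg() {
    local IFS=','
    echo "$*"
}

# Placeholder sample IDs and file names taken from this thread.
depth_arg=$(build_depth_arg \
    "TUMOR-SAMPLE-SM-TAG:tumor.cnvnator.bedpe" \
    "NORMAL-SAMPLE-SM-TAG:normal.cnvnator.bedpe")

# Assemble the command as a string and print it instead of running it.
cmd="lumpyexpress"
cmd="$cmd -B tumor.bam,normal.bam"
cmd="$cmd -S tumor.splitters.bam,normal.splitters.bam"
cmd="$cmd -D tumor.discordants.bam,normal.discordants.bam"
cmd="$cmd -d $depth_arg"
cmd="$cmd -o tumor_normal.vcf"
echo "$cmd"
```

Whether the sample IDs given to -d must match the SM tags in the BAM headers is not confirmed anywhere in this thread; the sketch simply reuses the IDs from the earlier attempt.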