PapenfussLab / StructuralVariantAnnotation

R package designed to simplify structural variant analysis
GNU General Public License v3.0

Preserving Info Fields with breakpointgr2bedpe() #41

Open blex-max opened 1 year ago

blex-max commented 1 year ago

Hello,

Thanks for maintaining StructuralVariantAnnotation! I'm attempting to convert VCFs to .bedpe files using the following code:

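```r
# opt$vcf_path is supplied elsewhere in the script
vcf <- VariantAnnotation::readVcf(opt$vcf_path)
# Carry every INFO field into the breakpoint GRanges metadata columns
brs <- StructuralVariantAnnotation::breakpointRanges(
    vcf,
    info_columns = names(VariantAnnotation::info(vcf))
)
# Convert to a bedpe-style data.frame (the INFO columns are not carried over)
bp_df <- StructuralVariantAnnotation::breakpointgr2bedpe(brs)
```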

I was hoping the INFO fields retained by breakpointRanges() would also be carried over into the bedpe output as additional columns, but they are not. Could this be implemented as an option, or could you suggest a workaround? Thanks.

d-cameron commented 1 year ago

The issue is that each bedpe record is a merger of two breakpointGRanges rows (one per breakend), and those two rows don't necessarily have the same values. The partner field, for example, always differs, since each row points at the other.
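One way to avoid picking a value is to keep both: since each bedpe record merges two breakend rows, each INFO field can be emitted twice, once per side, mirroring bedpe's own chrom1/chrom2 convention. The base-R sketch below assumes the breakend metadata has been converted to a plain data.frame whose row names are the breakend names and whose `partner` column names the mate (as in breakpointRanges() output), and that the bedpe `name` column holds the side-1 breakend. The toy data, the `AF` field values and `add_info_both_sides()` are hypothetical.

```r
# Hypothetical breakend table standing in for the metadata columns of
# breakpointRanges() output: row names are breakend names, 'partner' is the
# mate's name, and AF is an INFO field whose values are invented here.
bp <- data.frame(
  partner = c("gridss1h", "gridss1o", "del1_bp2", "del1_bp1"),
  AF = c(1.0, 0.5, 0.4, 0.4),
  row.names = c("gridss1o", "gridss1h", "del1_bp1", "del1_bp2"),
  stringsAsFactors = FALSE
)

# Hypothetical bedpe: one row per breakpoint; 'name' is assumed to hold the
# name of the breakend reported as side 1.
bedpe <- data.frame(name = c("gridss1o", "del1_bp1"), stringsAsFactors = FALSE)

# Append each INFO column twice, suffixed _1 / _2, so no value is discarded.
add_info_both_sides <- function(bedpe, bp, cols) {
  side1 <- bedpe$name
  side2 <- bp[side1, "partner"]  # look up each side-1 breakend's mate
  for (col in cols) {
    bedpe[[paste0(col, "_1")]] <- bp[side1, col]
    bedpe[[paste0(col, "_2")]] <- bp[side2, col]
  }
  bedpe
}

res <- add_info_both_sides(bedpe, bp, cols = "AF")
print(res)
```

In this toy data the BND pair reports AF 1.0 on one side and 0.5 on the other, while the deletion's two sides agree. With real data, `bp` could be built from `mcols(brs)` with row names set to `names(brs)`; multi-valued INFO fields (e.g. CIPOS) would likely need to be collapsed to strings first.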

For simple DEL/DUP/INS events, it's a 1-to-2-to-1 operation (one VCF record becomes two breakends, which become one bedpe record), so there's no ambiguity and you can copy the INFO fields from info(vcf) as additional bedpe columns. For other events, you'll need to work out which is the 'correct'^ value.

^ In many cases there is no 'correct' value. For example, a breakpoint can be homozygous on one side and heterozygous on the other (e.g. a chr1<->chr2 breakpoint accompanied by a total loss of the other copy of chr2).
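For the simple-event case, copying INFO from the originating VCF record can be sketched in base R. The sketch assumes each breakend carries a `sourceId` naming the VCF record it was parsed from (mirroring the `sourceId` column in breakpointRanges() output), that `info(vcf)` rows are keyed by record ID, and that the bedpe `name` column holds the side-1 breakend. The data frames and `add_info_single_record()` are hypothetical. Pairs whose breakends come from two different records, such as BND mates, get `NA` rather than a guessed value.

```r
# Hypothetical stand-in for info(vcf): one row per VCF record, keyed by ID.
vcf_info <- data.frame(
  SVTYPE = c("DEL", "BND", "BND"),
  SVLEN = c(-1500L, NA, NA),
  row.names = c("del1", "gridss1o", "gridss1h"),
  stringsAsFactors = FALSE
)

# Hypothetical breakend table: 'partner' names the mate breakend and
# 'sourceId' names the VCF record each breakend was parsed from.
bp <- data.frame(
  partner = c("del1_bp2", "del1_bp1", "gridss1h", "gridss1o"),
  sourceId = c("del1", "del1", "gridss1o", "gridss1h"),
  row.names = c("del1_bp1", "del1_bp2", "gridss1o", "gridss1h"),
  stringsAsFactors = FALSE
)

# Hypothetical bedpe with one row per breakpoint, named by its side-1 breakend.
bedpe <- data.frame(name = c("del1_bp1", "gridss1o"), stringsAsFactors = FALSE)

add_info_single_record <- function(bedpe, bp, vcf_info) {
  src1 <- bp[bedpe$name, "sourceId"]
  src2 <- bp[bp[bedpe$name, "partner"], "sourceId"]
  # 1-to-2-to-1 case: both breakends were parsed from the same VCF record
  simple <- src1 == src2
  info <- vcf_info[src1, , drop = FALSE]
  # No single record applies when the two sides come from different records
  info[!simple, ] <- NA
  rownames(info) <- NULL
  cbind(bedpe, simple_event = simple, info)
}

res <- add_info_single_record(bedpe, bp, vcf_info)
print(res)
```

Here the deletion row receives its SVTYPE and SVLEN, while the BND row is flagged with `simple_event = FALSE` and left as `NA`; those are the rows where a per-field rule for choosing (or combining) values is still needed.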