EichlerLab / smrtsv2

Structural variant caller
MIT License

GRCh37 callset release for the data in [Audano et al 2019] #26

ggstatgen closed this issue 5 years ago

ggstatgen commented 5 years ago

Hi Peter

This is a bit of a long shot (and not a bug in SMRTsv2, so feel free to move it wherever is pertinent), but I was wondering whether you have a GRCh37/hg19 version of the callset from your recent paper available (the 99k SMRTsv/2 calls from the 15 samples).

The GATK liftover appears to have significant problems with the remapping, and we are losing too many calls for that option to be feasible.

I've also been thinking of lifting over your BED file here:

ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/data_collections/hgsv_sv_discovery/working/20181025_EEE_SV-Pop_1/VariantCalls_EEE_SV-Pop_1/EEE_SV-Pop_1.ALL.sites.20181204.bed.gz

However, I don't know whether your VCF calls were converted to minimal form before the VCF-to-BED conversion.

Could you tell me if this was the case?

Best wishes,

paudano commented 5 years ago

I have no GRCh37 version of the SV calls. It would not surprise me if many of the SV hotspots were changed between GRCh37 and GRCh38, so liftovers probably won't work there.

I assume you are losing more deletions in a liftover than you are insertions because insertions fall on one point, and deletions span many bases. You might be able to break deletions into smaller regions and lift those, but the results may not be fully correct if the reference scaffolds changed significantly.
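A minimal sketch of the splitting idea, assuming 0-based half-open BED coordinates. The function name `split_deletion`, the 1000 bp default chunk size, and the `ID|index` naming scheme are illustrative assumptions, not part of SMRT-SV or the published callset. Each piece carries the parent ID so lifted pieces can be regrouped afterwards.

```python
from typing import Iterator, Tuple

BedPiece = Tuple[str, int, int, str]


def split_deletion(
    chrom: str, start: int, end: int, sv_id: str, chunk_size: int = 1000
) -> Iterator[BedPiece]:
    """Yield BED-style (chrom, start, end, name) pieces covering [start, end).

    The name is "<sv_id>|<piece index>" (a hypothetical convention) so the
    pieces can be matched back to the original deletion after liftover.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if end <= start:
        raise ValueError("end must be greater than start")
    for index, piece_start in enumerate(range(start, end, chunk_size)):
        # The last piece is truncated at the deletion's end coordinate.
        piece_end = min(piece_start + chunk_size, end)
        yield chrom, piece_start, piece_end, f"{sv_id}|{index}"


if __name__ == "__main__":
    for piece in split_deletion("chr1", 100, 2600, "DEL1"):
        print("\t".join(str(field) for field in piece))
```

After lifting, pieces of one deletion that land on different chromosomes, out of order, or far apart would suggest the region changed between assemblies, and such calls probably should not be reassembled.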

The VCF was generated from the BED after merging and analysis. I don't know if that helps.

You could try lifting the BED with UCSC liftOver and see if that recovers more of them. Other than that, I don't have much good advice for this.
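To check whether deletions really are lost more often than insertions, one could tally the records that fail to lift by SV type. The sketch below assumes the unmapped-records file is tab-separated BED with reason lines starting with `#`, and that the SV type sits in the fifth column (index 4). Both the column position and the function name `count_unmapped_by_type` are assumptions and should be checked against the actual file header.

```python
from collections import Counter
from typing import Iterable


def count_unmapped_by_type(lines: Iterable[str], svtype_col: int = 4) -> Counter:
    """Count unmapped BED records per SV type.

    Lines starting with '#' (assumed to be reason comments or a header)
    and blank lines are skipped. svtype_col is the 0-based column assumed
    to hold the SV type (for example INS or DEL).
    """
    counts: Counter = Counter()
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        # Ignore records too short to contain the SV type column.
        if len(fields) > svtype_col:
            counts[fields[svtype_col]] += 1
    return counts


if __name__ == "__main__":
    import sys

    # Usage (hypothetical file name): python count_unmapped.py unmapped.bed
    with open(sys.argv[1]) as handle:
        for sv_type, n in count_unmapped_by_type(handle).most_common():
            print(f"{sv_type}\t{n}")
```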