dnanexus-rnd / GLnexus

Scalable gVCF merging and joint variant calling for population sequencing projects
Apache License 2.0
145 stars 37 forks source link

Move bcf_unpack before bcf1_t.d invocation #179

Closed xunjieli closed 5 years ago

xunjieli commented 5 years ago

bcf1_t's d field is accessed in preprocess_record, but bcf_unpack is called in revise_genotypes which happens later. This moves bcf_unpack before bcf1_t's d field is accessed.

xunjieli commented 5 years ago

@mlin I ran into a SEGV when GLnexus is dereferencing uninitialized bcf1_t.d. This is to fix it.

mlin commented 5 years ago

@xunjieli Thanks! Do you have a reliable repro case for the original segfault? I'd prefer to look into the root cause a little more. The thing is, I think the bcf1_t objects manipulated at this point should already have been unpacked here:

https://github.com/dnanexus-rnd/GLnexus/blob/f4ce0ffe2c2744f879f3a4f75a70a1aebd88705a/src/BCFKeyValueData.cc#L502

So I'm worried something more complicated is going wrong even if this patch seems to fix the crash.

xunjieli commented 5 years ago

Thanks @mlin. I am not using BCFKeyValueData. I have an implementation of BCFData that returns a valid bcf1_t, but it wasn't unpacked. I thought about unpacking it myself in the implementation of BCFData, but that's likely to be inefficient since there are other filtering going on, and is probably the best for the genotyper to unpack it before accessing the d field.

mlin commented 5 years ago

@xunjieli Ah, if I may say so, we had put in the BCFData "contract"

https://github.com/dnanexus-rnd/GLnexus/blob/f4ce0ffe2c2744f879f3a4f75a70a1aebd88705a/include/data.h#L108-L111

I believe bcf_unpack is smart enough not to repeat work it's already done, so it shouldn't hurt to do it in prepare_dataset_records again. However, we need to keep the invocation in revise_genotypes too. That one comes right after we make a fresh copy of the record in order to mutate it (also per above contract) so the copy needs to be unpacked.

xunjieli commented 5 years ago

@mlin Thanks for pointing that out. I missed that note in the header file.