broadinstitute / gamgee

A C++14 library for NGS data formats
http://broadinstitute.github.io/gamgee/
MIT License
40 stars 13 forks source link

Consider adding boundary checks to VariantBuilderMultiSampleVector #390

Closed droazen closed 9 years ago

droazen commented 9 years ago

The existing VariantBuilderMultiSampleVector implementation does not do any boundary checking, since it assumes that if you are using that class to set individual fields you want maximum efficiency with minimum overhead.

However, given that client code will be working directly with instances of this class all the time, and given the horrors of undefined behavior, it's possible that this decision was unwise, and that we should have boundary checks here. Since VariantBuilder is in charge of creating instances, it could propagate its validation settings to the VariantBuilderMultiSampleVectors, and boundary checks could be activated only for vectors obtained from a builder that had validation turned on.

MauricioCarneiro commented 9 years ago

Isn't checking for the flag going to be just as expensive as checking for the boundaries?

Mauricio Carneiro, Ph.D. http://www.broadinstitute.org/~carneiro/

droazen commented 9 years ago

No, because there are multiple boundary conditions to check (sample index and value index)

droazen commented 9 years ago

However, you bring up a good point that it might be better to just unconditionally check boundaries here instead of respecting VariantBuilder validation settings and adding an extra check per call.

MauricioCarneiro commented 9 years ago

I'd rather pay for 2 checks that will protect me than for 1 that will do nothing 99% of the time and then do 2 more.

droazen commented 9 years ago

The problem is that these boundary checks would be both per-sample and per-value-within-each-sample, so a lot of extra overhead