broadinstitute / gamgee

A C++14 library for NGS data formats
http://broadinstitute.github.io/gamgee/
MIT License

VariantBuilderMultiSampleVector: simple class to make it easier to prepare multi-sample data for VariantBuilder #378

Closed: droazen closed this 10 years ago

droazen commented 10 years ago

Made the efficient one-dimensional (flattened) vector option for setting individual fields MUCH easier to use by providing a VariantBuilderMultiSampleVector class. It handles the work of setting missing values and padding each sample's values to the maximum field width.

To use it, first determine the number of samples and the maximum number of values per sample for the field, then get a pre-initialized vector from the builder. E.g.:

auto multi_sample_vector = builder.get_integer_multi_sample_vector(num_samples, max_values_per_sample);

This vector will have missing values for all samples, with appropriate padding to the maximum field width.

Then fill in the values for each non-missing sample by calling set_sample_value() and/or set_sample_values() on your multi-sample vector. set_sample_value() is the more efficient of the two, since it doesn't require constructing and destroying a vector for each call. You don't have to worry about samples with no values, since all samples start out with missing values.

Finally, pass your multi-sample vector to the builder (ideally by move):

builder.set_integer_individual_field(field_index, std::move(multi_sample_vector));
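To make the flattened layout concrete, here is a minimal, self-contained sketch of the idea behind the three steps above. It is not gamgee's implementation: FlatMultiSampleVector, kMissing, kVectorEnd and build_example() are hypothetical names invented for this illustration. The sentinel values are an assumption that mirrors htslib's int32 convention, and the argument order of the setters is likewise assumed. It also assumes a sample with no values is stored as one missing marker followed by end-of-vector padding.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <vector>

// Hypothetical sentinels for this sketch (assumed to mirror htslib's int32 convention).
constexpr int32_t kMissing = std::numeric_limits<int32_t>::min();
constexpr int32_t kVectorEnd = std::numeric_limits<int32_t>::min() + 1;

// Illustrative stand-in, NOT gamgee's VariantBuilderMultiSampleVector.
class FlatMultiSampleVector {
 public:
  FlatMultiSampleVector(std::size_t num_samples, std::size_t max_values_per_sample)
      : m_values(num_samples * max_values_per_sample, kVectorEnd),
        m_num_samples(num_samples),
        m_width(max_values_per_sample) {
    // Every sample starts out missing: one missing marker, then padding.
    for (std::size_t i = 0; i < m_values.size(); i += m_width) {
      m_values[i] = kMissing;
    }
  }

  // Writes a single slot directly; no temporary vector is needed.
  void set_sample_value(std::size_t sample, std::size_t value_index, int32_t value) {
    assert(sample < m_num_samples && value_index < m_width);
    m_values[sample * m_width + value_index] = value;
  }

  // Copies a whole sample's values and pads the rest of its slots.
  // An empty input leaves the sample untouched.
  void set_sample_values(std::size_t sample, const std::vector<int32_t>& values) {
    assert(sample < m_num_samples && values.size() <= m_width);
    if (values.empty()) return;
    const std::size_t start = sample * m_width;
    for (std::size_t i = 0; i < m_width; ++i) {
      m_values[start + i] = i < values.size() ? values[i] : kVectorEnd;
    }
  }

  const std::vector<int32_t>& values() const { return m_values; }

 private:
  std::vector<int32_t> m_values;
  std::size_t m_num_samples;
  std::size_t m_width;
};

// Three samples, at most two values each; sample 1 is never set.
std::vector<int32_t> build_example() {
  FlatMultiSampleVector v(3, 2);
  v.set_sample_value(0, 0, 10);
  v.set_sample_value(0, 1, 20);
  v.set_sample_values(2, {30});
  return v.values();  // returns a copy of the flat storage
}
```

In this sketch the flat storage ends up as [10, 20, kMissing, kVectorEnd, 30, kVectorEnd]: sample 0 is full, sample 1 is still missing, and sample 2 holds one value plus padding. The caller never writes a sentinel by hand, which is the convenience the class in this PR is meant to provide.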

- Works with integer and float individual fields, as well as the GT field (string fields are still set by passing in a one-dimensional vector of strings).

- Removed the ability to pass in a raw one-dimensional vector directly for int/float fields -- you must use a VariantBuilderMultiSampleVector if you want to work with flattened multi-sample data.

- Updated tests as necessary.

Resolves #325

droazen commented 10 years ago

For @jmthibault79 to review please, since this is for him :)

coveralls commented 10 years ago

Coverage increased (+0.13%) when pulling f742b87ce7aa7a483d5acb405274dc227e34d887 on dr_vb_flattened_vector_helper_functions into 05676cf5909257068ca8cdc5cdb26d61cc60d5b5 on master.

jmthibault79 commented 10 years ago

Review done. Looks great, especially all the explanatory comments.

droazen commented 10 years ago

Review comments addressed, tests passed -- merging.

coveralls commented 10 years ago

Coverage decreased (-0.14%) when pulling 1354fc746bada675d00b5c78b5921c67909567b1 on dr_vb_flattened_vector_helper_functions into 71009ce6ef8c6db9640dc05830b049e88b2d63a1 on master.