TileDB-Inc / TileDB-VCF

Efficient variant-call data storage and retrieval library using the TileDB storage library.
https://tiledb-inc.github.io/TileDB-VCF/
MIT License

Reduce memory usage and improve perf when reading VCF headers #631

Closed gspowley closed 7 months ago

gspowley commented 7 months ago

Read VCF headers with the ManagedQuery API, which:

  1. Reduces memory usage by reserving buffer space with std::vector<T>::reserve() instead of allocating and populating user buffers. ManagedQuery uses only the memory actually required, whereas the previous implementation allocated and filled the entire estimated buffer size, which was a pessimistic estimate.
  2. Improves performance by no longer filling the entire estimated buffer with empty data.
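
The difference between the two buffer strategies can be sketched in plain C++. This is a minimal illustration, not the TileDB-VCF code: `read_eager` and `read_reserved` are hypothetical helpers, and the "query result" is simulated by writing `actual` bytes. It only shows why `reserve()` avoids constructing (and therefore writing) the full estimated size.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the previous approach: allocate and
// value-initialize the full estimated size, then write the result into it.
std::vector<char> read_eager(std::size_t estimated, std::size_t actual) {
    assert(actual <= estimated);
    std::vector<char> buf(estimated);  // writes a zero to every byte up front
    for (std::size_t i = 0; i < actual; ++i) {
        buf[i] = 'x';  // simulated query result
    }
    return buf;  // size() == estimated, although only `actual` bytes are used
}

// Hypothetical stand-in for the reserve-based approach: set aside capacity
// for the estimate, but only construct the elements that hold real data.
std::vector<char> read_reserved(std::size_t estimated, std::size_t actual) {
    assert(actual <= estimated);
    std::vector<char> buf;
    buf.reserve(estimated);  // capacity only; no elements are constructed
    for (std::size_t i = 0; i < actual; ++i) {
        buf.push_back('x');  // simulated query result; never reallocates here
    }
    return buf;  // size() == actual
}
```

With a pessimistic estimate, `read_eager(1000, 10)` returns a vector of 1000 elements, while `read_reserved(1000, 10)` returns one of 10. On operating systems that commit memory lazily, reserved but untouched capacity typically does not count toward resident memory, which is consistent with the reduction described above.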

This PR also resolves an edge case where reading the first sample would fail if that sample had previously been deleted.

shortcut-integration[bot] commented 7 months ago

This pull request has been linked to Shortcut Story #38153: VCF memory usage increase when reading non-materialized attributes.