equinor / webviz-ert

ERT webviz plugins
GNU General Public License v3.0

Investigating pyarrow and whether it can help us with better performance #199

Closed oysteoh closed 2 years ago

oysteoh commented 2 years ago

webviz has already run into memory and performance problems on the client. They have "solved" them by using pyarrow along with server-side interpolation algorithms to handle the data more efficiently.

We have to do something as well to minimize the data sent to the client, and we should look into another technology for hosting the API, maybe pyarrow?
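
To make "minimize the data sent to the client" concrete, here is a minimal sketch of server-side downsampling by linear interpolation. It is an illustration only: the thread does not describe which interpolation algorithms webviz uses, and the function name `resample_linear` and the sample data are hypothetical.

```python
from bisect import bisect_right


def resample_linear(xs, ys, n_points):
    """Linearly interpolate (xs, ys) onto n_points evenly spaced x values.

    xs must be strictly increasing; n_points must be at least 2.
    """
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two or more matching x/y samples")
    if n_points < 2:
        raise ValueError("n_points must be at least 2")

    step = (xs[-1] - xs[0]) / (n_points - 1)
    new_xs = [xs[0] + i * step for i in range(n_points)]
    new_xs[-1] = xs[-1]  # avoid floating point drift on the last point

    new_ys = []
    for x in new_xs:
        # Index j of the segment [xs[j], xs[j + 1]] that contains x.
        j = min(max(bisect_right(xs, x) - 1, 0), len(xs) - 2)
        t = (x - xs[j]) / (xs[j + 1] - xs[j])
        new_ys.append(ys[j] + t * (ys[j + 1] - ys[j]))
    return new_xs, new_ys


# Hypothetical response: 10,001 samples reduced to 101 before sending.
xs = [i / 100 for i in range(10_001)]
ys = [2.0 * x + 1.0 for x in xs]
small_xs, small_ys = resample_linear(xs, ys, 101)
```

Plain linear resampling like this can smooth away narrow peaks, so whether it is acceptable depends on what the plots need to show.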

TerryHannant commented 2 years ago

Isn't pyarrow already used as part of parquet support in the ert-storage api?

frode-aarstad commented 2 years ago

Apache Arrow, with its Python package pyarrow, is a language-independent columnar memory format for fast data access. So using Arrow in the frontend to access the data does not really solve any performance issues related to the transmission of data.

Apache Parquet is a storage format built to support efficient compression and can be used together with Arrow for efficient transmission of data.

When we build support for larger-than-memory datasets, it might be beneficial to look more into these two technologies and see whether they can be used in the backend to combine data from different data sources (Arrow) and to stream data to the front end (Parquet).