felix-reichel / price-search-engine-seals-analysis

Produces a price search engine firm quality seal changes data set of (potentially) skewed index-spaced data cubes within a big data cube.

Impl dynamic inflow sample data/offer memory db loader #12

Closed felix-reichel closed 1 week ago

felix-reichel commented 2 months ago

New:

The largest weekly offers file is ~760 MB.

(52+26) weeks * 760 MB ≈ 59 GB, i.e. ~60 GB <= 128 GB (S)

L (512 GB), XL (1 TB), 2XL (2 TB)

Goal: process in batches of 5 years, each of which needs 1 year (52 pre-weeks) of overlap, so each batch adds 4 new years. 4 iterations then cover 20 - 4 = 16 years, which matches 2023 - 2007 = 16, so 4 batches. XL should be feasible for offers and clicks only.
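
The batch arithmetic above can be checked with a small sketch. It assumes that the first year of each 5-year window only supplies the pre-weeks, that every week is as large as the largest one (an upper bound), and that in-memory size is roughly the file size. The function names and the `TIERS_GB` mapping are hypothetical and not taken from the repository.

```python
import math

WEEKS_PER_YEAR = 52
LARGEST_WEEK_MB = 760  # largest weekly offers file, from the estimate above
TIERS_GB = {"S": 128, "L": 512, "XL": 1000, "2XL": 2000}  # 1 TB taken as 1000 GB


def plan_batches(first_year, last_year, window_years=5, overlap_years=1):
    """Return (load_from, load_to) year pairs, both inclusive.

    Assumption: the first `overlap_years` of each window only provide
    pre-weeks, so each batch adds `window_years - overlap_years` new years.
    """
    net_years = window_years - overlap_years
    span = last_year - first_year  # 2023 - 2007 = 16
    n_batches = math.ceil(span / net_years)
    batches = []
    for i in range(n_batches):
        load_from = first_year + i * net_years
        load_to = min(load_from + window_years - 1, last_year)
        batches.append((load_from, load_to))
    return batches


def upper_bound_gb(weeks, week_mb=LARGEST_WEEK_MB):
    """Worst-case size in GB if every week were as large as the largest one."""
    return weeks * week_mb / 1000


def smallest_tier(size_gb, tiers=TIERS_GB):
    """Name of the smallest tier with at least size_gb of memory, else None."""
    for name, capacity in sorted(tiers.items(), key=lambda kv: kv[1]):
        if size_gb <= capacity:
            return name
    return None


batches = plan_batches(2007, 2023)
per_batch_gb = upper_bound_gb(5 * WEEKS_PER_YEAR)
print(batches)
print(per_batch_gb, smallest_tier(per_batch_gb))
```

Under these assumptions a 5-year offers window is bounded by 260 * 760 MB, which is about 198 GB. That figure covers offers only. The size of the clicks data is not stated in this thread, so the sketch does not decide between L and XL.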

felix-reichel commented 2 months ago

Related: important current DuckDB issues:

https://github.com/duckdb/duckdb/issues/14087

https://github.com/duckdb/duckdb/pull/12318#issuecomment-2374523222

https://github.com/duckdb/duckdb/issues/12286#issue-2320913128

felix-reichel commented 1 month ago

Introduced a tables config and basic table caching using DuckDB's PRAGMA statements in faf9fe46ecfbfbfab5b49579cb8cd44c2129a6b9
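
The commit itself is not shown in this thread, so the following is only a guess at the general shape of such a setup, not the repository's code. It sketches a tables config that is turned into DuckDB session settings plus a list of tables to keep cached. The config keys, table names, and limits are all hypothetical. `PRAGMA memory_limit` and `PRAGMA threads` are existing DuckDB settings.

```python
# Hypothetical tables config: keys, table names and limits are illustrative only.
TABLES_CONFIG = {
    "memory_limit": "100GB",
    "threads": 8,
    "tables": {"offers": True, "clicks": True, "retailers": False},
}


def session_statements(config):
    """Build the DuckDB PRAGMA statements implied by the config."""
    return [
        f"PRAGMA memory_limit='{config['memory_limit']}'",
        f"PRAGMA threads={int(config['threads'])}",
    ]


def tables_to_cache(config):
    """Names of the tables flagged for in-memory caching, in sorted order."""
    return sorted(name for name, cached in config["tables"].items() if cached)


for stmt in session_statements(TABLES_CONFIG):
    print(stmt)
print(tables_to_cache(TABLES_CONFIG))
```

With the `duckdb` Python package installed, each statement could be run on a connection with `con.execute(stmt)` before loading the selected tables.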

felix-reichel commented 1 month ago

Regarding overall memory handling and caching behaviour, this relates to:

https://github.com/felix-reichel/price-search-engine-seals-analysis/issues/27

felix-reichel commented 1 week ago

This can be properly covered in https://github.com/felix-reichel/price-search-engine-seals-analysis/issues/28 (Generalized dynamic db loaders 2.0).