felix-reichel / price-search-engine-seals-analysis

Produces a data set of firms' quality-seal changes on a price search engine, built from (potentially) skewed, index-spaced data cubes within one big data cube.
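To make "index-spaced data cube" concrete, one possible reading is that the full index space is the Cartesian product of all axis values, and that a data set may cover only part of it (the example in the issue below speaks of data containing the "full or partial index space"). The sketch assumes firm, product, and week axes named as in that example; `AXES`, `full_index_space` and `coverage` are hypothetical helpers for illustration, not part of the project.

```python
from itertools import product

# Hypothetical axis values; axis names follow the issue's example
# (haendler_bez = firm, produkt_id = product, week_running_var = week).
AXES = {
    "haendler_bez": ["F1", "F2"],
    "produkt_id": ["P1", "P2", "P3"],
    "week_running_var": [42, 43, 44, 45],
}


def full_index_space(axes: dict[str, list]) -> set[tuple]:
    """Every cell of the cube: the Cartesian product of all axis values."""
    return set(product(*axes.values()))


def coverage(observed: set[tuple], full: set[tuple]) -> float:
    """Share of the full index space that a (sub-)data set actually covers."""
    return len(observed & full) / len(full)


cube = full_index_space(AXES)  # 2 * 3 * 4 = 24 cells

# A partial sub-cube: firm F1 only lists P1 and P2, and only in weeks 42-43.
sub_cube = {("F1", p, w) for p in ["P1", "P2"] for w in [42, 43]}

print(len(cube))                           # 24
print(round(coverage(sub_cube, cube), 3))  # 0.167
```

In polars, the same full index space could be materialised by cross-joining one single-column DataFrame per axis (`DataFrame.join(other, how="cross")`).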

Spec Business Layer (Data Set, Variable, Criterion, ExcludeScrapperIpsCriterion, VariableRenderStrategy, etc.) #24

Closed: felix-reichel closed this issue 1 week ago.

felix-reichel commented 1 month ago

Desired behaviour (by example):

```python
import polars as pl

# DataSet, SpaceSelector, Variable, ExcludeScrapperIpsCriterion, ImputationStrategy,
# VariableRenderStrategy and BatchVariableRenderer are the business-layer classes specified in this issue.

data = pl.DataFrame()  # Target data set as a pl.DataFrame; covers the full or a partial index space.
db_source = ":memory:"  # Source data: a DuckDB source reference or, e.g., a DuckDBDataSource instance.
dataset = DataSet(data=data, db_source=db_source)  # Binds the target data set to its source.

# Determines the WHERE clauses that are produced and finally used by the db.loaders and the specific renderer classes.
# The space selection has to be validated, and the render strategies have to be determined for each variable and axis.
space_selector = SpaceSelector(
    haendler_bez="F1", produkt_id=["P1", "P2", "P3"], week_running_var=[42, 43, 44, 45], data=data
)

# Create a variable.
clicks_variable = Variable(
    name="clicks_ijt",  # variable name (label)
    description="Total clicks (sum) on product i offered by firm j that occurred in week t.",  # variable description
    # Criterion that excludes scraper IPs by intercepting (filtering) the target column query.
    # Criteria must be applied in a specific order to qualify as target-query interceptors;
    # where that order is determined is still open (?).
    criterions=[ExcludeScrapperIpsCriterion()],
    imputation_strategy=ImputationStrategy.NONE,  # imputation applied to the target column query
    render_strategy=VariableRenderStrategy.INNER_SPACE,  # render strategy specific to this variable
)

# Create another variable.
lct_variable = Variable(
    name="lct_ijt",
    description="Total last click-throughs (sum) for product i by firm j in week t.",
    render_strategy=VariableRenderStrategy.OUTER_SPACE,  # render strategy specific to this variable
)

batch_renderer = BatchVariableRenderer(dataset)

# Starts a new batch render run.
batch_renderer.batch_render(
    variables=[clicks_variable, lct_variable],
    space_selector=space_selector,
)

# Based on the variables and the space_selector: validation -> the renderer classes call the relevant db.loaders.
```
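The example above leaves open how a space selection turns into WHERE clauses and in which order criteria intercept the target query. What follows is a minimal sketch under several assumptions: the classes are simplified stand-ins for the spec's classes; `SpaceSelector` validates against a plain dict `index_space` instead of the `data` DataFrame; `Criterion.intercept` and its `order` rank are hypothetical (one possible answer to "where to determine" the order); the `ip` / `scraper_ips` schema used by `ExcludeScrapperIpsCriterion` is invented for illustration; and `batch_where_clauses` stands in for `BatchVariableRenderer.batch_render`. Render strategies are only carried through, since the issue does not define what INNER_SPACE and OUTER_SPACE do.

```python
from dataclasses import dataclass, field
from enum import Enum


class VariableRenderStrategy(Enum):
    INNER_SPACE = "inner_space"
    OUTER_SPACE = "outer_space"


class ImputationStrategy(Enum):
    NONE = "none"


class Criterion:
    """Hypothetical base class for target-query interceptors."""

    order: int = 100  # lower ranks intercept first (one way to fix the order)

    def intercept(self, conditions: list[str]) -> list[str]:
        return conditions


class ExcludeScrapperIpsCriterion(Criterion):
    order = 10

    def intercept(self, conditions: list[str]) -> list[str]:
        # Hypothetical schema: an "ip" column and a "scraper_ips(ip)" table.
        return conditions + ["ip NOT IN (SELECT ip FROM scraper_ips)"]


@dataclass
class Variable:
    name: str
    description: str = ""
    criterions: list[Criterion] = field(default_factory=list)
    imputation_strategy: ImputationStrategy = ImputationStrategy.NONE
    render_strategy: VariableRenderStrategy = VariableRenderStrategy.INNER_SPACE


class SpaceSelector:
    """Validates a selection against the index space and renders WHERE conditions."""

    def __init__(self, index_space: dict[str, list], **axes):
        self.axes: dict[str, list] = {}
        for column, values in axes.items():
            if column not in index_space:
                raise ValueError(f"unknown axis: {column}")
            values = values if isinstance(values, list) else [values]
            if not values or set(values) - set(index_space[column]):
                raise ValueError(f"invalid selection for {column}: {values}")
            self.axes[column] = values

    def where(self) -> tuple[list[str], list]:
        """One parameterised IN (...) condition per axis, plus the bound values."""
        conditions, params = [], []
        for column, values in self.axes.items():
            placeholders = ", ".join(["?"] * len(values))
            conditions.append(f"{column} IN ({placeholders})")
            params.extend(values)
        return conditions, params


def batch_where_clauses(variables: list[Variable], selector: SpaceSelector) -> dict:
    """Map each variable name to (render strategy, WHERE clause, parameters)."""
    base_conditions, params = selector.where()
    clauses = {}
    for variable in variables:
        conditions = list(base_conditions)
        # Criteria intercept the shared space conditions in ascending rank order.
        for criterion in sorted(variable.criterions, key=lambda c: c.order):
            conditions = criterion.intercept(conditions)
        clauses[variable.name] = (variable.render_strategy, " AND ".join(conditions), list(params))
    return clauses


# Usage, mirroring the example in the issue.
index_space = {
    "haendler_bez": ["F1", "F2"],
    "produkt_id": ["P1", "P2", "P3"],
    "week_running_var": [42, 43, 44, 45],
}
selector = SpaceSelector(
    index_space, haendler_bez="F1", produkt_id=["P1", "P2", "P3"], week_running_var=[42, 43, 44, 45]
)
clicks = Variable("clicks_ijt", criterions=[ExcludeScrapperIpsCriterion()])
lct = Variable("lct_ijt", render_strategy=VariableRenderStrategy.OUTER_SPACE)
clauses = batch_where_clauses([clicks, lct], selector)
print(clauses["clicks_ijt"][1])
```

Values stay as `?` placeholders instead of being formatted into the SQL string, so a db.loader could pass the clause and its parameter list to DuckDB's Python API, e.g. `con.execute(sql, params)`. Column names, by contrast, are interpolated, which is why the sketch only accepts axes that exist in the index space.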

Requirements: