felix-reichel / price-search-engine-seals-analysis

Produces a price search engine firm quality seal changes data set of (potentially) skewed index-spaced data cubes within a big data cube.

Original Specification s.t. open discussion; Obs_id / Identifier is ID(i,j,t,S) not (i,j,t), (i,j,t)-Duplicates are imho possible. #30

Closed felix-reichel closed 1 month ago

felix-reichel commented 1 month ago

Suppose:

Proposition 1:

Two firms experience a seal change within the same year, or more generally within a shared 52-week window.

According to the current selection criteria, we sample (i, j, t) as follows:

  1. (I): Select TOP_N=200 products based on clicks, filtered by the condition that firm j offers them.
  2. (II): Select 10 counterfactual firms j through a deterministic random sampling process.
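
A minimal Python sketch of the two selection steps, to make the criteria concrete. It is not the repository's implementation: the function names, the input shapes (a dict of clicks per product, a set of offered products, a list of candidate firms), and the reading of "deterministic random sampling" as a seeded pseudo-random draw are all assumptions for illustration.

```python
import random

TOP_N = 200             # from step (I)
N_COUNTERFACTUALS = 10  # from step (II)

def top_products(clicks_by_product, offered_by_firm, top_n=TOP_N):
    """Step (I): products the firm offers, ranked by clicks (descending)."""
    offered = [p for p in clicks_by_product if p in offered_by_firm]
    # Sort by clicks descending, then by product id so ties break reproducibly.
    offered.sort(key=lambda p: (-clicks_by_product[p], p))
    return offered[:top_n]

def counterfactual_firms(candidates, seed, k=N_COUNTERFACTUALS):
    """Step (II): seeded draw (assumption), so the same seed yields the same firms."""
    rng = random.Random(seed)
    pool = sorted(candidates)  # fixed order, so the draw ignores input order
    return rng.sample(pool, min(k, len(pool)))
```

Because the draw in (II) depends only on the seed and the candidate pool in this sketch, two seal-change firms that face the same pool can end up with the same counterfactual firm.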

Two seal-change firms can overlap in their product sets from (I), and they can also draw the same counterfactual firm j in (II). Under Proposition 1 their time windows overlap as well, so the same (i, j, t)-tuple can be sampled more than once and is no longer unique.

Thus, a unique row is only identified by (i, j, t, S), where S serves as an additional identifier (denoting a Seal Provider World).
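
The point can be checked on a toy example. The rows below are hypothetical and only illustrate the argument: the product, firm, and world labels are invented, and the tuple layout (i, j, t, S) is an assumption about how a row might be represented.

```python
from collections import Counter

# Hypothetical toy rows: (product i, counterfactual firm j, week t, seal world S).
rows = [
    ("p1", "f9", 10, "A"),  # f9 sampled as counterfactual in world A
    ("p1", "f9", 10, "B"),  # f9 sampled again in world B, same product and week
    ("p2", "f9", 10, "A"),
]

def duplicated_keys(rows, key_len):
    """Return the key prefixes of length key_len that occur more than once."""
    counts = Counter(row[:key_len] for row in rows)
    return [key for key, n in counts.items() if n > 1]

ijt_duplicates = duplicated_keys(rows, 3)   # key (i, j, t)
ijts_duplicates = duplicated_keys(rows, 4)  # key (i, j, t, S)
```

Here `ijt_duplicates` contains `("p1", "f9", 10)`, while `ijts_duplicates` is empty: the rows collide on (i, j, t) but are distinct once S is part of the key.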

Appendix/Remark: This is not a problem for estimating various models with the data set, but duplicate (i, j, t) rows should not be dropped.