Open ondraz opened 4 months ago
@ondraz Hi Ondro, I think doing this might be tricky because the dataframe would then need to contain all possible combinations of element and product dimension values for the view goal (for example rows for button-2 and p-2). Otherwise, the group by would produce aggregations with missing data.
If we just aggregate (sum) these 4 lines of agg. goal data,
test-multi-dimension a test_unit_type unit view button-1 p-1 200 200 200 200 200
test-multi-dimension b test_unit_type unit view button-1 p-1 220 220 220 220 220
test-multi-dimension a test_unit_type unit view button-1 100 100 100 100 100
test-multi-dimension b test_unit_type unit view button-1 180 180 180 180 180
we get exactly what we already have in the extra two lines with no dim values:
test-multi-dimension a test_unit_type unit view 300 300 300 300 300
test-multi-dimension b test_unit_type unit view 400 400 400 400 400
so:
count(test_unit_type.unit.view) - we can use the four lines above and just sum them
count(test_unit_type.unit.view(element=button-1)) - we filter the four lines above by dim value and sum the values
There's probably some argument for why we did it this way, requiring those extra two lines with empty dim data, but I don't recall it.
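A minimal sketch of the two evaluations described above, in plain Python standing in for the dataframe group by. The field names and the `sum_by_variant` helper are assumptions made for illustration, and only the first of the five numeric columns is kept.

```python
from collections import defaultdict

# Field names are assumptions; the thread does not name the columns.
FIELDS = ("exp_id", "exp_variant_id", "unit_type", "agg_type",
          "goal", "element", "product", "count")
ROWS = [
    ("test-multi-dimension", "a", "test_unit_type", "unit", "view", "button-1", "p-1", 200),
    ("test-multi-dimension", "b", "test_unit_type", "unit", "view", "button-1", "p-1", 220),
    ("test-multi-dimension", "a", "test_unit_type", "unit", "view", "button-1", "", 100),
    ("test-multi-dimension", "b", "test_unit_type", "unit", "view", "button-1", "", 180),
]
goals = [dict(zip(FIELDS, row)) for row in ROWS]


def sum_by_variant(rows, **dims):
    """Sum `count` per variant over rows matching the given dimension values."""
    totals = defaultdict(int)
    for row in rows:
        if all(row[name] == value for name, value in dims.items()):
            totals[row["exp_variant_id"]] += row["count"]
    return dict(totals)


# count(test_unit_type.unit.view): no dimension filter, sum all four rows.
view_total = sum_by_variant(goals)

# count(test_unit_type.unit.view(element=button-1)): filter by dim value, then sum.
view_button_1 = sum_by_variant(goals, element="button-1")

print(view_total)     # {'a': 300, 'b': 400}
print(view_button_1)  # {'a': 300, 'b': 400}
```

Both results match the two extra rows without dimension values (300 and 400), since every row in this sample has element button-1.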
You're right that it works in this case but I don't think it would work in general.
Suppose there are 200 views with element = button-2 in the data that the DAO is selecting from. If there is no count(test_unit_type.unit.view(element=button-2)) in the experiment metrics, the button-2 views will not show up in the data frame. Summing the rows with element = button-1 to produce count(test_unit_type.unit.view) will then get us incorrect results because it will be missing the 200 button-2 views.
Also, looking at the test-multi-dimension data, they kind of don't make sense. 😄
test-multi-dimension a test_unit_type unit view button-1 p-1 200 200 200 200 200
test-multi-dimension b test_unit_type unit view button-1 p-1 220 220 220 220 220
test-multi-dimension a test_unit_type unit view button-1 100 100 100 100 100
test-multi-dimension b test_unit_type unit view button-1 180 180 180 180 180
For example, the third row contains all button-1 views from all products, so its total count shouldn't be lower than the total count from the first row, which represents button-1 views from the p-1 product only. The first row's views should be a subset of the third row's views.
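Both points in this comment can be sketched in a few lines of Python. The per-element counts, the `requested_elements` filter, and the `subset_exceeds_superset` helper are hypothetical; only the 200 button-2 views and the 200 vs. 100 row values come from the discussion above.

```python
# Hypothetical per-element view counts for one variant. Only the 200 button-2
# views come from the comment; the button-1 figure is made up for illustration.
source_views = {"button-1": 300, "button-2": 200}

# Assumption: the data frame only holds rows for dimension values that some
# experiment metric mentions, here element=button-1.
requested_elements = {"button-1"}
frame = {
    element: count
    for element, count in source_views.items()
    if element in requested_elements
}

true_total = sum(source_views.values())
summed_from_frame = sum(frame.values())
missing = true_total - summed_from_frame
print(true_total, summed_from_frame, missing)  # 500 300 200


def subset_exceeds_superset(subset_count, superset_count):
    """True when a narrower slice reports more views than the slice containing it."""
    return subset_count > superset_count


# Variant a in the test data: button-1 on p-1 (200) vs. button-1 on all products (100).
print(subset_exceeds_superset(200, 100))  # True
```

Under these assumptions, the sum over the frame is short by exactly the button-2 views that no metric asked for, and the test rows fail the subset check.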
I was just testing that the goal selection works correctly and didn't think about the specific values.
E.g. when evaluating aggregated data of the test-multi-dimension experiment in test_multi_dimension, we require another copy of the aggregated data without dimensional columns (see here).
It would be nice to just "group by" the dimensional data without needing the extra aggregated data without dimensions in the agg goals dataframe.
current data:
data format requested in this issue: