DataBrewery / cubes

[NOT MAINTAINED] Light-weight Python OLAP framework for multi-dimensional data analysis
http://cubes.databrewery.org
Other
1.49k stars 314 forks source link

Add semi/non-additive measures/aggregates #265

Open Stiivi opened 9 years ago

Stiivi commented 9 years ago

Add support for semi/non-additive measures/aggregates. See documentation in Measure for more information about the option.

This feature was present in a limited form before the SQL backend refactoring.

See the original code for more information how it was implemented.

Suggestion: support this feature only on SQL dialects which have window functions. Instead of complex subquery, use first_value() or last_value() OVER (PARTITION BY nonadditive_dimension_key ORDER BY optional_configurable_order)

Status: It was temporarily removed during the refactoring. It is useful feature that should be brought back.

Credits: Original feature was suggested and prototyped by @robin900.

Stiivi commented 9 years ago

Update: this issue was present in the 1.0 release using a subquery condition. While it worked, it relied on existence of a special/hidden dimension attribute __key__ that had it's own mapping. After trying to implement it into the 1.1 (sql refactoring), I run into several issues with it:

I would rather not have a feature than to have increased complexity for just partial implementation of the feature.

Conclusion (for now): since this feature was marked as experimental in 1.0, I'm dropping it for 1.1 and will wait for a more robust and understandable implementation.

If anyone has a suggestion how to implement it in a simple way, I'm open.

Stiivi commented 9 years ago

Related cubes-discuss mailing list post with some thoughts.

Stiivi commented 9 years ago

Insightful suggestion by @robin900 on the cubes-discuss mailing list.

Extract:

  1. A model defines a time dimension, call it 'date', with several "cubes standard" dimension levels with standard meanings. That is, 'date.year' is the four-digit common calendar year, 'date.dow' is the day of week expressed as 1..7, 'date.week' is the YYYY-MM-DD format of the day ending that week, etc.
  2. The model defines a second dimension, call it 'date_ptd', which somehow indicates that the first time dimension 'date' is a template for 'date_ptd' dimension. Also, 'date_ptd' indicates that it wishes to be a PTD dimension adding an implicit cut for the underlying template dimension, 'date'.
  3. In order for the PTD implicit cut to work, the underlying 'date' dimension needs to define extra hierarchies that re-use existing levels in new arrangements, or alternately, cubes allows cut specifications that don't specify every level of a hierarchy, but instead can specify a cut starting at a depth lower than the top level.