DataBrewery / cubes

[NOT MAINTAINED] Light-weight Python OLAP framework for multi-dimensional data analysis
http://cubes.databrewery.org
Other
1.49k stars 314 forks source link

How to use 'group by and having clause' like sql? #333

Open namndev opened 8 years ago

namndev commented 8 years ago

Some times, To Query faster I have use 'group by and having clause in sql'. But in Cubes, I don't known to this work.

statement = sql.expression.select(selection,
                                          from_obj=context.star,
                                          use_labels=True,
                                          whereclause=condition,
                                          group_by=group_by)

That: Do not using HAVING.

namndev commented 8 years ago

I try edit this libs bellow:

1. In cubes/cell.py

    def __init__(self, cube=None, cuts=None,  having_clauses=None):
        if not isinstance(cube, Cube):
            raise ArgumentError("Cell cube should be sublcass of Cube, "
                                "provided: %s" % type(cube).__name__)
        self.cube = cube
        self.cuts = cuts if cuts is not None else []
        self.having_clauses = having_clauses if having_clauses is not None else []

    def __and__(self, other):
        """Returns a new cell that is a conjunction of the two provided
        cells. The cube has to match."""
        if self.cube != other.cube:
            raise ArgumentError("Can not combine two cells from different "
                                "cubes '%s' and '%s'."
                                % (self.name, other.name))
        cuts = self.cuts + other.cuts
        having_clauses = self.having_clauses + other.having_clauses
        return Cell(self.cube, cuts=cuts, having_clause=having_clauses)

    def to_dict(self):
        """Returns a dictionary representation of the cell"""
        result = {
            "cube": str(self.cube.name),
            "cuts": [cut.to_dict() for cut in self.cuts]
            "having_clauses": [clause.to_dict() for clause in self.having_clauses]
        }
        return result

2. In cubes/sql/query.py add 3 methods to end line:

    def clause_for_having(self, cell):
        """Returns a clause for having clause and attr for group. If cell is empty, not contain having or cell is
        `None` then returns `None`."""

        if not cell:
            return None

        clauses = self.clauses_for_having(cell.having_clauses)
        condition = and_(*clauses["condition"])
        clauses["condition"] = condition
        return clauses

    def clauses_for_having(self, having_clauses):
        clauses = []
        groups = []
        for cut in having_clauses:
            hierarchy = str(cut.hierarchy) if cut.hierarchy else None
            if isinstance(cut, PointCut):
                path = cut.path
                hav_conds = self.having_condition(str(cut.dimension),
                                                     path,
                                                     hierarchy, cut.invert)
            clauses.append(hav_conds["condition"])
            groups += hav_conds["group"]
        # return one dict
        dict_clause = {"groups": groups, "condition": clauses}
        return dict_clause

    def having_condition(self, dim, path, hierarchy=None, invert=False):
        """Returns a dict of `Condition` tuple (`attributes`, `conditions`,
        `group_by`) dimension `dim` point at `path` and list group attrs use having. It is a compound
        condition - one equality condition for each path element in form:
        ``level[i].key = path[i]``"""
        conditions = []
        groups = []
        levels = self.level_keys(dim, hierarchy, path)
        for level_key, value in zip(levels, path):
            # Prepare condition: dimension.level_key = path_value
            column = self.column(level_key)
            conditions.append(column == value)
            groups.append(column)

        condition = sql.expression.and_(*conditions)

        if invert:
            condition = sql.expression.not_(condition)

        dict_condition = {"group": groups, "condition": condition}
        return dict_condition

3. In cubes/sql/browser.py At method aggregation_statement edit line:

statement = sql.expression.select(selection,
                                          from_obj=context.star,
                                          use_labels=True,
                                          whereclause=condition,
                                          group_by=group_by)

to

        # HAVING
        # ------
        having_clauses = context.clause_for_having(cell)
        havings = having_clauses["condition"]
        group_clauses = having_clauses["groups"]
        if group_by is None:
            group_by = []
        for group in group_clauses:
            if group not in group_by:
                group_by.append(group)
        statement = sql.expression.select(selection,
                                          from_obj=context.star,
                                          use_labels=True,
                                          whereclause=condition,
                                          group_by=group_by,
                                          having=havings)

I only using to PointCut in Having Clause because my needs just had so

jjmontesl commented 8 years ago

Thanks for this. This feature shall include unit tests. Could you please provide some?

jjmontesl commented 8 years ago

Also, a further explanation of the use case would be welcome to better understand this patch.

Stiivi commented 8 years ago

As @jjmontesl suggested, can you please provide plain SQL use-case examples which demonstrate the difference with and without the proposed feature?