Closed hdoupe closed 4 years ago
(This PR is not backwards-compatible right now, but it will be before it is merged.)
@hdoupe This looks really good- thanks for your work on ParamTools!
Thanks @jdebacker!
With the latest commits:
- `select.py` uses the new Query API but is backwards compatible (except for niche uses of custom comparison or index-related comparison functions).
- `tree.py` is removed.

Additions to the API:
- `Values` from a list of values:
```python
adjustment = {"myparam": [{"value": 1, "label": "someval"}]}
vals = params.sel[adjustment["myparam"]]  # returns a Values instance
```
- `union` and `intersection` to combine many results:
```python
params = WeatherParams()
queryresults = []
for label, value in {"temperature": "hot", "precipitation": "little", "wind": "variable"}.items():
    queryresults.append(params.sel["weather"][label] == value)
# Keep only the values that satisfy every condition above.
combined = intersection(queryresults)
```
- A `QueryResult` is just a view on top of a `Values` object, but you can "persist" the subset of values returned in the query by converting the `QueryResult` into a `Values` object like `queryresult.as_values()`. This makes it possible to modify the underlying data:
```python
new_value = [
    {
        "temperature": "moderate",
        "precipitation": "heavy",
        "wind": "strong",
        "value": "hurricane",
    }
]
queryresults = params.sel["weather"]["precipitation"] == "heavy"
updated = queryresults.as_values().insert(new_value)
params.adjust({"weather": updated})
```
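To make the `union` and `intersection` idea above concrete, here is a minimal sketch that uses only the standard library. It is not ParamTools code: the `records` list and the `matches`, `intersect_all`, and `union_all` helpers are hypothetical stand-ins, and each query result is modeled simply as a set of record indices.

```python
from functools import reduce

# Hypothetical stand-in for the value objects of a "weather" parameter.
records = [
    {"temperature": "hot", "precipitation": "little", "wind": "variable", "value": "dry heat"},
    {"temperature": "hot", "precipitation": "heavy", "wind": "strong", "value": "tropical storm"},
    {"temperature": "cold", "precipitation": "little", "wind": "variable", "value": "clear and cold"},
]


def matches(label, value):
    """Return the indices of the records whose `label` equals `value`."""
    return {i for i, rec in enumerate(records) if rec[label] == value}


def intersect_all(results):
    """Keep only the indices that appear in every result set."""
    return reduce(set.intersection, results)


def union_all(results):
    """Keep the indices that appear in at least one result set."""
    return set().union(*results)


wanted = {"temperature": "hot", "precipitation": "little", "wind": "variable"}
results = [matches(label, value) for label, value in wanted.items()]

common = intersect_all(results)  # records matching all three conditions
either = union_all(results)      # records matching any condition
selected = [records[i]["value"] for i in sorted(common)]
```

Modeling results as index sets keeps each query a cheap view over the same underlying list, which is the same reason a query result can stay a view until it is explicitly persisted.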
Also, I tested the API both for feel and for correctness in Tax-Calculator, where tests passed locally both as-is using the backwards compatible `select` module on the `master` branch and with the new API here: https://github.com/hdoupe/Tax-Calculator/commit/4cae756c15b260c405ab21d90a345ba114e21710.

TODO:
- Test the `values` module more thoroughly.
- `intersection` like this:
```python
queryset = params.sel["some_param"]
queryset &= intersection(
    queryset.eq(strict=False, **{label: value})
    for label, value in other_labels.items()
)
```
- Is `Values` --> `Slice` --> `QueryResults` intuitive?
  - `Values`: `params.sel["myparam"]`
  - `Slice`: `params.sel["my_param"]["some_label"]`
  - `QueryResult`: `queryresult = params.sel["my_param"]["some_label"] > 1234`
  - `Values`: `new_values = queryresult.as_values()`
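
One way to picture the `Values` --> `Slice` --> `QueryResult` --> `Values` chain is with toy classes. The sketch below is a hypothetical, standard-library-only model: the `Toy*` classes are not ParamTools classes and say nothing about how ParamTools implements them. It only shows how indexing by label and overloading comparison operators can produce this flow.

```python
import operator


class ToyQueryResult:
    """A filtered view: the full record list plus the selected positions."""

    def __init__(self, records, index):
        self.records = records
        self.index = sorted(index)

    def __and__(self, other):
        return ToyQueryResult(self.records, set(self.index) & set(other.index))

    def __or__(self, other):
        return ToyQueryResult(self.records, set(self.index) | set(other.index))

    def as_values(self):
        # "Persist" the view by copying out only the selected records.
        return ToyValues([self.records[i] for i in self.index])


class ToySlice:
    """One label of the records; comparing it yields a ToyQueryResult."""

    def __init__(self, records, label):
        self.records = records
        self.label = label

    def _compare(self, op, other):
        index = {
            i for i, rec in enumerate(self.records) if op(rec[self.label], other)
        }
        return ToyQueryResult(self.records, index)

    def __eq__(self, other):
        return self._compare(operator.eq, other)

    def __gt__(self, other):
        return self._compare(operator.gt, other)

    def __lt__(self, other):
        return self._compare(operator.lt, other)


class ToyValues:
    """Holds the records; indexing by a label name returns a ToySlice."""

    def __init__(self, records):
        self.records = list(records)

    def __getitem__(self, label):
        return ToySlice(self.records, label)


values = ToyValues(
    [
        {"year": 2018, "value": 1000},
        {"year": 2019, "value": 1500},
        {"year": 2020, "value": 2000},
    ]
)

# Values -> Slice -> QueryResult, combined with "&", then back to Values.
result = (values["value"] > 1234) & (values["year"] < 2020)
persisted = result.as_values()
```

In this toy model the query result never copies records until `as_values()` is called, which mirrors the view-versus-persisted distinction described earlier in the thread.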

The latest commits improve performance to be better than the current `master` branch. This is done by:
- Avoiding `copy.deepcopy`, which was responsible for almost 50% of the load time when creating Tax-Calculator's `Policy` object.
- Updating `SortedKeyList` and `Values` to support inserting new values without having to re-build the underlying data structures.

Just dropped the WIP tag on PR #114. I'm planning to merge once I add documentation for the new query features.

Latest commits add:
- Indexing for the new `Values`, `Slice`, and `QueryResult` objects.
- Adds docs for the new features in this PR and accessing parameter values in general. I'm planning to work back through the docs after this PR to use this example (or similar) for the rest of the documentation.
- Fixes some date type related bugs. Now you can use `month` for the `step` argument on `range` validators for `Date`:
```python
import paramtools


class Params(paramtools.Parameters):
    defaults = {
        "schema": {
            "labels": {
                "date": {
                    "type": "date",
                    "validators": {
                        "range": {
                            "min": "2020-01-01",
                            "max": "2021-01-01",
                            "step": {"months": 1},
                        }
                    },
                }
            },
        },
        "a": {
            "title": "A",
            "type": "int",
            "value": [
                {"date": "2020-01-01", "value": 2},
                {"date": "2020-10-01", "value": 8},
            ],
        },
        "b": {
            "title": "B",
            "type": "float",
            "value": [{"date": "2020-01-01", "value": 10.5}],
        },
    }


params = Params(label_to_extend="date")
params.sel["a"]
```
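
For intuition about what a one-month `step` between `min` and `max` means, the grid of dates can be sketched in plain Python. This is an illustration using only the standard library, not how ParamTools implements the validator. `month_grid` is a hypothetical helper, and it assumes the start day exists in every month (true for the 1st).

```python
from datetime import date


def month_grid(start, stop, months=1):
    """Yield dates from start to stop (inclusive), stepping by whole months."""
    current = start
    while current <= stop:
        yield current
        # Count months since year 0, add the step, then split back into
        # a (year, month) pair so December rolls over into January.
        total = current.year * 12 + (current.month - 1) + months
        current = date(total // 12, total % 12 + 1, current.day)


grid = list(month_grid(date(2020, 1, 1), date(2021, 1, 1)))
```

With the bounds used in the example above, the grid runs from 2020-01-01 through 2021-01-01 in monthly increments.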

Latest commits:
- Update `SortedKeyList` to use `sortedcontainers`. Now the low-level `bisect_left` and `bisect_right` usage is handled by `sortedcontainers`. This helps ParamTools focus on the higher level query APIs and gives it a fast engine for queries.
- Fixes some bugs that were found by testing this version of ParamTools against Tax-Cruncher, Tax-Brain, and Cost-of-Capital-Calculator:
  - `cmp_funcs` method is not defined.
  - `exact_match` keyword.
  - `SortedKeyListException` if unable to create the `sortedcontainers` `SortedKeyList` object.
- Minor performance improvements through smarter caching with `sel`.
- Readability improvements in the `sort_values` method.
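
The key-based sorted lookup mentioned above can be illustrated with the standard library alone. The sketch below is not ParamTools' `SortedKeyList` and does not use `sortedcontainers`; it only shows the underlying technique. Dictionaries cannot be compared with `<`, so the records are ordered by a key function, and `bisect_left`/`bisect_right` on the precomputed keys then answer comparison queries without scanning the whole list. The `records` data and the helper names (`year_of`, `eq`, `lt`, `gt`) are hypothetical.

```python
from bisect import bisect_left, bisect_right

# Hypothetical value objects, deliberately out of order.
records = [
    {"year": 2021, "value": 3.0},
    {"year": 2019, "value": 1.0},
    {"year": 2020, "value": 2.0},
    {"year": 2020, "value": 2.5},
]

# Plain dicts do not define "<", so sorting them without a key fails.
try:
    sorted(records)
    sortable_without_key = True
except TypeError:
    sortable_without_key = False


def year_of(rec):
    """Key function: order the records by their "year" label."""
    return rec["year"]


ordered = sorted(records, key=year_of)
keys = [year_of(rec) for rec in ordered]  # parallel list used for bisecting


def eq(year):
    """Records whose key equals `year`."""
    return ordered[bisect_left(keys, year):bisect_right(keys, year)]


def lt(year):
    """Records whose key is strictly less than `year`."""
    return ordered[:bisect_left(keys, year)]


def gt(year):
    """Records whose key is strictly greater than `year`."""
    return ordered[bisect_right(keys, year):]
```

Because the keys are kept sorted, each query costs a binary search plus a slice rather than a full pass over the records.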
This PR re-writes the ParamTools query API so that it is more flexible and familiar to users in the pydata ecosystem:
It replaces the query backend added in PR #74, bringing with it 4 main advantages:
- Much simpler. This implementation is based directly off of the ordered-list example in the Python Documentation. Despite the detailed docstring in the `tree.py` module, I still have trouble tracking down and fixing bugs in the complicated search and update methods.
- Much more flexible. Users can apply standard comparison and logical operators like `&`, `<`, etc. to query and chain together query results.
- Custom ordering functions. Users can define their own ordering functions if their values are not already orderable. For example, if you define a custom type that is a dictionary, then you will get this error if you try to sort a list of them:
  But if you supply a key to sort on then Python can sort your list:
  The same idea is used in this PR.
- A familiar API. The API is inspired by the Pandas `.loc` function. I considered directly copying the `loc` function, but since the behavior is a little different (e.g. no slice or column selection behavior), I used `sel` as an abbreviation for the existing `select_*` based API. My intention is to use the same pattern without confusing users who may think that they are working with a dataframe.

Here are 3 examples demonstrating the points above:
- `sel` attribute to query parameter values: