I refactored aw_freeform_table pretty much from scratch. I say "refactored"
because none of the user interface has changed, but the implementation has been
completely overhauled.
I removed a number of functions and R files that were used only by this one
report function, but I did my best to leave everything else alone. Other
functions, such as aw_anomaly_report(), are essentially unchanged.
I hope you don't mind such a large change. Starting from scratch was far from my first choice, but I couldn't make much progress with the existing function. I hope this will make development much easier and more reliable in the future.
Motivation and context
The original version (currently in master) was too difficult to follow, debug,
and modify. Changing one part of the code had unexpected effects elsewhere.
But I'm excited about this package, and I want to contribute to its maintenance. I couldn't figure out how the function was generating reports, though, so after trying to refactor a bit at a time, I realized it was easier to start from scratch.
Features

- New aw_freeform_table implementation, still in a functional style, but with
  better isolation of different features
- Bugs can now be traced to a single location, and changes are less likely to
  affect other parts of the code
- The new implementation also offers room for easily adding new features, or
  refining existing features
- The key function get_req_data, which is responsible for executing the
  queries and putting them together, is now recursive:
  - Base case: the dimension being requested is the last dimension. Unpack
    the metric list column and return the data.
  - Recursive case: the dimension being requested is not the last dimension.
    Collect the dimension itemIds and, for each of them, request the next
    dimension by calling get_req_data.
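The recursion can be sketched roughly like this. This is a simplified, self-contained toy, not the real code: query_one_dimension() is a hypothetical stand-in for a single Adobe Analytics request, and the real function handles paging, filters, and error cases that are omitted here.

```r
# Toy stand-in for one API request: returns the distinct values of `dim`
# under the given parent filters (here, just two fake rows with metrics).
query_one_dimension <- function(dim, filters) {
  data.frame(itemId = paste0(dim, "_", 1:2),
             value  = paste0(dim, ":val", 1:2),
             metric = c(100, 200),
             stringsAsFactors = FALSE)
}

get_req_data <- function(dims, filters = list()) {
  rows <- query_one_dimension(dims[[1]], filters)
  if (length(dims) == 1) {
    # Base case: last dimension -- return the rows with their metrics
    names(rows)[names(rows) == "value"] <- dims[[1]]
    return(rows[, c(dims[[1]], "metric")])
  }
  # Recursive case: for each itemId, break down by the next dimension
  pieces <- lapply(seq_len(nrow(rows)), function(i) {
    child <- get_req_data(dims[-1],
                          c(filters, setNames(list(rows$itemId[i]), dims[[1]])))
    child[[dims[[1]]]] <- rows$value[i]  # tack on the parent dimension value
    child
  })
  do.call(rbind, pieces)
}

tbl <- get_req_data(c("dim1", "dim2"))
```

Because each recursive call returns a complete sub-table, assembling the final freeform table is just an rbind of the pieces.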
New features

I added the following features, which to the best of my ability will not affect
existing code, but will make the package more reliable and predictable:

- API responses with empty rows fields are safely handled by filling the
  metrics and remaining dimensions with NA values
- Requests now support datetimes (POSIXct and POSIXlt objects) in addition
  to dates, so advanced users can request a timeframe down to the second
- Users can opt out of error-checking the metrics and components. Pulling all
  dimensions and metrics (and sometimes calculated metrics) is time-consuming,
  so users can skip this step by setting check_components = FALSE. If they
  are running many queries, it's more efficient to implement the checks
  themselves before running any of them.
  - Setting prettynames = TRUE overrides this and causes the components to be
    checked anyway, since pulling the dimension and metric reference is
    necessary to make the pretty names.
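The interaction between the two arguments boils down to a single condition. This is a toy sketch of the assumed logic, with a hypothetical helper name; the real decision happens inside aw_freeform_table:

```r
# Hypothetical helper: should we pull the dimension/metric reference tables?
# Pulling them is slow, so it is skipped unless the user wants component
# checking -- or needs the reference anyway to build pretty names.
should_pull_reference <- function(check_components = TRUE, prettynames = FALSE) {
  # prettynames requires the reference, so it overrides the opt-out
  check_components || prettynames
}

should_pull_reference(check_components = FALSE)                      # skipped
should_pull_reference(check_components = FALSE, prettynames = TRUE)  # pulled
```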
Changes
I changed the messaging because it is no longer possible to break up the
queries by dimension.
UPDATE: I added a progress bar. To trigger it, at least 20 queries must be planned, and it remains incomplete if not all planned queries end up being executed. Note that this adds the progress R package as a dependency, which may or may not be desirable and may affect the minimum supported version of R.
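The trigger condition is a simple threshold on the number of planned queries. This is a toy stand-in with a hypothetical helper name; the real check lives inside the querying code:

```r
# Hypothetical sketch of the 20-query threshold for showing the bar.
show_progress_bar <- function(n_planned) n_planned >= 20

# When triggered, the progress package would be used roughly like this:
#   pb <- progress::progress_bar$new(total = n_planned)
#   pb$tick()  # once per completed query

show_progress_bar(25)  # bar shown
show_progress_bar(3)   # no bar
```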
Structure
There are three parts to the query:

1. Convert the user's inputs into a consistent format
2. Construct individual requests
3. Make all requests necessary to build the requested freeform table
The aw_freeform_table function is only responsible for the first part,
preparing user inputs. There is a suite of functions for constructing requests,
which add layers of abstraction to make them simple and predictable. Making the
requests is handled by get_req_data, a function which is called recursively as
needed to build the table.
The goal was to relieve the programmer of unnecessary burdens by restricting
what each level is responsible for. Constructing metric containers is a good
example of this. When making a new request, the programmer only has to call
metric_container() with the proper arguments. The metric_container function
handles the problem of lining up the metric filter IDs and the proper metrics,
but it doesn't have to worry about the structure of either field within the
container. That's the job of metric_elems and metric_filters. And so on.
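The layering can be illustrated with toy stand-ins. These do not reproduce the real request body the package sends to Adobe Analytics; they only mimic the division of labor between the three functions:

```r
# Toy helpers: each one owns the structure of exactly one field.
metric_filters <- function(filter_ids) {
  lapply(filter_ids, function(id) list(id = id, type = "breakdown"))
}
metric_elems <- function(metrics, filter_ids) {
  lapply(seq_along(metrics), function(i) {
    list(id       = metrics[[i]],
         columnId = as.character(i),
         filters  = filter_ids)  # line each metric up with its filter IDs
  })
}

# metric_container's only job is lining up metrics with filter IDs;
# the internal structure of each field is delegated to the helpers above.
metric_container <- function(metrics, filter_ids = character()) {
  list(metrics       = metric_elems(metrics, filter_ids),
       metricFilters = metric_filters(filter_ids))
}

cont <- metric_container(c("visits", "orders"), filter_ids = "f1")
```

Each level only knows about the level directly beneath it, which is what keeps changes from rippling through the request-building code.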
Querying the Data
The new flow is different from the old one. The old structure was like a
breadth-first search: it gathered all of the dimension values at one level
before it started querying the next level. The new structure is like a
depth-first search, because it gathers the data for one combination of
dimension levels completely before moving on to the next one.
For example, if you have dim1, dim2, and dim3, the old version is like this:
```
# Gather all values of dim1
dim1
|_val1
|_val2
|_val3

# Gather all values of dim2 based on dim1
dim1
|_val1
  |_dim2
    |_val1
    |_val2
    |_val3
|_val2
  |_dim2
    |_val1
    |_val2
    |_val3
|_val3
  |_dim2
    |_val1
    |_val2
    |_val3

# Collect all values of dim2 and get values of dim3 with metrics
dim1
|_val1
  |_dim2
    |_val1        metric
      |_dim3      ------
        |_val1    xx,xxx
        |_val2    xx,xxx
        |_val3    xx,xxx
    |_val2
      |_dim3
        |_val1    xx,xxx
        |_val2    xx,xxx
        |_val3    xx,xxx
    |_val3
      |_dim3
        |_val1    xx,xxx
        |_val2    xx,xxx
        |_val3    xx,xxx
etc...
```
The new version works one dimension level combination at a time:
```
# Gather all values of dim1
dim1
|_val1
|_val2
|_val3

# Gather all values of dim2 for dim1:val1
dim1
|_val1
  |_dim2
    |_val1
    |_val2
    |_val3
|_val2
|_val3

# Gather all values of dim3 for (dim1:val1, dim2:val1)
dim1
|_val1
  |_dim2
    |_val1
      |_dim3      metric
        |_val1    xx,xxx
        |_val2    xx,xxx
        |_val3    xx,xxx
    |_val2
    |_val3
|_val2
|_val3
```
Each level also has the responsibility of tacking on the name of the dimension
that it is filtered based on. This is a nice bit of encapsulation that simplifies
post-processing the data.
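The tack-on convention is why post-processing stays simple: every child table comes back already labeled with the parent value it was filtered on, so combining levels is a plain rbind. A toy illustration, with a hypothetical helper name:

```r
# Hypothetical helper: the level that requested a breakdown labels the
# returned child table with its own dimension name and value.
label_children <- function(child, parent_dim, parent_value) {
  child[[parent_dim]] <- parent_value
  child
}

a <- label_children(data.frame(dim2 = c("x", "y"), metric = c(1, 2)),
                    "dim1", "val1")
b <- label_children(data.frame(dim2 = c("x", "y"), metric = c(3, 4)),
                    "dim1", "val2")
tbl <- rbind(a, b)  # already a tidy table, no reshaping needed
```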
Related Issue
Issue #100
How Has This Been Tested?
I tested this thoroughly, but not exhaustively (what would it mean to
exhaustively test this anyway?).
I ran a battery of generated queries with every combination of four dimensions,
four metrics, and three segments, which covered:
- A variety of numbers of dimensions and metrics (from 1 to 4)
- No segment, 1 segment, and multiple stacked segments
- A mix of regular and calculated metrics
- Built-in and custom dimensions
- Missing data at a variety of levels (e.g., it handles the case where you get
  no rows back but more breakdowns were requested)
- Some queries with date dimensions, some without
I also tried queries with multiple types of date ranges, including dates,
numerics, character strings, and POSIXct objects with time spans down to a
few hours.
Throughout the whole testing process I compared the results to Adobe Workspace.
Types of changes
- [x] Bug fix (non-breaking change which fixes an issue)
- [x] New feature (non-breaking change which adds functionality)
- [ ] Breaking change (fix or feature that would cause existing functionality to change)
Checklist:
- [x] My code follows the code style of this project.
- [x] My change requires a change to the documentation.
- [x] I have updated the documentation accordingly.
- [ ] I have added tests to cover my changes.
- [x] All new and existing tests passed.
- [x] The package passes R CMD check with 0 errors, 0 warnings, and 0 notes.