As a first pass, the meta-attributes below seem the most relevant to implement. They are aimed at retrieving second- or third-degree subset summaries based on groupings of products, users, and categories.
Additionally, drafts of supporting functions are included to expand and simplify the querying of value distributions.
Initial meta-attributes:

| Meta-attribute | Description | Data Type | Valid Operators |
|---|---|---|---|
| product_review_votes | Total number of review votes a product has | integer | <, <=, =, >, >= |
| product_avg_review_votes | Flat mean of the votes each review has received | float | <, <=, =, >, >= |
| product_avg_time_between_reviews | Flat mean of the time between reviews for a product | float | <, <=, =, >, >= |
| product_avg_helpful_ratio | Flat mean of the ratio helpful/votes per review entry | float | <, <=, =, >, >= |
| product_weighted_rating | Average rating, weighted by avg_helpful_ratio | float | <, <=, =, >, >= |
| product_category_similar | Count of other product nodes which share a category branch | integer | <, <=, =, >, >= |
| user_avg_review | Flat mean of a user's review rating across products | float | <, <=, =, >, >= |
| user_avg_helpful_ratio | Flat mean of a user's helpful/votes ratio across products | float | <, <=, =, >, >= |

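To make the product-level definitions concrete, here is a minimal Python sketch of how a few of them could be derived from one product's reviews. The `Review` fields, the use of days as the time unit, the choice to skip reviews with zero votes when computing ratios, and the reading of "weighted by avg_helpful_ratio" as a per-review weight are all assumptions made for illustration, not decisions recorded in this issue.

```python
from dataclasses import dataclass
from datetime import date
from statistics import mean
from typing import Optional


@dataclass
class Review:
    # Hypothetical record shape; field names are assumptions.
    day: date
    rating: float
    votes: int
    helpful: int


def helpful_ratio(review: Review) -> Optional[float]:
    """helpful/votes for one review, or None when it has no votes (assumption)."""
    return review.helpful / review.votes if review.votes else None


def product_meta_attributes(reviews: list[Review]) -> dict:
    """Compute a subset of the product meta-attributes for one product."""
    ordered = sorted(reviews, key=lambda r: r.day)
    # Gaps in days between consecutive reviews (unit is an assumption).
    gaps = [(b.day - a.day).days for a, b in zip(ordered, ordered[1:])]
    # Keep only reviews that have a defined helpful ratio.
    rated = [(r.rating, helpful_ratio(r)) for r in reviews if r.votes]
    ratios = [ratio for _, ratio in rated]
    weight_sum = sum(ratios)
    votes = [r.votes for r in reviews]
    return {
        "product_review_votes": sum(votes),
        "product_avg_review_votes": mean(votes) if votes else None,
        "product_avg_time_between_reviews": mean(gaps) if gaps else None,
        "product_avg_helpful_ratio": mean(ratios) if ratios else None,
        # One possible reading: each rating weighted by its own helpful ratio.
        "product_weighted_rating": (
            sum(rating * ratio for rating, ratio in rated) / weight_sum
            if weight_sum
            else None
        ),
    }


sample = [
    Review(date(2024, 1, 1), 5, votes=10, helpful=8),
    Review(date(2024, 1, 11), 3, votes=4, helpful=1),
    Review(date(2024, 1, 15), 4, votes=0, helpful=0),
]
attrs = product_meta_attributes(sample)
```

Returning `None` where a value is undefined (for example, a product with a single review has no time between reviews) keeps the comparison operators in the table from silently matching a placeholder number. How such cases should behave in the pseudo-SQL is still an open question.
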
Supporting functions & operators:

| Function | Description | Arguments | Example |
|---|---|---|---|
| MA_THRESHOLD_UPPER | Queries the distribution of meta-attribute values for those above a specified limit | (float, float) | `MA_THRESHOLD_UPPER(0.5, 3.5) [<empty>/OVER/UNDER]` would return the set of product nodes for which at least 50% of the submitted reviews have a rating of 3.5, with the rating comparison defined by `[<empty>/OVER/UNDER]` |
| MA_THRESHOLD_LOWER | Queries the distribution of meta-attribute values for those below a specified limit | (float, float) | `MA_THRESHOLD_LOWER(0.5, 3.5) [<empty>/OVER/UNDER]` would return the set of product nodes for which fewer than 50% of the submitted reviews have a rating of 3.5, with the rating comparison defined by `[<empty>/OVER/UNDER]` |

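The intended semantics of the two threshold functions can be sketched in Python as follows. This is only an illustration: the function names, the `ratings_by_product` mapping, the reading of an empty modifier as equality with `OVER`/`UNDER` as strict comparisons, and the choice to skip products without reviews are assumptions, not part of the draft.

```python
import operator

# Assumed mapping from the [<empty>/OVER/UNDER] modifier to a comparison.
COMPARATORS = {"": operator.eq, "OVER": operator.gt, "UNDER": operator.lt}


def matching_fraction(values, limit, modifier=""):
    """Fraction of values that satisfy the comparison against limit."""
    compare = COMPARATORS[modifier]
    return sum(1 for v in values if compare(v, limit)) / len(values)


def ma_threshold_upper(ratings_by_product, fraction, limit, modifier=""):
    """Products where at least `fraction` of the ratings match the comparison."""
    return {
        product
        for product, ratings in ratings_by_product.items()
        if ratings and matching_fraction(ratings, limit, modifier) >= fraction
    }


def ma_threshold_lower(ratings_by_product, fraction, limit, modifier=""):
    """Products where fewer than `fraction` of the ratings match the comparison."""
    return {
        product
        for product, ratings in ratings_by_product.items()
        if ratings and matching_fraction(ratings, limit, modifier) < fraction
    }


# Hypothetical review ratings keyed by product node id.
ratings_by_product = {
    "A": [5, 4, 2, 4],
    "B": [3, 2, 5],
    "C": [3.5, 3.5, 1],
}
over_half_above = ma_threshold_upper(ratings_by_product, 0.5, 3.5, "OVER")
under_half_above = ma_threshold_lower(ratings_by_product, 0.5, 3.5, "OVER")
```

With the same arguments, the two functions partition the products that have at least one review, which may be a useful property to preserve in the pseudo-SQL implementation.
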
Background: The nodes and edges of the co-purchasing dataset contain non-searchable attributes (meta-attributes) that add value by further describing or summarizing the nodes and edges. An assessment of the dataset for these meta-attributes is needed so that the information can be surfaced to the end user.
Problem: End users will not have direct access to the attributes within the co-purchasing dataset and should not be expected to mine the dataset for non-searchable attributes. These attributes therefore need to be identified so that they are surfaced correctly in the pseudo-SQL that end users will be writing.
Success Criteria: