WSU-CPTS415-ParquetParkour / Amazon-CoPurchasing

GNU General Public License v3.0

[TASK] Define meta-attributes for end-user queries #10

Closed: Jarrus00 closed 1 year ago

Jarrus00 commented 2 years ago

Background: The nodes and edges of the co-purchasing dataset carry non-searchable meta-attributes that add value by further describing or summarizing those nodes and edges. The dataset needs to be assessed for these meta-attributes so that the information can be surfaced to the end user.

Problem: End users will not have direct access to the attributes within the co-purchasing dataset and should not be expected to mine it for non-searchable attributes. We need to identify these attributes so that they are surfaced correctly in the pseudo-SQL that end users will be using.

Success Criteria:

  1. The non-searchable attributes and their data types (string, integer, float, etc.) are known.
  2. The types of valid comparison operators are known for each non-searchable attribute.
  3. The above information is documented in the wiki for reference.
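
One way to make criteria 1 and 2 checkable is a small lookup table that records each attribute's data type and valid operators, plus a validator for incoming pseudo-SQL predicates. The following is a minimal sketch only: the names `MetaAttribute`, `REGISTRY`, `validate_predicate`, and the two placeholder attributes are hypothetical and are not taken from the project code.

```python
from dataclasses import dataclass

# Operators assumed valid for every numeric meta-attribute.
NUMERIC_OPERATORS = frozenset({"<", "<=", "=", ">", ">="})


@dataclass(frozen=True)
class MetaAttribute:
    name: str
    data_type: type
    operators: frozenset


# Hypothetical placeholder entries; the real list would come from the wiki.
REGISTRY = {
    attr.name: attr
    for attr in (
        MetaAttribute("example_count", int, NUMERIC_OPERATORS),
        MetaAttribute("example_ratio", float, NUMERIC_OPERATORS),
    )
}


def validate_predicate(name, operator, value):
    """Return value coerced to the attribute's type, or raise ValueError."""
    attr = REGISTRY.get(name)
    if attr is None:
        raise ValueError(f"unknown meta-attribute: {name}")
    if operator not in attr.operators:
        raise ValueError(f"operator {operator!r} is not valid for {name}")
    # bool is a subclass of int, so it is rejected explicitly.
    if isinstance(value, bool) or not isinstance(value, (int, float)):
        raise ValueError(f"{name} expects a numeric value, got {value!r}")
    if attr.data_type is int and not isinstance(value, int):
        raise ValueError(f"{name} expects an integer, got {value!r}")
    return attr.data_type(value)
```

Keeping types and operators in one structure means the wiki documentation and the query validator could be generated from the same source.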

Jarrus00 commented 2 years ago

As a first pass, the meta-attributes below seem the most relevant to implement. They are aimed at retrieving second- or third-degree subset summaries based on groupings of products, users, and categories.

Additionally, drafts of supporting functions have been included to expand and simplify the querying of value distributions.

Initial meta-attributes:

| Meta-attribute | Description | Data Type | Valid Operators |
| --- | --- | --- | --- |
| product_review_votes | Total number of review votes a product has | integer | <, <=, =, >, >= |
| product_avg_review_votes | Flat mean of the votes each review has received | float | <, <=, =, >, >= |
| product_avg_time_between_reviews | Flat mean of the time between reviews for a product | float | <, <=, =, >, >= |
| product_avg_helpful_ratio | Flat mean of the ratio helpful/votes per review entry | float | <, <=, =, >, >= |
| product_weighted_rating | Average rating, weighted by avg_helpful_ratio | float | <, <=, =, >, >= |
| product_category_similar | Count of other product nodes which share a category branch | integer | <, <=, =, >, >= |
| user_avg_review | Flat mean of a user's review rating across products | float | <, <=, =, >, >= |
| user_avg_helpful_ratio | Flat mean of a user's helpful/votes ratio across products | float | <, <=, =, >, >= |
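
To make a few of these definitions concrete, here is a rough sketch of how three of the product-level attributes could be computed from a product's reviews. It rests on assumptions that are not settled in this thread: the `Review` record and its field names are hypothetical, a review with zero votes is given a helpful ratio of 0.0, and `product_weighted_rating` is read as a mean of ratings weighted by each review's helpful ratio, falling back to the flat mean when every weight is zero.

```python
from dataclasses import dataclass
from statistics import fmean


@dataclass
class Review:
    # Hypothetical record layout for a single review entry.
    rating: int
    votes: int
    helpful: int


def helpful_ratio(review):
    # Assumption: a review nobody voted on contributes a ratio of 0.0.
    return review.helpful / review.votes if review.votes else 0.0


def product_review_votes(reviews):
    """Total number of review votes across all of a product's reviews."""
    return sum(r.votes for r in reviews)


def product_avg_helpful_ratio(reviews):
    """Flat mean of helpful/votes per review; 0.0 for a product with no reviews."""
    return fmean(helpful_ratio(r) for r in reviews) if reviews else 0.0


def product_weighted_rating(reviews):
    """Mean rating weighted by each review's helpful ratio (one possible reading)."""
    if not reviews:
        return 0.0
    weights = [helpful_ratio(r) for r in reviews]
    total = sum(weights)
    if total == 0:
        # Assumption: with no usable weights, fall back to the flat mean rating.
        return fmean(r.rating for r in reviews)
    return sum(r.rating * w for r, w in zip(reviews, weights)) / total
```

Other readings of the weighting are possible, for example scaling the flat mean rating by the product's overall `product_avg_helpful_ratio`, so the definition should be confirmed before it is documented in the wiki.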
Supporting functions & operators:

| Function | Description | Arguments | Example |
| --- | --- | --- | --- |
| MA_THRESHOLD_UPPER | Queries the distribution of meta-attribute values for those above a specified limit | (float, float) | `MA_THRESHOLD_UPPER(0.5, 3.5) [<empty>/OVER/UNDER]` would return the set of product nodes for which at least 50% of the submitted reviews have a rating of 3.5, with the rating comparison defined by `[<empty>/OVER/UNDER]` |
| MA_THRESHOLD_LOWER | Queries the distribution of meta-attribute values for those below a specified limit | (float, float) | `MA_THRESHOLD_LOWER(0.5, 3.5) [<empty>/OVER/UNDER]` would return the set of product nodes for which fewer than 50% of the submitted reviews have a rating of 3.5, with the rating comparison defined by `[<empty>/OVER/UNDER]` |
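
The intended semantics of the two threshold functions could be sketched as follows. This is an illustration under stated assumptions rather than the agreed design: the function names, the `products` mapping of product id to a list of ratings, and the handling of the modifier (empty meaning "equal to", `OVER` meaning "greater than", `UNDER` meaning "less than") are all guesses at what the draft intends. Products with no reviews are skipped by both functions.

```python
import operator

# Assumed meaning of the optional modifier that follows the call.
COMPARATORS = {"": operator.eq, "OVER": operator.gt, "UNDER": operator.lt}


def matching_share(ratings, limit, modifier=""):
    """Fraction of ratings that satisfy the comparison against limit."""
    compare = COMPARATORS[modifier]
    return sum(1 for r in ratings if compare(r, limit)) / len(ratings)


def ma_threshold_upper(products, share, limit, modifier=""):
    """Product ids where at least `share` of the reviews match the comparison."""
    return {
        pid
        for pid, ratings in products.items()
        if ratings and matching_share(ratings, limit, modifier) >= share
    }


def ma_threshold_lower(products, share, limit, modifier=""):
    """Product ids where fewer than `share` of the reviews match the comparison."""
    return {
        pid
        for pid, ratings in products.items()
        if ratings and matching_share(ratings, limit, modifier) < share
    }
```

For example, with `products = {"A": [5, 4, 2], "B": [3, 2, 1]}`, calling `ma_threshold_upper(products, 0.5, 3.5, "OVER")` selects only `"A"`, since two of its three ratings exceed 3.5, while `ma_threshold_lower` with the same arguments selects only `"B"`.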