Closed Y-Wakuta closed 4 years ago
The cost of SUM() or COUNT() would accordingly increase with # of records because this requires a full scan. And the cost of GROUP BY would be consist of the cost of ORDER BY and record scan. But NoSE expects sort to be executed in user's application while ORDER BY and GROUP BY are supported by CQL. I think we need to change the method to expect ORDER BY and GROUP BY executed in Cassandra.
I think we will be able to deal with nested queries by separating them into two distinct queries. But we need to link those queries in query plans.
These queries are from AuctionMark . Although there also are updates, the most important part is query sets.
* SELECT gag_name + ” ” + gav_name FROM GLOBAL_ATTRIBUTE_GROUP, GLOBAL_ATTRIBUTE_VALUE WHERE gav_id = gav_ids[i] AND gav_gag_id = gag_ids[i] AND gav_gag_id = gag_id;
* SELECT ib_bid, ib_max_bid FROM ITEM_BID WHERE ib_id = <ib_id> AND ib_i_id = <i_id> AND ib_u_id = <u_id>;
* SELECT ib_buyer_id FROM ITEM_BID WHERE ib_id = <imb_ib_id> AND ib_i_id = <imb_ib_i_id> AND ib_u_id = <imb_ib_u_id>;
* SELECT ib_buyer_id FROM ITEM_BID WHERE ib_id = <maxBidId> AND ib_i_id = <itemId> AND ib_u_id = <userId>;
* SELECT imb_ib_id, imb_ib_i_id, imb_ib_u_id FROM ITEM_MAX_BID WHERE imb_i_id = <i_id> AND imb_u_id = <u_id>;
* SELECT imb_ib_id FROM ITEM_MAX_BID WHERE imb_i_id = <itemId> AND imb_u_id = <userId>;
* SELECT MAX(ip_id) FROM ITEM_PURCHASE WHERE ip_ib_id = <ib_id> AND ip_ib_i_id = <i_id> AND ip_ib_u_id = <u_id>
* SELECT MAX(if_id) FROM ITEM_FEEDBACK WHERE if_i_id = <i_id> AND if_u_id = <seller_id>
* SELECT i_id, i_u_id, i_initial_price, i_current_price FROM ITEM WHERE i_id = AND i_u_id = AND i_status = 0;
* SELECT i_id, i_u_id, i_status FROM ITEM WHERE i_start_date BETWEEN AND AND i_status = 0 LIMIT 100;
* SELECT i_id, i_u_id, i_name, i_current_price, i_end_date, i_status FROM ITEM WHERE i_u_id = <u_id> ORDER BY i_end_date ASC LIMIT 20;
* SELECT i_id, i_u_id, i_name, i_current_price, i_end_date, i_status FROM USER_ITEM, ITEM WHERE ui_u_id = <u_id> AND ui_i_id = i_id AND ui_i_u_id = i_u_id ORDER BY i_end_date ASC LIMIT 10;
* SELECT if_rating, if_comment, if_date, i_id, i_u_id, i_name, i_end_date, i_status, u_id, u_rating, u_sattr0, u_sattr1 FROM ITEM_FEEDBACK, ITEM, USER WHERE if_buyer_id = <u_id> AND if_i_id = i_id AND if_u_id = i_u_id AND if_u_id = u_id ” ORDER BY if_date DESC LIMIT 10;
* SELECT uw_u_id, i_id, i_u_id, i_name, i_current_price, i_end_date, i_status, uw_created FROM USER_WATCH, ITEM WHERE uw_u_id = <u_id> AND uw_i_id = i_id AND uw_i_u_id = i_u_id ORDER BY i_end_date ASC LIMIT 25
* SELECT * FROM ITEM_COMMENT WHERE ic_u_id = <seller_id> AND ic_response IS NULL;
* SELECT u_id, u_rating, u_balance, u_created, u_sattr0, u_sattr1, u_sattr2, u_sattr3, u_sattr4, r_name FROM USER, REGION WHERE u_id = <u_id> AND u_r_id = r_id;
I can find some variations in the WHERE clause in the queries for ITEM. Many updates are also for ITEM. However, I could not find variations in the queries for other entities. For now, AuctionMark is not the best choice I think.
For now, I will execute evaluation for TPC-H and RUBIS.
We need time-depend workload for evaluation. We possibly can use HTAP workload by changing the balance between OLTP and OLAP.
Candidates: