Look for Time-Depend "Workload"

Y-Wakuta commented 4 years ago

We need time-depend workload for evaluation. We possibly can use HTAP workload by changing the balance between OLTP and OLAP.

Candidates:

Y-Wakuta commented 4 years ago

The cost of SUM() or COUNT() would accordingly increase with # of records because this requires a full scan. And the cost of GROUP BY would be consist of the cost of ORDER BY and record scan. But NoSE expects sort to be executed in user's application while ORDER BY and GROUP BY are supported by CQL. I think we need to change the method to expect ORDER BY and GROUP BY executed in Cassandra.

I think we will be able to deal with nested queries by separating them into two distinct queries. But we need to link those queries in query plans.

Y-Wakuta commented 4 years ago

Other possible workload ideas

use workload of Observatory. This includes so many range conditions and few equal conditions. Therefore, convert some range conditions to equal conditions. Of course, we need to mention about this modification in the paper but possibly not how many range conditions were modified.
use workload about time-series sensor data. As a real-world workload, sensor data are inserted by sensors. As an artificial workload, users execute queries on them.

Y-Wakuta commented 4 years ago

These queries are from AuctionMark . Although there also are updates, the most important part is query sets.

* SELECT gag_name + ” ” + gav_name FROM GLOBAL_ATTRIBUTE_GROUP, GLOBAL_ATTRIBUTE_VALUE WHERE gav_id = gav_ids[i] AND gav_gag_id = gag_ids[i] AND gav_gag_id = gag_id;

* SELECT ib_bid, ib_max_bid FROM ITEM_BID WHERE ib_id = <ib_id>     AND ib_i_id = <i_id>        AND ib_u_id = <u_id>;
* SELECT ib_buyer_id        FROM ITEM_BID WHERE ib_id = <imb_ib_id> AND ib_i_id = <imb_ib_i_id> AND ib_u_id = <imb_ib_u_id>;
* SELECT ib_buyer_id        FROM ITEM_BID WHERE ib_id = <maxBidId>  AND ib_i_id = <itemId>      AND ib_u_id = <userId>;

* SELECT imb_ib_id, imb_ib_i_id, imb_ib_u_id FROM ITEM_MAX_BID WHERE imb_i_id = <i_id>   AND imb_u_id = <u_id>;
* SELECT imb_ib_id                           FROM ITEM_MAX_BID WHERE imb_i_id = <itemId> AND imb_u_id = <userId>;

* SELECT MAX(ip_id) FROM ITEM_PURCHASE WHERE ip_ib_id = <ib_id> AND ip_ib_i_id = <i_id> AND ip_ib_u_id = <u_id>

* SELECT MAX(if_id) FROM ITEM_FEEDBACK WHERE if_i_id = <i_id> AND if_u_id = <seller_id>

* SELECT i_id, i_u_id, i_initial_price, i_current_price                                                                 FROM ITEM WHERE i_id = AND i_u_id = AND i_status = 0;
* SELECT i_id, i_u_id, i_status                                                                                         FROM ITEM WHERE i_start_date BETWEEN AND AND i_status = 0 LIMIT 100;
* SELECT i_id, i_u_id, i_name, i_current_price, i_end_date, i_status                                                    FROM ITEM WHERE i_u_id = <u_id> ORDER BY i_end_date ASC LIMIT 20;
* SELECT i_id, i_u_id, i_name, i_current_price, i_end_date, i_status                                                    FROM USER_ITEM, ITEM WHERE ui_u_id = <u_id> AND ui_i_id = i_id AND ui_i_u_id = i_u_id ORDER BY i_end_date ASC LIMIT 10;
* SELECT if_rating, if_comment, if_date, i_id, i_u_id, i_name, i_end_date, i_status, u_id, u_rating, u_sattr0, u_sattr1 FROM ITEM_FEEDBACK, ITEM, USER WHERE if_buyer_id = <u_id> AND if_i_id = i_id AND if_u_id = i_u_id AND if_u_id = u_id ” ORDER BY if_date DESC LIMIT 10;
* SELECT uw_u_id, i_id, i_u_id, i_name, i_current_price, i_end_date, i_status, uw_created                               FROM USER_WATCH, ITEM WHERE uw_u_id = <u_id> AND uw_i_id = i_id AND uw_i_u_id = i_u_id ORDER BY i_end_date ASC LIMIT 25

* SELECT * FROM ITEM_COMMENT WHERE ic_u_id = <seller_id> AND ic_response IS NULL;

* SELECT u_id, u_rating, u_balance, u_created, u_sattr0, u_sattr1, u_sattr2, u_sattr3, u_sattr4, r_name FROM USER, REGION WHERE u_id = <u_id> AND u_r_id = r_id;

I can find some variations in the WHERE clause in the queries for ITEM. Many updates are also for ITEM. However, I could not find variations in the queries for other entities. For now, AuctionMark is not the best choice I think.

Y-Wakuta commented 4 years ago

For now, I will execute evaluation for TPC-H and RUBIS.

Y-Wakuta / nosql_time-series_schema_designer

Look for Time-Depend "Workload" #1

Other possible workload ideas