Nathaniel-Han / End-to-End-CardEst-Benchmark

A new CardEst Benchmark to Bridge AI and DBMS

A few StatsCEB subqueries have incorrect labels #7

Closed Btsan closed 1 year ago

Btsan commented 2 years ago

I was checking the subqueries provided in stats_CEB_sub_queries.sql and noticed that a few of the subqueries are labeled with the wrong parent queries.

For example, the subquery #1551 (0-indexed) is labeled as belonging to the query #102 (also 0-indexed) of stats_CEB.sql.

query #102 (line 103):

3778084||SELECT COUNT(*) FROM votes as v, posts as p, badges as b, users as u WHERE u.Id = b.UserId AND u.Id = p.OwnerUserId AND u.Id = v.UserId AND p.PostTypeId=1 AND p.CommentCount>=0 AND p.CommentCount<=15 AND u.Reputation>=1 AND u.DownVotes>=0 AND u.DownVotes<=1;

subquery #1551 (line 1552):

SELECT COUNT(*) FROM postHistory as ph, comments as c, users as u WHERE ph.UserId = u.Id AND c.UserId = u.Id AND ph.CreationDate>='2010-07-28 09:11:34'::timestamp AND ph.CreationDate<='2014-09-06 06:51:53'::timestamp;||102||260664622

From the predicates, you can see that this subquery doesn't match the query. Instead, this subquery seems to belong to the next query #103. I'm not sure where in your subquery generation script this error occurs, or maybe the issue is in the join_est_record_stats_CEB_process.txt file used for subquery generation.

I've found the same mislabeling on a couple other subqueries:

Fortunately, I don't think these errors would've impacted the results in your paper. Still, I hope you can figure out why this happened and correct the subqueries in your uploaded files.
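For anyone who wants to repeat the check described above, here is a rough sketch in Python. It assumes the line formats visible in the two examples quoted in this issue (`cardinality||SQL` for stats_CEB.sql and `SQL||parent index||cardinality` for stats_CEB_sub_queries.sql) and only compares the tables in the FROM clauses, so it can miss mislabelings where the tables happen to match and only the predicates differ. The function names are made up for illustration and are not part of the repository.

```python
import re


def parse_from_tables(sql):
    """Return the lowercase table names listed in the FROM clause of a simple SELECT."""
    match = re.search(r"\bFROM\b(.*?)(?:\bWHERE\b|;|$)", sql,
                      flags=re.IGNORECASE | re.DOTALL)
    if match is None:
        return set()
    tables = set()
    for item in match.group(1).split(","):
        parts = item.split()  # e.g. ["votes", "as", "v"]
        if parts:
            tables.add(parts[0].lower())
    return tables


def parse_query_line(line):
    """Assumed format of a stats_CEB.sql line: cardinality||SQL."""
    card, sql = line.strip().split("||", 1)
    return int(card), sql


def parse_subquery_line(line):
    """Assumed format of a sub-query line: SQL||parent index||cardinality."""
    sql, parent, card = line.strip().rsplit("||", 2)
    return sql, int(parent), int(card)


def find_mislabeled(query_lines, subquery_lines):
    """Return (subquery index, labeled parent index) pairs, both 0-indexed,
    where the subquery joins a table that its labeled parent does not."""
    parent_tables = [parse_from_tables(parse_query_line(line)[1])
                     for line in query_lines]
    suspicious = []
    for i, line in enumerate(subquery_lines):
        sql, parent, _ = parse_subquery_line(line)
        # A genuine sub-plan can only use tables that appear in its parent.
        if not parse_from_tables(sql) <= parent_tables[parent]:
            suspicious.append((i, parent))
    return suspicious
```

Passing the lines of the two files (for example from `open(path).read().splitlines()`) to `find_mislabeled` should flag a pair like the one above, since postHistory and comments do not appear in the labeled parent query.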

wuziniu commented 1 year ago

@Btsan Thank you very much for pointing this out. You are correct that some sub-plan queries are indeed mislabeled, and we are currently trying to work out the correct labels. It is worth noting that this label is for reference only: when the estimated cardinalities are injected into Postgres, Postgres does not use the label, so, as you pointed out, this does not affect the results in the paper.

wuziniu commented 1 year ago

@Btsan Problem resolved. The bug was in gen_sub_queries_sql_STATS.py. I have now pushed the new sub-plan queries.