COOL-cohort / COOL

the source code of the COOL system
https://www.comp.nus.edu.sg/~dbsystem/cool/
Apache License 2.0

Discussion about using current example dataset to generate cohort query #84

Open Zrealshadow opened 1 year ago

Zrealshadow commented 1 year ago

We want to generate cohort queries from the sogamo dataset for the cohortQueryProcessing unit test. Some simple data analysis turned up the following problems:

The sogamo dataset contains 10k items but only 4 players in total, so the cohort query in the old-version code is not representative and does not work well as a unit test. According to the CoHANA paper, the raw data is larger than the sample data we currently have. I recommend using the raw data to generate the test cohort queries.

The tpch dataset has the same problem: there is only 1 user in the entire dataset, so every order in it belongs to that same user.
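
For anyone who wants to reproduce the counts, a check of this kind could look roughly like the sketch below. It is not the script used for the numbers above. The column name `playerid` and the inline sample rows are hypothetical, and the real sogamo and tpch files may use different headers or no header row at all.

```python
import csv
import io
from collections import Counter


def user_distribution(csv_file, user_column):
    """Count how many rows belong to each distinct user in a CSV stream."""
    reader = csv.DictReader(csv_file)
    return Counter(row[user_column] for row in reader)


# Hypothetical sample rows; the column names are assumptions,
# not the real schema of the sogamo or tpch datasets.
sample = io.StringIO(
    "playerid,event,time\n"
    "p1,launch,2013-05-20\n"
    "p1,shop,2013-05-21\n"
    "p2,launch,2013-05-20\n"
)

counts = user_distribution(sample, "playerid")
print(f"{len(counts)} distinct users over {sum(counts.values())} rows")
```

To run it against a real file, replace the `io.StringIO` sample with `open(path, newline="")` and pass the actual user column name. A dataset where the number of distinct users is in the single digits is unlikely to exercise cohort grouping in a meaningful way.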