Open v4ray opened 1 year ago
This repo is mainly for reproducibility in academic settings. If you are interested in building a real application, you can take a look at:
None of the above implements JOSIE but should be good enough depending on your use case.
I recommend starting with MinHashLSH for finding joinable tables. You first create a MinHash for every column. Then you index all the MinHashes in a MinHashLSH index. After that, you can query the index for columns with high Jaccard similarity.
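To make the three steps above concrete, here is a minimal sketch written from scratch with only the Python standard library. It is not the API of any particular MinHash library, just an illustration of the idea: one signature per column, a banded LSH index over the signatures, then a query. The table contents, `NUM_PERM`, `BANDS`, the threshold, and all helper names (`columns_from_csv`, `minhash`, `LSHIndex`, `find_joinable`) are assumptions made up for this example.

```python
import csv
import hashlib
import io
from collections import defaultdict

NUM_PERM = 128  # signature length (assumed value)
BANDS = 32      # number of LSH bands (assumed value); 4 rows per band


def columns_from_csv(table, text):
    """Yield ((table, column), distinct values) for each column in CSV text."""
    rows = list(csv.reader(io.StringIO(text)))
    header, body = rows[0], rows[1:]
    for i, column in enumerate(header):
        values = {row[i] for row in body if i < len(row) and row[i]}
        if values:  # skip empty columns, they have no signature
            yield (table, column), values


def minhash(values):
    """Step 1: signature = minimum salted 64-bit hash of the set, per seed."""
    signature = []
    for seed in range(NUM_PERM):
        salt = seed.to_bytes(8, "little")
        signature.append(min(
            int.from_bytes(
                hashlib.blake2b(v.encode("utf8"), digest_size=8, salt=salt).digest(),
                "little",
            )
            for v in values
        ))
    return signature


class LSHIndex:
    """Step 2: banded LSH. Columns sharing any band become candidates."""

    def __init__(self):
        self.buckets = defaultdict(set)
        self.signatures = {}

    def _band_keys(self, signature):
        rows = NUM_PERM // BANDS
        for b in range(BANDS):
            yield b, tuple(signature[b * rows:(b + 1) * rows])

    def insert(self, key, signature):
        self.signatures[key] = signature
        for band_key in self._band_keys(signature):
            self.buckets[band_key].add(key)

    def query(self, signature):
        candidates = set()
        for band_key in self._band_keys(signature):
            candidates |= self.buckets.get(band_key, set())
        return candidates


def estimate_jaccard(sig_a, sig_b):
    """Fraction of matching positions estimates the Jaccard similarity."""
    return sum(x == y for x, y in zip(sig_a, sig_b)) / len(sig_a)


def find_joinable(index, query_values, threshold=0.5):
    """Step 3: look up candidates, keep those above the estimated threshold."""
    sig = minhash(query_values)
    hits = [(k, estimate_jaccard(sig, index.signatures[k])) for k in index.query(sig)]
    return sorted((h for h in hits if h[1] >= threshold), key=lambda h: -h[1])


# Hypothetical tables standing in for raw CSV files on disk.
tables = {
    "customers.csv": "id,city\n1,Toronto\n2,Ottawa\n3,Montreal\n4,Calgary\n",
    "orders.csv": "order_id,customer_id\n10,1\n11,2\n12,3\n13,4\n",
    "products.csv": "sku,name\nA1,Widget\nB2,Gadget\n",
}

index = LSHIndex()
for table, text in tables.items():
    for key, values in columns_from_csv(table, text):
        index.insert(key, minhash(values))

for key, score in find_joinable(index, {"1", "2", "3", "4"}):
    print(key, round(score, 2))
```

For real files, the CSV text would come from something like `open(path, newline="")` instead of an in-memory string. Note that Jaccard similarity is symmetric and penalizes size differences, so a small query column fully contained in a much larger column can still score low.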
thanks
Hi, this is great work! I am trying to experiment with JOSIE to find joinable tables and am unsure about the data pipeline. Could you briefly explain how to use this JOSIE codebase to find joinable tables given a query column, if the input data are several raw CSV files (another dataset) representing tables?
This codebase seems to depend on Postgres dump files representing tables. Is it necessary to generate these dump files for the above purpose, and if so, how do I do it?
Thank you!