LukasZahradnik / deep-db-learning

A modular message-passing scheme reflecting the relational model for end-to-end deep learning from databases
https://lukaszahradnik.github.io/deep-db-learning/

Rewrite the BFSStrategy using Pandas read_sql_query #25

Open neumannjan opened 1 year ago

neumannjan commented 1 year ago

The HeteroDataBuilder currently does the following:

This is really fast. Since Pandas also supports pd.read_sql_query for any SQL query built using SQLAlchemy, I propose rewriting BFSStrategy to use Pandas as well. I expect the benefits to be speed (hopefully), cleaner code, and results that are more consistent with HeteroDataBuilder, since the new type converters also use Pandas, likewise for speed reasons.
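To illustrate the kind of call meant here, below is a minimal sketch of passing a SQLAlchemy-built query to pd.read_sql_query. It assumes SQLAlchemy 1.4 or newer and a compatible Pandas version. The in-memory SQLite schema, the users and orders tables, and the fetch_rows helper are hypothetical and are not taken from this repository.

```python
import pandas as pd
from sqlalchemy import Column, Integer, MetaData, String, Table, create_engine, select

# Hypothetical in-memory schema, used only for illustration.
engine = create_engine("sqlite:///:memory:")
metadata = MetaData()
users = Table(
    "users",
    metadata,
    Column("id", Integer, primary_key=True),
    Column("name", String),
)
orders = Table(
    "orders",
    metadata,
    Column("id", Integer, primary_key=True),
    Column("user_id", Integer),
    Column("amount", Integer),
)
metadata.create_all(engine)

# Populate the tables with a few example rows.
with engine.begin() as conn:
    conn.execute(users.insert(), [{"id": 1, "name": "a"}, {"id": 2, "name": "b"}])
    conn.execute(
        orders.insert(),
        [
            {"id": 10, "user_id": 1, "amount": 5},
            {"id": 11, "user_id": 1, "amount": 7},
            {"id": 12, "user_id": 2, "amount": 3},
        ],
    )


def fetch_rows(conn, table, key_column, keys):
    """Load all rows of `table` whose `key_column` is in `keys` as a DataFrame.

    Hypothetical helper: one query per table and key set, instead of
    fetching and converting rows one at a time.
    """
    query = select(table).where(table.c[key_column].in_(list(keys)))
    return pd.read_sql_query(query, conn)


# Fetch every order that belongs to user 1 in a single round trip.
with engine.connect() as conn:
    orders_of_user_1 = fetch_rows(conn, orders, "user_id", [1])
```

Because the result arrives as a DataFrame, it could in principle be fed to the same Pandas-based type converters that HeteroDataBuilder uses, although how that would be wired up is left open here.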

I think the new BFSStrategy could work as follows:

Then, at the end, we should probably merge HeteroDataBuilder with Dataset and find a clean way to expose them as two different strategies for the dataset ("full strategy" vs. "bfs strategy").