Rewrite the BFSStrategy using Pandas read_sql_query

The HeteroDataBuilder currently does the following:

- loads each table in full using pd.read_sql
- computes the edge_index for each relation using Pandas on top of the loaded tables
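
For reference, the two steps above can be sketched roughly as follows. This is a minimal illustration, not the actual HeteroDataBuilder code: the demo schema (customer, orders), the helper name compute_edge_index, and the use of an in-memory sqlite3 connection in place of a SQLAlchemy engine are all assumptions made to keep the example self-contained.

```python
import sqlite3

import numpy as np
import pandas as pd

# Hypothetical demo schema: orders.customer_id references customer.id.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL);
    INSERT INTO customer VALUES (10, 'a'), (20, 'b'), (30, 'c');
    INSERT INTO orders VALUES (1, 10, 5.0), (2, 30, 7.5), (3, 10, 1.0);
    """
)

# Step 1: load each table in full. The row position inside the DataFrame
# serves as the node index of that row.
customer = pd.read_sql("SELECT * FROM customer ORDER BY id", conn)
orders = pd.read_sql("SELECT * FROM orders ORDER BY id", conn)


def compute_edge_index(src, src_col, dst, dst_col):
    """Return a 2 x num_edges array of (src row position, dst row position).

    Hypothetical helper: an edge exists wherever src[src_col] == dst[dst_col].
    """
    src_pos = pd.DataFrame(
        {"key": src[src_col].to_numpy(), "src_idx": np.arange(len(src))}
    )
    dst_pos = pd.DataFrame(
        {"key": dst[dst_col].to_numpy(), "dst_idx": np.arange(len(dst))}
    )
    # Rows with a NULL foreign key do not produce an edge.
    src_pos = src_pos.dropna(subset=["key"])
    merged = src_pos.merge(dst_pos, on="key", how="inner")
    return merged[["src_idx", "dst_idx"]].to_numpy().T


# Step 2: one edge_index per relation, computed purely in Pandas.
edge_index = compute_edge_index(orders, "customer_id", customer, "id")
```

The resulting array has shape (2, num_edges) and could be converted to a tensor (for example with torch.from_numpy) before being stored on the graph object.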

The HeteroDataBuilder approach is really fast. Since Pandas also provides pd.read_sql_query, which accepts any SQL query built with SQLAlchemy, I propose rewriting BFSStrategy on top of Pandas as well. The benefits I expect are speed (hopefully), cleaner code, and results that are more consistent with HeteroDataBuilder, since the new type converters already use Pandas (also for speed reasons).

I think the new BFSStrategy could work as follows:

- load the target table (or a batch of it) using a single call to pd.read_sql
- load the joined tables the same way during the BFS
- compute the edge_index at the end, similarly to how I do it in HeteroDataBuilder (hopefully)
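
A rough sketch of how the loading part of such a BFS might look is given below. Everything in it is an assumption for illustration: the RELATIONS and PRIMARY_KEYS schema descriptions, the helpers fetch_where_in and bfs_load, the demo tables, and the use of a sqlite3 connection with plain SQL strings instead of SQLAlchemy-built queries. The edge_index computation would then run on the returned DataFrames.

```python
import sqlite3

import pandas as pd

# Hypothetical schema description: (child_table, fk_column, parent_table, pk_column).
RELATIONS = [
    ("orders", "customer_id", "customer", "id"),
    ("item", "order_id", "orders", "id"),
]
PRIMARY_KEYS = {"customer": "id", "orders": "id", "item": "id"}


def make_demo_connection():
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER);
        CREATE TABLE item (id INTEGER PRIMARY KEY, order_id INTEGER);
        INSERT INTO customer VALUES (10, 'a'), (20, 'b'), (30, 'c');
        INSERT INTO orders VALUES (1, 10), (2, 30), (3, 10);
        INSERT INTO item VALUES (100, 1), (101, 2), (102, 3);
        """
    )
    return conn


def fetch_where_in(conn, table, column, values):
    """Load the rows of `table` whose `column` is in `values` with one query."""
    values = sorted(set(values))
    if not values:
        # Still run a query so the empty frame has the right columns.
        return pd.read_sql_query(f'SELECT * FROM "{table}" WHERE 1 = 0', conn)
    # Identifiers come from the trusted schema description; values are bound
    # as parameters. Very large batches may exceed the driver's parameter limit.
    placeholders = ", ".join("?" for _ in values)
    sql = f'SELECT * FROM "{table}" WHERE "{column}" IN ({placeholders})'
    return pd.read_sql_query(sql, conn, params=values)


def bfs_load(conn, target_table, target_ids, max_depth):
    """Breadth-first load of all rows within `max_depth` joins of the target batch."""
    pk = PRIMARY_KEYS[target_table]
    frontier = {target_table: fetch_where_in(conn, target_table, pk, target_ids)}
    loaded = dict(frontier)
    for _ in range(max_depth):
        next_frontier = {}
        for child, fk, parent, ppk in RELATIONS:
            # Follow every relation in both directions.
            for src, src_col, dst, dst_col in (
                (child, fk, parent, ppk),
                (parent, ppk, child, fk),
            ):
                if src not in frontier:
                    continue
                # .tolist() yields plain Python values that sqlite3 can bind.
                keys = frontier[src][src_col].dropna().unique().tolist()
                rows = fetch_where_in(conn, dst, dst_col, keys)
                dst_pk = PRIMARY_KEYS[dst]
                if dst in loaded:
                    # Keep only rows that were not visited before.
                    rows = rows[~rows[dst_pk].isin(loaded[dst][dst_pk])]
                if rows.empty:
                    continue
                if dst in next_frontier:
                    rows = pd.concat([next_frontier[dst], rows]).drop_duplicates(
                        subset=dst_pk
                    )
                next_frontier[dst] = rows
        for table, rows in next_frontier.items():
            parts = [loaded[table], rows] if table in loaded else [rows]
            loaded[table] = pd.concat(parts, ignore_index=True)
        frontier = next_frontier
        if not frontier:
            break
    return loaded


tables = bfs_load(make_demo_connection(), "customer", [10], max_depth=2)
```

Here each BFS level issues one query per relation and direction rather than one query per row, which is where the hoped-for speedup would come from. Whether this matches the semantics of the current BFSStrategy would need to be checked against the existing implementation.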

Finally, we should probably merge HeteroDataBuilder with Dataset and find a clean way to expose the two approaches as separate strategies for the dataset ("full strategy" vs "bfs strategy").

LukasZahradnik / deep-db-learning

Rewrite the BFSStrategy using Pandas read_sql_query #25