Featurization and preprocessing are now implemented in C++, along with a few other optimizations, significantly reducing memory usage, disk usage, and processing time for large datasets.
Changelogs
Move all molecular featurization (atoms, bonds, positional encodings, etc.) to C++
Enable data loading directly from SMILES at runtime
Improve memory usage and speed of data loading by more than 10x
Authors: Most changes from @ndickson-nvidia, with some minor adjustments from @DomInvivo
Discussion related to this PR
This PR will allow Graphium to run much faster and unlock new uses of positional encodings, since they will no longer be a bottleneck. The conversion SMILES -> PyG graph + positional encodings will now be done directly during data loading.
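To illustrate the idea of featurizing during data loading rather than in a separate preprocessing pass, here is a minimal pure-Python sketch. It is not Graphium's API: `LazySmilesDataset` and `toy_featurize` are hypothetical names, and the featurizer is a placeholder that only counts characters, standing in for the real C++ graph and positional-encoding code.

```python
from typing import Callable, Dict, List, Sequence


def toy_featurize(smiles: str) -> Dict[str, List[int]]:
    """Hypothetical stand-in for the real featurizer.

    It only counts a few characters in the SMILES string; an actual
    pipeline would build a molecular graph and positional encodings here.
    """
    return {"counts": [smiles.count(symbol) for symbol in ("C", "N", "O")]}


class LazySmilesDataset:
    """Map-style dataset that stores only SMILES strings.

    Features are computed in __getitem__, so nothing is precomputed or
    cached on disk, and memory holds just the raw strings.
    """

    def __init__(
        self,
        smiles: Sequence[str],
        featurize: Callable[[str], dict] = toy_featurize,
    ):
        self.smiles = list(smiles)
        self.featurize = featurize

    def __len__(self) -> int:
        return len(self.smiles)

    def __getitem__(self, index: int) -> dict:
        # Featurization happens here, at load time, for one molecule only.
        return self.featurize(self.smiles[index])


dataset = LazySmilesDataset(["CCO", "CN", "O=C=O"])
first = dataset[0]  # features for "CCO", computed on access
```

Because the class follows the map-style protocol (`__len__` and `__getitem__`), the per-item cost of the featurizer is what determines loading speed, which is why moving that step to a fast implementation matters.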