a-r-j / graphein

Protein Graph Library
https://graphein.ai/
MIT License
1.03k stars 131 forks source link

Architectural issues and remediation paths #46

Closed ericmjl closed 2 years ago

ericmjl commented 3 years ago

As I go through the codebase to write tests, I'm noticing a few architectural issues to the package becoming a pydata-friendly package. I'm going to document here issues/anti-patterns that I found inside the codebase, and propose solutions.

Issue 1: Implicit dependencies

Found propy, biovec, ipymol, pytorch3d as imports inside functions.

This is a problem because when the functions get called and the dependency isn't explicitly stated as part of environment.yml or requirements.txt, then code will unexpectedly error out for a user that does not have those packages installed.

One possible resolution is to add those packages as explicit dependencies. (Explicit is better than implicit, according to the Zen of Python.)

Issue 2: Highly nested codebase

To access protein amino acid featurization, one has to import graphein.protein.features.nodes.amino_acids. That's potentially quite a lot of boilerplate for the end-user.

The easiest resolution path is to import those submodules, or even explicit functions, into a higher-level namespace. For example, we might want to follow NetworkX's and PyMC3's pattern, which imports all of the important items into the top-level namespace. Or we might want to follow a pattern where things related to protein are imported into the protein namespace.

a-r-j commented 3 years ago

Wow, that was a flurry of commits. Thanks for that! Will be going through today.

Ah point taken wrt to imports. I was hoping to keep them as optional dependencies to avoid too much dependence on external libraries at the expense of best practice.

For the nesting, I think this is also a very good point. This was mostly done to organise some similar sounding functions (e.g. feature functions that can operate on nodes or graphs). But I think clever naming conventions (& possible multipledispatch) can help to bring some clarity if we keep them in a higher-level namespace.

ericmjl commented 3 years ago

Actually, optional imports are not a bad idea too! I have a pattern that is used in pyjanitor: https://github.com/ericmjl/pyjanitor/blob/dev/janitor/chemistry.py. Here, the optional dependencies are kept in a try/except block, and then we raise an informative error message that tells the user how to install those dependencies.