NREL / alfabet

Machine learning predictions of bond dissociation energy
https://ml.nrel.gov/bde/
Other
57 stars 17 forks source link

[Draft] Refactor the library - Merge same as PR #10 -> Extensibility + Customability #9

Closed IchiruTake closed 2 years ago

IchiruTake commented 2 years ago

@pstjohn Please review for me when I am processed and afterward. Do you mind if the function convert to the class object for better the state of the molecule is maintained.

Related: #2

IchiruTake commented 2 years ago

@pstjohn I have completed the first step preparing module utils.py as a replacement for fragment.py. Can you take a look at it ? If everything going well, should I push all calling methods from different files such as prediction.py and model.py using the module utils.py ?

IchiruTake commented 2 years ago

@pstjohn Can you review the code for me? What I did is to:

IchiruTake commented 2 years ago

@pstjohn I have corrected the failed test. Can you review it again ?

IchiruTake commented 2 years ago

Just so I'm following, is the computational speed improvement here mainly de-duplicating inside the fragment code, rather than after all the fragments have been generated? That might be more simply implemented as a line or two here: https://github.com/NREL/alfabet/blob/master/alfabet/fragment.py#L11-L12

A couple style notes: It's standard to use snake_case for variable and function names, and TitleCase for class definitions. Internal class functions usually just start with a preceding underscore (and don't have a trailing one) i.e., _my_internal_function.

I will correct the function as your style. Yes you are right, but it does not stop there, the optimization of de-duplicating is important but what I want is to centralize all molecules processing with a centralized class to avoid any memory leak, possible parallelization and reduce duplicated operation. For example, the conversion of molecules from SMILES and AddHs function which return new molecule. Even if they are written in C++, the time and temporary memory is non-trivial with less yime fluctuation. The web-based is the best example to get this advantage, deduplicate fragments benefits large dataset processing

Moreover, I tried to reduce the namespace of ALFABET by taking only necessary functions which is definitely faster with narrower import. I would fix it in within this week as in here, it is the midnight.

Furthermore, we want to gain more control on the molecule processing, which you can see in the method MolProcessor::GetReaction() and reduce future refactoring.

All in all, thank you for your review.

pstjohn commented 2 years ago

Are you able to run the test suite locally? Looks like some of those tests are still failing

IchiruTake commented 2 years ago

Are you able to run the test suite locally? Looks like some of those tests are still failing Let me check that if the installation behaves well

IchiruTake commented 2 years ago

Are you able to run the test suite locally? Looks like some of those tests are still failing

The installation worked fine and call test_predict() does not raise assertion. I think it would be better to cast the current state of utils.py to fragment.py for better maintenance. I will try to make code more consistently to your coding style. If there are anything else, you can response to me immediately through Linkedin for faster response.

Speed the model start-up by setting arg::compile=False on empty configuration model.

Linkedin: https://www.linkedin.com/in/minh-pham-hoang-0b0626172/

image

IchiruTake commented 2 years ago

This PR attempted to refactor the source code by:

IchiruTake commented 2 years ago

The tests worked well. Do you have any comments? @pstjohn