Global-Chem / global-chem

A Knowledge Graph of Common Chemical Names to their Molecular Definition
https://globalchemistry.org/
Mozilla Public License 2.0
156 stars 21 forks source link

Upgrade Morgan Fingerprint Generation to Use Modern RDKit API #323

Closed akuroiwa closed 1 month ago

akuroiwa commented 2 months ago

Dear Global-Chem developers,

I'd like to propose an update to the Morgan fingerprint generation method used in the project. This change is necessary to address a deprecation warning and to align with RDKit's modern best practices.

Current Implementation

The current implementation uses the deprecated GetMorganFingerprintAsBitVect function:

fp = AllChem.GetMorganFingerprintAsBitVect(mol, self.morgan_radius, nBits=self.bit_representation)

Proposed Change

I suggest updating the code to use the modern MorganGenerator class:

from rdkit.Chem import rdFingerprintGenerator
morgan_gen = rdFingerprintGenerator.GetMorganGenerator(radius=self.morgan_radius, fpSize=self.bit_representation)
fp = morgan_gen.GetFingerprint(mol)

Rationale

  1. Deprecation Warning: The current method triggers a deprecation warning, indicating that it will be removed in future versions of RDKit.

  2. Performance: The new MorganGenerator class is optimized for better performance, especially when generating multiple fingerprints.

  3. Consistency: This change will make the codebase consistent with modern RDKit practices and easier to maintain in the future.

  4. Flexibility: The new API offers more flexibility and options for fingerprint generation, which could be beneficial for future enhancements.

Scope of Changes

I've identified several instances where this change should be applied throughout the codebase. A comprehensive search and replace operation would be necessary to update all occurrences.

Verification

I've tested this change on a small scale and found it to produce equivalent results to the current implementation. However, I recommend a thorough verification process across all use cases in the Global-Chem project to ensure full compatibility.

References

  1. Modern RDKit Fingerprint Generation
  2. RDKit Blog: Fingerprint Generator Tutorial

I'm happy to assist with implementing and testing these changes if needed. Please let me know if you need any further information or clarification.

Thank you for considering this proposal.

Best regards, Akihiro Kuroiwa

Sulstice commented 2 months ago

Go ahead and make all the changes where you see fit, add it to the PR and I can schedule a new release. I was thinking of decomissioning this part of the package but it seems like you guys are using it.

@ANUGAMAGE stage this for the 2.0 release.

akuroiwa commented 2 months ago

Thank you for your feedback on my pull request. I appreciate your guidance and the opportunity to make necessary adjustments.

Based on my understanding, I believe there are only three areas that require modifications. However, I noticed that a bug label has been attached to the PR. Could you please clarify what specific error outputs or issues have been encountered? This information would be very helpful for me to address any potential problems effectively.

I also wanted to mention that I'm currently developing a project called chem-ant, which imports and utilizes global_chem_extensions. The package has been extremely helpful in my work, and I'm grateful for its functionality. This is one of the reasons I'm keen on contributing to its improvement and maintenance.

Thank you once again for your support and for maintaining such a useful package. I look forward to your response.

Sulstice commented 1 month ago

Alright @akuroiwa that is good to know because I didn't realize it was useful for folk. What I will do is bring back the code again into the repository and continue maintenance in it's development. I was about to actually yank it.

Running our base system check now.

Sulstice commented 1 month ago

@ANUGAMAGE Can you test this branch?