PatWalters / rapids_cheminformatics

Some demos using Nvidia RAPIDS for Cheminformatics
MIT License

Can I use RAPIDS for faster descriptor calculation? #1

Open iwwwish opened 1 year ago

iwwwish (Vishal Siramshetty) commented on Dec 23, 2022

Hi Pat,

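Thank you for sharing your post on NVIDIA RAPIDS (https://practicalcheminformatics.blogspot.com/2020/06/wicked-fast-cheminformatics-with-nvidia.html) and the use cases showing how to leverage GPU computing power.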

I was wondering whether cudf can be used to speed up the calculation of RDKit descriptors (or to obtain predictions from an ML model) for a data frame containing a few million SMILES. I see that the RDKit KNIME node does this by running in parallel on all CPU cores. I was keen to know whether RAPIDS could help here, so that one could avoid using multiprocessing in Python. I tried writing a small piece of code to see if it works, but got an error, which I interpret to mean that the apply_rows method might not support data frames with string columns.

I shared my code and the error on their GitHub repo (https://github.com/rapidsai/cudf/issues/12430), but I am curious whether you have any thoughts here.

Thank you! Vishal
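The CPU-parallel approach mentioned in the question (spreading per-molecule work over all cores, as the RDKit KNIME node does) can be sketched with Python's standard concurrent.futures module. This is a minimal sketch under stated assumptions: toy_descriptors is a hypothetical placeholder that only inspects the SMILES string so the example runs without RDKit; real code would parse each molecule (e.g. with RDKit's Chem.MolFromSmiles) and compute actual descriptors. The executor is passed in, so the same function works with a ThreadPoolExecutor for quick checks or a ProcessPoolExecutor for CPU-bound work.

```python
from concurrent.futures import Executor, ProcessPoolExecutor
from typing import Callable, Dict, Iterable, List, Union

Row = Dict[str, Union[str, int]]


def toy_descriptors(smiles: str) -> Row:
    """Hypothetical stand-in for real descriptors: it only inspects the string.

    With RDKit installed, this would instead parse the molecule, e.g.
    mol = Chem.MolFromSmiles(smiles), and return Descriptors.MolWt(mol), etc.
    """
    return {
        "smiles": smiles,
        "length": len(smiles),          # characters in the SMILES string
        "branches": smiles.count("("),  # number of opening branch parentheses
    }


def compute_descriptors(
    smiles: Iterable[str],
    executor: Executor,
    func: Callable[[str], Row] = toy_descriptors,
    chunksize: int = 1000,
) -> List[Row]:
    """Apply `func` to every SMILES using `executor`, preserving input order.

    `chunksize` batches items per task, which cuts inter-process overhead
    for a ProcessPoolExecutor; a ThreadPoolExecutor ignores it.
    """
    return list(executor.map(func, smiles, chunksize=chunksize))


if __name__ == "__main__":
    data = ["CCO", "c1ccccc1", "CC(=O)Oc1ccccc1C(=O)O"]
    # max_workers=None (the default) starts one worker per CPU core.
    with ProcessPoolExecutor() as pool:
        for row in compute_descriptors(data, pool, chunksize=1):
            print(row)
```

For millions of SMILES, a larger chunksize keeps each worker busy on a batch instead of paying pickling overhead per molecule; the `if __name__ == "__main__":` guard is needed because worker processes may re-import the script.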

PatWalters commented 1 year ago

Hi Vishal,

Apologies for the late reply. Unfortunately, the functionality in RAPIDS is limited and won't let you calculate RDKit descriptors. You may want to take a look at a blog post I wrote on using Dask for parallel cheminformatics.

https://patwalters.github.io/practicalcheminformatics/jupyter/dask/parallel/2021/03/28/dask-cheminformatics.html

Best,

Pat


iwwwish commented 1 year ago

Hi Pat,

Thank you for pointing me to the post. I think Dask would be useful for this task. I'll go ahead and give it a try.

Best, Vishal