Open iwwwish opened 1 year ago
Hi Vishal,
Apologies for the late reply. Unfortunately, the functionality in RAPIDS is limited and won't enable you to calculate descriptors. You may want to take a look at a blog post I wrote on using Dask for parallel cheminformatics.
Best,
Pat
On Fri, Dec 23, 2022 at 3:54 PM Vishal Siramshetty @.***> wrote:
Hi Pat,
Thank you for sharing your post on NVIDIA RAPIDS (https://practicalcheminformatics.blogspot.com/2020/06/wicked-fast-cheminformatics-with-nvidia.html) with some use cases that leverage GPU computing power.
I was wondering whether cudf can be used to speed up the calculation of RDKit descriptors (or to obtain predictions from an ML model) for a data frame containing a few million SMILES. I see that the RDKit KNIME node does this by using all cores in parallel. I was keen to know whether RAPIDS could be useful here (so that one can avoid using multiprocessing in Python). I tried writing a small piece of code to see if it works, but I got an error, which I interpret to mean that the apply_rows method might not support data frames with string columns.
I shared my code and the error (https://github.com/rapidsai/cudf/issues/12430) on their GitHub repo, but I am curious whether you have any thoughts here.
Thank you! Vishal
Hi Pat,
Thank you for pointing me to the post. I think Dask would be useful for this task. I'll go ahead and give it a try.
Best, Vishal
Hi Pat,
Thank you for sharing your post on NVIDIA RAPIDS with some use cases that leverage GPU computing power.
I was wondering whether cudf can be used to speed up the calculation of RDKit descriptors (or to obtain predictions from an ML model) for a data frame containing a few million SMILES. I see that the RDKit KNIME node does this by using all cores in parallel. I was keen to know whether RAPIDS could be useful here (so that one can avoid using multiprocessing in Python). I tried writing a small piece of code to see if it works, but I got an error, which I interpret to mean that the apply_rows method might not support data frames with string columns.
I shared my code and the error on their GitHub repo, but I am curious whether you have any thoughts here.
Thank you! Vishal