Open sayalaruano opened 3 years ago
I obtained 1412 rows myself, as can be seen here: https://github.com/wguesdon/beta-lactamase/blob/main/Data_Wrangling_and_EDA.ipynb. I wonder if we could apply the padelpy method row by row via a lambda function?
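As a rough illustration of the row-by-row idea, here is a minimal sketch. The helper name `descriptors_per_row` and its `compute` argument are hypothetical and not part of padelpy. `compute` stands for any function that takes one SMILES string and returns its descriptors. With padelpy this might be its `from_smiles` helper, but that is an assumption to check against the installed version.

```python
def descriptors_per_row(smiles_list, compute):
    """Apply `compute` to each SMILES separately.

    One bad row then cannot silently cut off the rows after it:
    missing or failing entries are recorded and skipped instead.
    """
    results = []   # (row index, descriptors) for rows that worked
    failures = []  # (row index, reason) for rows that did not
    for idx, smiles in enumerate(smiles_list):
        # NaN read by pandas is a float, not a str, so it is caught here.
        if not isinstance(smiles, str) or not smiles.strip():
            failures.append((idx, "missing SMILES"))
            continue
        try:
            results.append((idx, compute(smiles)))
        except Exception as exc:  # keep going, but remember what failed
            failures.append((idx, str(exc)))
    return results, failures


# Toy stand-in for a real descriptor function, only for demonstration.
ok, bad = descriptors_per_row(["CCO", float("nan"), "c1ccccc1"], lambda s: len(s))
print(len(ok), "computed;", len(bad), "skipped")
```

Computing descriptors one molecule at a time is likely to be much slower than a single batch run on tens of thousands of molecules, so this is probably more useful for locating problem rows than for the full calculation.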
I just came up with the solution for this error. The mistake was that I had kept some molecules with NaN in the canonical SMILES feature in my dataset, so PaDEL only calculated fingerprints for the molecules above the first NaN. Now I will try to calculate the 12 fingerprints for all molecules. I hope I can calculate all of them.
Thank you for sharing, it must have been the same issue for me.
You're welcome @wguesdon, this is the good part of these collaborative projects :)
Hello sayalaruano,
I have the same problem. I obtained PubChem molecular descriptors for only 338 molecules, although my molecule.smi file has 64424 molecules.
Hello @semsem80, to solve this error you need to delete the molecules with NaN in the canonical_smile feature. I hope this helps; let me know if it works.
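For anyone landing here later, a minimal sketch of that fix in plain Python. The function names `is_missing` and `write_smi` and the "SMILES, tab, ID" line layout are assumptions for illustration, not taken from the notebooks in this thread. With a pandas DataFrame, the equivalent step would be `df.dropna(subset=[...])` on the SMILES column before writing the file.

```python
import math


def is_missing(value):
    """True for None, float NaN, blank strings, or the literal text 'nan'."""
    if value is None:
        return True
    if isinstance(value, float) and math.isnan(value):
        return True
    text = str(value).strip()
    return text == "" or text.lower() == "nan"


def write_smi(records, path):
    """Write one 'SMILES<TAB>ID' line per record, skipping missing SMILES.

    `records` is an iterable of (smiles, molecule_id) pairs.
    Returns the number of molecules actually written.
    """
    written = 0
    with open(path, "w", encoding="utf-8") as fh:
        for smiles, mol_id in records:
            if is_missing(smiles):
                continue  # these are the rows that made the run stop early
            fh.write(f"{str(smiles).strip()}\t{mol_id}\n")
            written += 1
    return written
```

Comparing the returned count with the number of rows in the descriptor output is a quick way to confirm that no molecules were silently dropped.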
Hi @sayalaruano, your suggested solution worked, thank you for your help.
Hello professor, I'm doing EDA and calculating molecular descriptors for the betalactamase dataset. I replaced duplicated values with their mean, as you suggested, and kept only the molecules that bind to Betalactamase AmpC, which leaves a dataset of 62050 molecules. Then I followed the instructions from the video in the description to calculate molecular descriptors with paDELpy, but I obtained molecular descriptors for only 5534 molecules, although my molecule.smi file has 62050 molecules. Do you know if paDEL has restrictions on the number of molecules for which it can calculate descriptors, or could this error be caused by something in my code? This GitHub repo contains my notebook and all files: https://github.com/sayalaruano/MidtermProject-MLZoomCamp. I added the same comment on the YouTube video of the challenge, just in case. Thanks in advance for your help.