ecrl / padelpy

A Python wrapper for PaDEL-Descriptor software
MIT License

Inconsistent descriptor values for the same SMILES #29

Open dinabandhu50 opened 3 years ago

dinabandhu50 commented 3 years ago

Hi, I am using this library in my projects and found that some descriptors give different values on different runs.

In the figures below, the x-axis is the SMILES sample index (128 samples in total) and the y-axis is the value calculated by PaDEL-Descriptor for topoRadius, topoDiameter and WPATH. Unfortunately, because of license restrictions, I cannot post the dataset or any of the SMILES here for reproducibility.

[Figure: run 1, descriptor values per sample]

Here the high values occur at samples 12, 28, 32, 37, etc.

[Figure: run 2, descriptor values per sample]

But in the second run the high values are at samples 13, 25, 28, 33, etc., which means the same set of SMILES produced different values.

  1. Is this a common problem?
  2. How can I handle it?
  3. Why would PaDEL-Descriptor give such extreme values?

Thanks

thegodone commented 3 years ago

If these are 3D descriptors, you may expect this kind of issue. Can you reproduce it with classical/free SMILES?

tjkessler commented 2 years ago

@dinabandhu50 still having issues with this? The underlying descriptor calculations are a bit outside my wheelhouse...

dinabandhu50 commented 2 years ago

@tjkessler I was using multiple threads to calculate PaDEL descriptors with `from padelpy import padeldescriptor`, which caused the issue above. Later I switched to a single thread and that solved the problem.

That said, when run multithreaded, padelpy also writes each output row with the corresponding molecule name, so one can take advantage of that to match each row to its input.
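Matching rows by name instead of by position could be done roughly like this. It is a minimal sketch, assuming the multithreaded output CSV has a `Name` column whose values are unique and known from the input (for example, the titles in your SMILES file); `reorder_by_name` is a hypothetical helper, not a padelpy function, and the rows are made-up toy values.

```python
import csv
import io


def reorder_by_name(rows, input_names, name_column="Name"):
    """Reorder PaDEL output rows to follow the order of the input molecules.

    rows: list of dicts (e.g. from csv.DictReader), one per molecule.
    input_names: molecule names in the order they were submitted.
    """
    by_name = {}
    for row in rows:
        key = row[name_column]
        # Name-based alignment is only safe if every name is unique.
        if key in by_name:
            raise ValueError(f"duplicate molecule name: {key!r}")
        by_name[key] = row
    missing = [n for n in input_names if n not in by_name]
    if missing:
        raise KeyError(f"no output row for: {missing}")
    return [by_name[n] for n in input_names]


# Toy multithreaded output (hypothetical): rows arrived out of order.
threaded = list(csv.DictReader(io.StringIO("Name,WPATH\nm3,12\nm1,10\nm2,40\n")))
ordered = reorder_by_name(threaded, ["m1", "m2", "m3"])
print([r["WPATH"] for r in ordered])  # ['10', '40', '12']
```

The helper raises on duplicate or missing names rather than guessing, because silently mis-assigning a row would reproduce exactly the kind of shifted peaks seen in the plots.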

Again, thanks a lot for the software!

Zuosq commented 1 year ago

Hi, there are other types of fingerprints. How do I get them? Thanks