Open twall opened 3 years ago
Attempting to create a database like this (the fingerprint length is about 1600):
inchi = ...
ext_fp = [...]
db = bingo.Bingo.createDatabaseFile("foo")
indigo = Indigo()
mol = indigo.loadMolecule(inchi)
sim_fp = indigo.loadFingerprintFromBuffer(ext_fp)
db.insertWithExtFP(mol, sim_fp)
Results in the following error:
indigo.bingo.BingoException: 'BaseSimilarityMatcher: external fingerprint is incompatible with current database'
Hi Timothy, please take a look at a few tests of the external fingerprint functionality (attached). I hope they help you use such fingerprints for your tasks.
fp-from-descriptors.py.txt bingo_settings.py.txt ext_fp.py.txt
Please make sure that the fingerprint settings are the same for the Indigo and Bingo instances.
Best Regards! Iurii
Hi @IuriiPuzanov ,
Thanks for the information. I have a few questions.
When using "loadFingerprintFromDescriptors", you use a byte count of "64" with descriptors of length 115. I would infer that the fingerprint byte size would be the descriptor length divided by 8 (one bit per descriptor), but that doesn't seem to be the case here. While 64 bytes will certainly hold more than 115 bits, I have problems setting the fp byte count value.
What are the bytes supposed to be when using "loadFingerprintFromBuffer"? The same as the descriptors, except in the range 0-255? Or simply an array of bytes representing values of zero or one?
Do the descriptors need to be normalized between zero and one? I see some of the descriptors in the example are outside that range.
Hi Timothy,
In the test, the usual fingerprint size is used, and in this case the only requirement is that the fingerprint size be the same for all molecules in the database. loadFingerprintFromBuffer uses the buffer as the fingerprint itself, so it is an array of bits packed into bytes. It would be better to normalize the descriptors between 0.0 and 1.0 for more predictable results; the test uses values outside this range only to check the robustness of the algorithm in special cases.
Best Regards! Iurii
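For readers wondering what "an array of bits packed into bytes" could look like in practice, here is a minimal sketch in plain Python. The helper name `pack_bits` and the most-significant-bit-first ordering are assumptions for illustration; the thread does not state which bit order Indigo expects, so verify that against the attached tests before relying on it.

```python
def pack_bits(bits, n_bytes):
    """Pack a sequence of 0/1 values into exactly n_bytes bytes, zero-padded."""
    if len(bits) > n_bytes * 8:
        raise ValueError("too many bits for the requested fingerprint size")
    buf = bytearray(n_bytes)
    for i, bit in enumerate(bits):
        if bit:
            # Assumption: bit i lands in byte i // 8, most significant bit first.
            buf[i // 8] |= 0x80 >> (i % 8)
    return bytes(buf)


# Nine example bits packed into a 64-byte buffer (the size used in the thread).
packed = pack_bits([1, 0, 0, 0, 0, 0, 0, 1, 1], 64)
```

Every molecule in the database would then get a buffer of the same length, which matches the requirement stated above.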
@IuriiPuzanov
Thank you for your response.
The fp_density function in fp-from-descriptors.py.txt is not actually used (and it's doing a string compare against '1'). What would it look like if it were actually being applied to the descriptors used in the function (your density is hard-coded there to "0.3")?
In fp_ext.txt, it appears that sim_qwords (8) is the number of bytes in the buffer (64) divided by 8, and according to the docs, the fingerprint size (in bytes) would then be "64". Is this the proper relation between sim_qwords and fingerprint size?
Hi Timothy,
Actually, fp_density is used only to output the actual density values of the generated fingerprints. How to use this parameter with descriptors depends on the actual descriptor values and the desired sensitivity. In most cases a value of 0.5 is acceptable. As for the relation between sim_qwords and fingerprint size, you are absolutely right, and the fingerprint size has no direct dependency on the length of the descriptor array. In any case, all descriptors will be packed or scattered across the available fingerprint bits, but the fingerprint size should be large enough for the desired sensitivity.
Best Regards! Iurii
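The relation confirmed above (fingerprint size in bytes = sim_qwords × 8) can be written down as a small sanity check. This is only a sketch of the arithmetic; the helper names `fingerprint_bytes` and `buffer_matches` are hypothetical and not part of Indigo or Bingo.

```python
QWORD_BYTES = 8  # one qword is 64 bits, i.e. 8 bytes


def fingerprint_bytes(sim_qwords):
    """Fingerprint size in bytes for a given sim_qwords setting."""
    return sim_qwords * QWORD_BYTES


def buffer_matches(buf, sim_qwords):
    """True if the buffer length equals the size implied by sim_qwords."""
    return len(buf) == fingerprint_bytes(sim_qwords)


size = fingerprint_bytes(8)             # 8 qwords * 8 bytes = 64 bytes
ok = buffer_matches(bytes(64), 8)       # 64-byte buffer matches sim_qwords = 8
too_big = buffer_matches(bytes(200), 8) # a 200-byte buffer does not
```

A check like this before inserting could help catch a size mismatch early, on the assumption that such a mismatch is what triggers the "incompatible with current database" error.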
Finally getting back to this, it seems the ability to successfully invoke db.insertWithExtFP() depends on the fingerprint size in bytes; a value of 64 works, but other values produce the error indigo.bingo.bingo_exception.BingoException: insert fail: external fingerprint is incompatible with current database. What are the constraints on the fingerprint size in bytes? Is this simply specifying how big a structure to use within the database, and has nothing to do with the length of the fingerprint descriptors?
I'm using RDKit to generate MACCS fingerprints like the following:
from rdkit import Chem
from rdkit.Chem import MACCSkeys
rmol = Chem.MolFromSmiles(mol.smiles())
maccs_fp = MACCSkeys.GenMACCSKeys(rmol)
bs = list(int(s) for s in maccs_fp.ToBitString())
fp = indigo.loadFingerprintFromDescriptors(bs, 64, 0.5)
maccs_fp.ToBitString() generates an array of zeroes and ones, so while it seems to work, it seems to be less information than loadFingerprintFromDescriptors expects (an array of floats between zero and one).
Is this the correct way to load an external fingerprint, or am I missing something?
Hi Timothy,
It looks like you just need to convert the ints into floats and provide this array as input to loadFingerprintFromDescriptors.
Best Regards! Iurii
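A minimal sketch of that conversion, for reference. The literal `bit_string` is a stand-in for the output of `maccs_fp.ToBitString()` so the snippet runs without RDKit, and the final call is left commented out because it needs an `Indigo` instance; its arguments simply mirror the ones used earlier in the thread.

```python
# Stand-in for maccs_fp.ToBitString(), which yields a string of '0' and '1'.
bit_string = "0110"

# Convert each character directly to a float (0.0 or 1.0) instead of an int.
descriptors = [float(ch) for ch in bit_string]

# With an Indigo instance available, the call from the thread would become:
# fp = indigo.loadFingerprintFromDescriptors(descriptors, 64, 0.5)
```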
Indigo.loadFingerprintFromDescriptors() and Indigo.loadFingerprintFromBuffer() exist, as does Bingo.insertWithExtFP(), but the documentation on how to use these to implement a bingo DB with external fingerprints is lacking.
I have some vectors of measurements that I'd like to convert into fingerprints in order to perform similarity lookup using bingo, but I can't determine exactly how (even after perusing the source for a while).
What should be where the question marks are? What's the reasoning/process in converting an array of (normalized) floats into a vector of bits?