Hi @Lili-Cao,
The bitvectors output by fp.to_bitvectors() are constructed from the dataframe obtained by fp.to_dataframe() and follow the same order, i.e. index 0 in the bitvector corresponds to the first column in the dataframe.
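To make that mapping concrete, here is a minimal sketch in plain Python. The column labels and values are made-up stand-ins, not real ProLIF output; with a real dataframe, `df.columns` would play the role of `columns`.

```python
# Stand-in for the dataframe columns: (ligand, protein residue, interaction).
# These labels are invented for illustration.
columns = [
    ("LIG1.G", "ASP129.A", "Hydrophobic"),
    ("LIG1.G", "ASP129.A", "HBDonor"),
    ("LIG1.G", "PHE330.B", "PiStacking"),
]

# Stand-in for one row of the dataframe (one frame or pose).
row = [False, True, True]

# The bitvector keeps the column order: bit i <-> columns[i].
bitvector = [int(value) for value in row]


def describe_on_bits(bits, cols):
    """Return the column label for every bit that is set."""
    return [cols[i] for i, bit in enumerate(bits) if bit]


on_bits = describe_on_bits(bitvector, columns)
```

Looking up a bit position in the column list is enough to recover the residue and interaction type it stands for.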
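How to compare interactions between different systems depends on what you mean by "system". For example, ligands stored in separate files can be run against the same protein in a single pass: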
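from itertools import chain
import prolif as plf
# one supplier per ligand file
lig1_supplier = plf.mol2_supplier("lig1.mol2")
lig2_supplier = plf.mol2_supplier("lig2.mol2")
# chain the suppliers so that both files are processed in a single run
chained_supplier = chain(lig1_supplier, lig2_supplier)
fp = plf.Fingerprint()
# protein_mol is the protein molecule prepared beforehand
fp.run_from_iterable(chained_supplier, protein_mol)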
Alternatively, concatenate both dataframes (as described in the tutorials), and then construct the bitvectors from the merged dataframe with prolif.to_bitvectors(merged_df)
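A rough sketch of the merge-then-rebuild idea, in plain Python rather than pandas. Each result table is represented as a list of dicts keyed by (residue, interaction), which is a made-up stand-in for the real dataframes. The point is only that the bitvectors are built from the union of columns, so every position means the same thing in both sets.

```python
# Two hypothetical result tables, one per run. Each row maps
# (residue, interaction) -> True for the interactions that were detected.
run1 = [{("ASP129", "HBDonor"): True, ("PHE330", "PiStacking"): True}]
run2 = [{("ASP129", "HBDonor"): True, ("TYR390", "Hydrophobic"): True}]


def merge_and_vectorize(*runs):
    """Pad every row to the union of columns and return (columns, bitvectors)."""
    rows = [row for run in runs for row in run]
    # The sorted union of all columns gives one shared, reproducible bit order.
    columns = sorted({key for row in rows for key in row})
    # Interactions missing from a row are treated as absent (0).
    bitvectors = [[int(row.get(col, False)) for col in columns] for row in rows]
    return columns, bitvectors


columns, bitvectors = merge_and_vectorize(run1, run2)
```

With pandas, the analogous step would be concatenating the dataframes and filling the missing entries with False before building the bitvectors from the merged dataframe.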
Hi,
Thanks! So I guess for different proteins (which might have different names/numbers of residues), the length of the bitvector is also different?
Best Regards Lili
Yes, you will end up with bitvectors where each bit position corresponds to a very different residue and interaction type (unless you manage to join the dataframes from each system using a common naming scheme for all your proteins as I stated in the previous comment).
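One way to picture such a common naming scheme: map each protein's residue labels to shared labels (for example, positions in a sequence alignment) before joining. The mapping below is entirely hypothetical and only illustrates the renaming step.

```python
# Hypothetical mapping from per-protein residue labels to shared labels,
# e.g. alignment positions. These values are invented for illustration.
common_names = {
    "proteinA": {"ASP129": "pos42", "PHE330": "pos97"},
    "proteinB": {"ASP131": "pos42", "TRP335": "pos97"},
}


def rename_columns(columns, mapping):
    """Replace the residue label in each (residue, interaction) column."""
    return [(mapping[residue], interaction) for residue, interaction in columns]


cols_a = rename_columns(
    [("ASP129", "HBDonor"), ("PHE330", "PiStacking")], common_names["proteinA"]
)
cols_b = rename_columns(
    [("ASP131", "HBDonor"), ("TRP335", "PiStacking")], common_names["proteinB"]
)

# After renaming, equivalent residues share a column and can be compared.
shared = sorted(set(cols_a) & set(cols_b))
```

How the mapping is obtained (sequence alignment, structural superposition, manual curation) is left open here.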
Hi, I have a similar problem: I have a multi-mol2 file with different ligands and I would like to generate a dataframe with the fingerprint of each ligand, so I need all the strings to be the same length. Below is the code I tried, followed by the error I obtained:
from itertools import chain
lig1_supplier = plf.mol2_supplier(r"filepath")
chain_supplier = chain()
for lig in lig1_supplier:
chain_supplier = chain(chain_supplier, lig)
fp = plf.Fingerprint()
fp.run_from_iterable(chain_supplier, protein_mol)
AttributeError: 'Residue' object has no attribute 'residues'
When I generate the fingerprints the normal way, I don't get this error. Is there another way I could try? Thank you!
@agerexx I don't understand why you would need to generate the fingerprint another way if the "normal" way already works?
The trick above with the chained supplier is only necessary when you have multiple files (each containing one or more ligands). If you have a single file containing multiple ligands, which is your case, you don't need this trick: the output dataframe and bitvectors are already padded to the same size, i.e. bit position 5 for the first pose/ligand corresponds to the same residue and interaction type as bit position 5 for the second pose/ligand.
Another way to put it: if you have to call fp.run_from_iterable multiple times, the output dataframe/bitvector for each call may include different residues and interaction types or even have a different size, and so you can't really compare them out of the box. But if your dataframe/bitvectors were generated after running fp.run_from_iterable a single time then there's nothing to worry about.
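As a side note, the AttributeError above is consistent with how itertools.chain works: it iterates over each of its arguments, so passing a single molecule to chain yields whatever the molecule itself iterates over, not the molecule. The FakeMolecule class below is a hypothetical stand-in, on the assumption that a ProLIF molecule iterates over its residues.

```python
from itertools import chain


class FakeMolecule:
    """Hypothetical stand-in for a ligand that iterates over its residues."""

    def __init__(self, name, residues):
        self.name = name
        self.residues = residues

    def __iter__(self):
        return iter(self.residues)


supplier = [FakeMolecule("lig1", ["LIG1"]), FakeMolecule("lig2", ["LIG2"])]

# chain(..., mol) unpacks each molecule, so the result contains residues.
flattened = chain()
for mol in supplier:
    flattened = chain(flattened, mol)
flattened = list(flattened)

# Iterating over the supplier directly yields the molecules themselves.
molecules = list(supplier)
```

Under that assumption, passing the supplier itself to fp.run_from_iterable avoids the extra level of unpacking.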
@cbouy So sorry, my bad. I just wasn't converting to a pandas DataFrame properly, as explained in the tutorial. Thank you again for your time!
Hi,
Are the bitvectors from ProLIF comparable between different systems? Are the interactions captured in a fixed order, and the bitvectors as well?
Or are they only useful for one protein and one ligand with different poses?
Is it possible to tell which interaction each bit position represents?
Best Regards Lili