bitvector - Githubissues

Lili-Cao commented 1 week ago

Hi,

Are the bitvectors from ProLIF comparable between different systems? Or are the interactions captured in a particular order, and the bitvectors with them?

Or are they only useful for one protein and one ligand with different poses?

Is it possible to tell which interaction each bit position represents?

Best Regards Lili

cbouy commented 1 week ago

Hi @Lili-Cao,

The bitvectors as output by fp.to_bitvectors() are constructed from the dataframe obtained by fp.to_dataframe() and follow the same order, i.e. index 0 in the bitvector corresponds to the first column in the dataframe.
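To read a bit position back as an interaction, one can look up the column at the same index. Below is a minimal sketch in plain Python: the `columns` list is a made-up stand-in for the column labels of the dataframe, and the ligand and residue names and the bit values are illustrative assumptions, not real ProLIF output.

```python
# Hypothetical stand-in for the dataframe columns, kept in dataframe order.
columns = [
    ("LIG1.G", "ASP129.A", "HBDonor"),
    ("LIG1.G", "PHE330.B", "PiStacking"),
    ("LIG1.G", "TYR359.B", "Hydrophobic"),
]
# One bitvector for one frame/pose; bit i corresponds to columns[i].
bitvector = [1, 0, 1]


def describe_bits(bits, columns):
    """Return the column labels of the bits that are set."""
    return [columns[i] for i, bit in enumerate(bits) if bit]


on_bits = describe_bits(bitvector, columns)
for ligand, residue, interaction in on_bits:
    print(f"{ligand} - {residue}: {interaction}")
```

With a real fingerprint, the same lookup would be done against the columns of the dataframe the bitvectors were built from.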

For comparing interactions between different systems, it depends what you mean by system:

If it's one protein and different ligands input files, then yes. You can just chain the ligand mols like so:

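from itertools import chain
import prolif as plf

# chain the suppliers so that all ligands go through a single run
lig1_supplier = plf.mol2_supplier("lig1.mol2")
lig2_supplier = plf.mol2_supplier("lig2.mol2")
chained_supplier = chain(lig1_supplier, lig2_supplier)
fp = plf.Fingerprint()
# protein_mol is assumed to be the protein, already loaded as a plf.Molecule
fp.run_from_iterable(chained_supplier, protein_mol)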

If it's different conformations of the exact same protein, then yes. You can generate the dataframe for each system, follow the part of the tutorials that concatenates both dataframes (the second cell), and then construct the bitvectors from the merged dataframe with prolif.to_bitvectors(merged_df).

If it's 2 related proteins, you can't rely on the residue identifiers when merging the dataframes as above. You'd need to replace the residue identifiers in the dataframe columns with some label that is common to both proteins, e.g. replace each residue identifier with the corresponding index in a sequence alignment.
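The relabeling idea for related proteins could look roughly like the sketch below. It uses plain Python tuples in place of dataframe columns. The `alignment_index` mapping, the protein names and the residue identifiers are all hypothetical, and in practice the mapping would come from a sequence alignment of the two proteins.

```python
# Hypothetical mapping from each protein's residue identifier to its
# column index in a shared sequence alignment (all values are made up).
alignment_index = {
    "protA": {"ASP129.A": 57, "PHE330.A": 212},
    "protB": {"ASP134.A": 57, "TYR335.A": 212},
}


def relabel(columns, residue_to_index):
    """Replace the residue identifier in each (residue, interaction) label
    with its alignment index, so labels become comparable across proteins."""
    return [(residue_to_index[res], inter) for res, inter in columns]


cols_a = relabel(
    [("ASP129.A", "HBDonor"), ("PHE330.A", "PiStacking")],
    alignment_index["protA"],
)
cols_b = relabel(
    [("ASP134.A", "HBDonor"), ("TYR335.A", "PiStacking")],
    alignment_index["protB"],
)

# After relabeling, equivalent positions share the same label.
shared = sorted(set(cols_a) & set(cols_b))
print(shared)
```

Once both systems use the same labels, their dataframes can be merged in the same way as for different conformations of one protein.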

Lili-Cao commented 1 week ago

Hi,

Thanks! So I guess for proteins (which might have different name/number of residues), the length of the bit vector is also different?

Best Regards Lili

cbouy commented 1 week ago

Yes, you will end up with bitvectors where each bit position corresponds to a very different residue and interaction type (unless you manage to join the dataframes from each system using a common naming scheme for all your proteins as I stated in the previous comment).
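Joining the results on a common naming scheme amounts to padding every system to the union of all columns. The sketch below shows the idea in plain Python with made-up data; it is not ProLIF's implementation, and with real data the merge would be done on the dataframes themselves. The helper `merge_systems` and the residue labels are hypothetical.

```python
# Made-up per-system results: each maps a column label to a list of bits,
# one bit per frame/pose. The two systems do not share all columns.
system_a = {("ASP129", "HBDonor"): [1, 1], ("PHE330", "PiStacking"): [0, 1]}
system_b = {("ASP129", "HBDonor"): [1], ("TYR359", "Hydrophobic"): [1]}


def merge_systems(*systems):
    """Pad every system to the union of columns and stack the frames."""
    columns = sorted({col for system in systems for col in system})
    rows = []
    for system in systems:
        n_frames = len(next(iter(system.values())))
        for frame in range(n_frames):
            # A column absent from this system was never observed: bit 0.
            rows.append(
                [system[col][frame] if col in system else 0 for col in columns]
            )
    return columns, rows


columns, bitvectors = merge_systems(system_a, system_b)
for bits in bitvectors:
    print(bits)
```

Taken separately, each system here would give 2-bit vectors with different meanings per position. After the merge, every vector has 3 bits and position i always refers to `columns[i]`.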

agerexx commented 1 week ago

Hi, I have a similar problem: I have a multi-mol2 file with different ligands and I would like to generate a dataframe with the fingerprint of each ligand, so I need all the strings to be the same length. The code below is what I tried, but it raises the error shown after it:

from itertools import chain
lig1_supplier = plf.mol2_supplier(r"filepath")
chain_supplier = chain()
for lig in lig1_supplier:
       chain_supplier = chain(chain_supplier, lig)
fp = plf.Fingerprint()
fp.run_from_iterable(chain_supplier, protein_mol)

AttributeError: 'Residue' object has no attribute 'residues'

When I generate the fingerprints the normal way, I don't get this error. Is there another way I could try? Thank you!

cbouy commented 1 week ago

@agerexx I don't understand why you would need to generate the fingerprint another way if the "normal" way already works?

The trick above with the chained supplier is only necessary when you have multiple FILES (each containing one or more ligands). If you have a single file containing multiple ligands (which is your case), you don't need this trick: the output dataframe/bitvectors are already padded to the same size, i.e. bit position 5 for the first pose/ligand corresponds to the same residue and interaction type as bit position 5 for the second pose/ligand.

Another way to put it: if you have to call fp.run_from_iterable multiple times, the output dataframe/bitvector for each call may include different residues and interaction types or even have a different size, and so you can't really compare them out of the box. But if your dataframe/bitvectors were generated after running fp.run_from_iterable a single time then there's nothing to worry about.
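As for the AttributeError in the snippet above, a likely explanation is that `chain(chain_supplier, lig)` iterates over each ligand itself instead of over the supplier. This assumes that iterating over a ProLIF molecule yields its residues, which is consistent with the error message but is an assumption here. The sketch below reproduces the effect with a hypothetical `FakeMol` stand-in, with no ProLIF involved.

```python
from itertools import chain


class FakeMol:
    """Hypothetical stand-in for a ligand molecule that iterates over its residues."""

    def __init__(self, name, residues):
        self.name = name
        self.residues = residues

    def __iter__(self):
        return iter(self.residues)


# Stand-in for a supplier reading one file that contains two ligands.
supplier = [FakeMol("lig1", ["LIG1"]), FakeMol("lig2", ["LIG2"])]

# Chaining each molecule unpacks it: the result yields residues, not molecules.
flattened = chain()
for mol in supplier:
    flattened = chain(flattened, mol)
flattened = list(flattened)

# Chaining whole suppliers (or using one supplier directly) yields molecules.
molecules = list(chain(supplier))

print(flattened)
print([mol.name for mol in molecules])
```

Under that assumption, passing the supplier itself to the fingerprint run avoids the problem, which matches the advice that no chaining is needed for a single file.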

agerexx commented 5 days ago

@cbouy So sorry, it was my bad. I was just not converting to a pd dataframe in the proper way as explained in the tutorial. Thank you again for your time!

chemosim-lab / ProLIF

bitvector #208