hbond pmap_mpi output format

Llauset commented 11 months ago

Hello,

Version: 2.0.5

pt.hbond and pt.distance work really well for my case study, and I would like to use the parallel version, pmap_mpi, to accelerate the calculation.

Parallelizing pt.distance is straightforward, but I'm struggling with pt.hbond: the output format differs from the sequential version.

Sequential:
<pytraj.hbonds.DatasetHBond donor_acceptor pairs : 6938>

Parallel:
[(OrderedDict([('total_solute_hbonds', array([706, 681, 670, ..., 692, 702, 680], dtype=int32)), 'PHE1472_O-LEU1475_N-H', array([1, 0, 0, ..., 1, 0, 0], dtype=int32)) ....

To get the same format, I added dtype as follows:

hb_parallelized = pt.pmap_mpi(pt.hbond, traj, distance=3.0, angle=135, dtype="hbond")

In that case I get the following error:

File "//.conda/envs/openmpi_test_3.7/lib/python3.7/site-packages/pytraj/parallel/base.py", line 51, in concat_hbond
    all_keys.update(partial_data[0].keys())
AttributeError: 'DatasetHBond' object has no attribute 'keys'

I tried to work around the problem by directly returning the data_collection object and accessing data_collection[0][0].get_amber_mask()[0], but that does not give me all the hbonds.
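Since pmap_mpi splits the frames into chunks, each partial result presumably only contains the hbonds that formed in its own chunk, which may be why a single partial DatasetHBond does not list them all. A minimal sketch of combining dict-shaped partial results (like the OrderedDicts in the parallel output above) into one dict follows. It assumes each chunk maps 'total_solute_hbonds' and every hbond label to a per-frame int array, and fills zeros where a label never appeared in a chunk. merge_hbond_chunks is a hypothetical helper, not part of pytraj, and the chunk data is made up.

```python
import numpy as np


def merge_hbond_chunks(chunks, total_key="total_solute_hbonds"):
    """Hypothetical helper: merge per-chunk hbond dicts into one dict.

    Each chunk maps a label to a per-frame int array. A label missing from a
    chunk is treated as absent (all zeros) for that chunk's frames.
    """
    # union of all labels across chunks, keeping first-seen order
    all_keys = []
    seen = set()
    for chunk in chunks:
        for key in chunk:
            if key not in seen:
                seen.add(key)
                all_keys.append(key)

    # number of frames in each chunk, read from the total-count series
    n_frames = [len(chunk[total_key]) for chunk in chunks]

    merged = {}
    for key in all_keys:
        pieces = [
            chunk.get(key, np.zeros(n, dtype=np.int32))
            for chunk, n in zip(chunks, n_frames)
        ]
        merged[key] = np.concatenate(pieces)
    return merged


# made-up example: two chunks that saw different hbonds
chunk_a = {
    "total_solute_hbonds": np.array([1, 1], dtype=np.int32),
    "ASN1_O-GLN5_N-H": np.array([1, 1], dtype=np.int32),
}
chunk_b = {
    "total_solute_hbonds": np.array([1], dtype=np.int32),
    "LEU7_O-GLY10_N-H": np.array([1], dtype=np.int32),
}
merged = merge_hbond_chunks([chunk_a, chunk_b])
```

Chunks must be passed in frame order so that the concatenated series line up with the trajectory.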

Can you please tell me whether there is a way to convert the parallel output to the sequential format, or whether there is a parameter to get the same format directly from the parallel version as from the sequential one?

Thank you in advance for your help.

hainm commented 11 months ago

Can you please tell me whether there is a way to convert the parallel output to the sequential format, or whether there is a parameter to get the same format directly from the parallel version as from the sequential one?

Dear @Llauset, unfortunately there is currently no way to do either of those things. But we will keep this in mind; I think it would be nice to make it work.

For reference: what kind of information do you want from pytraj.hbonds.DatasetHBond?

Llauset commented 11 months ago

Dear Hainm,

Thanks for your quick answer. I want to retrieve the list of hydrogen bonds, defined by a distance and an angle cutoff, computed with pt.hbond and obtained with the get_amber_mask() method of the pytraj.hbonds.DatasetHBond object. Since the trajectory is long, I would like to compute pt.hbond in parallel.

If developing the parallel functionality will take time, I would like to try to develop a workaround. Is it possible to rebuild this list 'easily' from the object returned by hb_parallelized = pt.pmap_mpi(pt.hbond, traj, distance=3.0, angle=135)? How should one proceed? If this is a functionality that may interest you, once implemented, we could share it with you.

hainm commented 11 months ago

I want to retrieve the list of hydrogen bonds

Dear @Llauset: for the parallel version, the returned data is a dict whose keys are total_solute_hbonds plus every hbond formed during the simulation.

Here is an example:

In [1]: import pytraj as pt

In [2]: traj = pt.datafiles.load_trpcage()[:]

In [3]: d = pt.hbond(traj, dtype='dict')

In [4]: d.keys()
Out[4]: odict_keys(['total_solute_hbonds', 'ASN1_O-GLN5_N-H', 'ARG16_O-TRP6_NE1-HE1', 'TYR3_O-LEU7_N-H', 'ILE4_O-LYS8_N-H', 'LEU7_O-GLY10_N-H', 'ASP9_O-SER14_OG-HG', 'SER14_O-ARG16_N-H', 'ASP9_OD2-ARG16_NH1-HH12', 'ASP9_OD2-ARG16_NH2-HH22', 'LEU2_O-TRP6_N-H', 'GLN5_OE1-LYS8_NZ-HZ1', 'ASN1_O-ILE4_N-H', 'TRP6_O-GLY11_N-H', 'SER20_OXT-SER20_OG-HG', 'ASN1_O-TYR3_N-H', 'GLY11_O-SER14_OG-HG', 'ASP9_OD2-ARG16_NE-HE', 'ASN1_OD1-LEU2_N-H', 'ASP9_OD1-LYS8_NZ-HZ1', 'ASP9_OD2-ARG16_NH2-HH21', 'SER20_O-SER20_OG-HG', 'GLY10_O-SER13_N-H', 'GLY10_O-SER13_OG-HG', 'ASP9_OD1-SER14_OG-HG', 'PRO12_O-GLY15_N-H', 'PRO19_O-SER20_OG-HG', 'GLY11_O-SER14_N-H', 'SER13_O-SER13_OG-HG', 'GLN5_O-ASP9_N-H', 'ASP9_OD2-SER14_OG-HG', 'ASP9_OD2-ARG16_NH1-HH11'])

In [5]: d['ASP9_OD2-ARG16_NE-HE']
Out[5]: array([0, 0, 0, ..., 0, 0, 0], dtype=int32)

d['ASP9_OD2-ARG16_NE-HE'] returns an int array with one value per frame: 0 if that specific hbond is absent in that frame, 1 if it is present.
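Because each series holds one 0/1 value per frame, its mean is the fraction of frames in which that hbond is present (its occupancy), and its nonzero indices are the frames where it forms. A small sketch using made-up data shaped like the dict above; the 0.6 threshold is an arbitrary illustration, not a pytraj default.

```python
import numpy as np

# Made-up data shaped like the output of pt.hbond(traj, dtype='dict'):
# one 0/1 int array per hbond label, one entry per frame
d = {
    "total_solute_hbonds": np.array([2, 1, 1, 1], dtype=np.int32),
    "ASN1_O-GLN5_N-H": np.array([1, 1, 0, 0], dtype=np.int32),
    "LEU7_O-GLY10_N-H": np.array([1, 0, 1, 1], dtype=np.int32),
}

labels = [key for key in d if key != "total_solute_hbonds"]

# mean of a 0/1 series = fraction of frames in which the hbond is present
occupancy = {label: float(d[label].mean()) for label in labels}

# frame indices in which each hbond is present
frames_present = {label: np.flatnonzero(d[label]) for label in labels}

# hbonds present in at least 60% of the frames (arbitrary threshold)
stable = [label for label in labels if occupancy[label] >= 0.6]
```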

Please let me know if that works for you.

hainm commented 11 months ago

Is it possible to rebuild this list 'easily' from the object returned by hb_parallelized = pt.pmap_mpi(pt.hbond, traj, distance=3.0, angle=135)?

So the answer is "yes, it's easy" (d comes from the example above):

print(list(set(d) - {"total_solute_hbonds"}))

hainm commented 11 months ago

If this is a functionality that may interest you, once implemented, we could share it with you.

Yes, any contribution to the code is always welcome. Thanks.

Llauset commented 11 months ago

Thank you for your help.

I wrote this function to convert the output of the parallel hbond calculation into Amber masks, and it solves my problem:

import pytraj as pt


def from_hbond_parallel_to_amber_mask(hb_parallelized):
    """
    Convert the keys of the hb_parallelized dictionary to Amber masks
    :param hb_parallelized: dictionary whose keys are 'total_solute_hbonds' and the hbond labels
    :return: (distance_masks, angle_masks), two tuples of Amber mask strings
    :rtype: tuple
    """
    # get all the keys from the hb_parallelized dictionary
    keys = list(hb_parallelized.keys())
    # remove the key 'total_solute_hbonds'
    keys.remove('total_solute_hbonds')
    # change format of keys from HIE4_O-LYS8_NZ-HZ2 to HIE_4@O-LYS_8@NZ-HZ2
    for i in range(len(keys)):
        # 'HIE4_O-LYS8_NZ-HZ2' -> ['HIE4', 'O', 'LYS8', 'NZ', 'HZ2']
        keys[i] = keys[i].replace("_", " ").replace("-", " ").split()
        # split each residue field after 3 characters (assumes 3-letter residue names)
        keys[i][0] = keys[i][0][:3] + '_' + keys[i][0][3:]
        keys[i][2] = keys[i][2][:3] + '_' + keys[i][2][3:]
        acceptor_mask = '@'.join((keys[i][0], keys[i][1]))
        donor_mask = '@'.join((keys[i][2], keys[i][3]))
        keys[i] = '-'.join((acceptor_mask, donor_mask, keys[i][4]))
    # use to_amber_mask to convert the keys to (distance mask, angle mask) pairs
    amber_masks = list(pt.hbond_analysis.to_amber_mask(keys))
    # split the list of (distance, angle) tuples into two independent tuples
    distance_masks, angle_masks = list(zip(*amber_masks))
    return distance_masks, angle_masks
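The function above splits each residue field after three characters, so it assumes 3-letter residue names; a 2-letter name such as a DNA 'DG' residue would come out as 'DG5_'. A possible alternative, sketched under the assumption that residue names do not end in a digit and atom names contain no '_' or '-', separates the trailing residue number with a regular expression. LABEL_RE and label_to_cpptraj_style are hypothetical names, not pytraj API.

```python
import re

# Hypothetical pattern for labels such as 'HIE4_O-LYS8_NZ-HZ2':
# acceptor residue name + number, acceptor atom, donor residue name + number,
# donor atom, hydrogen. The lazy name group stops right before the trailing digits.
LABEL_RE = re.compile(
    r"^(?P<acc_res>[A-Za-z0-9]+?)(?P<acc_num>\d+)_(?P<acc_atom>[^-_]+)"
    r"-(?P<don_res>[A-Za-z0-9]+?)(?P<don_num>\d+)_(?P<don_atom>[^-_]+)"
    r"-(?P<hyd>[^-_]+)$"
)


def label_to_cpptraj_style(label):
    """Convert 'HIE4_O-LYS8_NZ-HZ2' to 'HIE_4@O-LYS_8@NZ-HZ2' (hypothetical helper)."""
    match = LABEL_RE.match(label)
    if match is None:
        raise ValueError(f"unrecognized hbond label: {label!r}")
    return "{acc_res}_{acc_num}@{acc_atom}-{don_res}_{don_num}@{don_atom}-{hyd}".format(
        **match.groupdict()
    )
```

For example, label_to_cpptraj_style("DG5_N7-ARG16_NH1-HH11") gives "DG_5@N7-ARG_16@NH1-HH11". The converted strings could then be passed to pt.hbond_analysis.to_amber_mask in the same way as in the function above.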

hainm commented 11 months ago

Thanks @Llauset for the code. Cheers.

Amber-MD / pytraj

hbond pmap_mpi output format #1644