bowman-lab / enspara

Modeling molecular ensembles with scalable data structures and parallel computing
https://enspara.readthedocs.io
GNU General Public License v3.0

No topology if different trajectories loaded #198

Closed PeptideSimulator01 closed 4 years ago

PeptideSimulator01 commented 4 years ago

Dear enspara Team,

I really enjoy enspara and hope that you can help me with my issue.

I am trying to cluster 2 trajectories from 2 different peptides (both 14 aa long). I only write the C, CA and N atoms to the trajectories so that they can be clustered together, and I cluster on the RMSD of C, CA and N only. The topology PDB likewise includes only the backbone atoms C, CA and N. After clustering, I want to count how many times each cluster is visited by peptide 1 and by peptide 2, and hopefully see a different distribution.

python cluster.py \
  --trajectories peptide1.h5 peptide2.h5 \
  --topology peptide_backbone.pdb \
  --algorithm khybrid \
  --cluster-number 5 \
  --atoms '(name N or name C or name CA)' \
  --distances /home/psc/enspara/enspara/apps/fs-khybrid-clusters0020-distances.h5 \
  --center-features /home/psc/enspara/enspara/apps/fs-khybrid-clusters0020-centers.pickle \
  --assignments /home/psc/enspara/enspara/apps/fs-khybrid-clusters0020-assignments.h5

The clustering works and I get a distribution over the clusters:

import numpy as np
import enspara.ra

# Load the per-frame cluster assignments and flatten them across trajectories.
cl_assign = enspara.ra.load('fs-khybrid-clusters0020-assignments.h5')
cl_assign_list = np.ndarray.tolist(cl_assign)
cl_assign_list_flat = [val for sublist in cl_assign_list for val in sublist]

# Count the frames assigned to each of the 5 clusters (labels 0 to 4).
for k in range(5):
    print('centroid%d:' % (k + 1), cl_assign_list_flat.count(k))

output:

centroid1: 2021
centroid2: 2152
centroid3: 3663
centroid4: 3622
centroid5: 2542
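
To split these totals by peptide, the counting could be done per trajectory instead of on the flattened list. The sketch below is only an illustration: it assumes that each row of the assignments corresponds to one input trajectory, in the order given to --trajectories (that ordering is an assumption, not something confirmed in this thread). The helper name counts_per_trajectory and the toy data are made up for the example.

```python
from collections import Counter


def counts_per_trajectory(assignments, n_clusters):
    """Return, for each trajectory, a list of frame counts per cluster label."""
    table = []
    for traj_assignments in assignments:
        # Count how often each cluster label occurs in this trajectory.
        counter = Counter(int(label) for label in traj_assignments)
        # Clusters that were never visited get an explicit zero.
        table.append([counter.get(k, 0) for k in range(n_clusters)])
    return table


# Toy assignments standing in for the real data (hypothetical values):
# row 0 would be peptide 1, row 1 would be peptide 2.
toy_assignments = [[0, 0, 1, 2, 2, 2], [1, 1, 3, 4, 4]]

for name, row in zip(("peptide1", "peptide2"),
                     counts_per_trajectory(toy_assignments, 5)):
    print(name, row)
```

With the real data, the nested list obtained from the loaded assignments (before flattening) would take the place of toy_assignments.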

But when I try to generate a PDB of the cluster centers, I get the error:

ValueError: The topologies of the Trajectories are not the same

The code I used:

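import pickle
import mdtraj as md

# Join the pickled cluster centers into one trajectory, align to frame 0, save.
with open('fs-khybrid-clusters0020-centers.pickle', 'rb') as f:
    ctr_structs = md.join(pickle.load(f))
traj = ctr_structs.superpose(ctr_structs, frame=0, parallel=True)
traj.save_pdb('cluster_center.pdb')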
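
One way to see which centers md.join is objecting to might be to print a short summary of each center before joining. This is a minimal diagnostic sketch, not a fix: it assumes the pickle holds a list of mdtraj Trajectory objects, and it only relies on their n_atoms attribute and on the residue names in their topology. The function names summarize_center and report_centers are hypothetical.

```python
def summarize_center(center):
    """Return (atom count, residue names) for one cluster-center structure."""
    residues = tuple(res.name for res in center.topology.residues)
    return center.n_atoms, residues


def report_centers(centers):
    """Build one human-readable line per center so mismatches stand out."""
    lines = []
    for i, center in enumerate(centers):
        n_atoms, residues = summarize_center(center)
        lines.append(f"center {i}: {n_atoms} atoms, residues {'-'.join(residues)}")
    return lines


# Possible usage with the pickle from above (not executed here):
#   import pickle
#   with open('fs-khybrid-clusters0020-centers.pickle', 'rb') as f:
#       for line in report_centers(pickle.load(f)):
#           print(line)
```

Centers that come from different peptides would be expected to show different residue sequences here, which would be consistent with the topology error above.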

If I take the route via:

python cluster.py \
  --trajectories peptide1.h5 \
  --topology peptide1.pdb \
  --trajectories peptide2.h5 \
  --topology peptide2.pdb \
  --atoms '(name CA or name C or name N)' \
  --algorithm khybrid \
  --cluster-number 5 \
  --distances /home/psc/enspara/enspara/apps/fs-khybrid-clusters0020-distances.h5 \
  --center-features /home/psc/enspara/enspara/apps/fs-khybrid-clusters0020-centers.pickle \
  --assignments /home/psc/enspara/enspara/apps/fs-khybrid-clusters0020-assignments.h5

and then go:

with open('fs-khybrid-clusters0020-centers.pickle', 'rb') as f:
    ctr_structs = md.join(pickle.load(f))

I get:

ValueError: Number of atoms in self (239) is not equal to number of atoms in other

I think I have a logic error somewhere, but I cannot see why this is not working. Any help is appreciated. Thanks for your effort.

PeptideSimulator01 commented 4 years ago

I got the centroids via:

import pickle

# Load the list of cluster centers and save the first one on its own.
with open('fs-khybrid-clusters0020-centers.pickle', 'rb') as infile:
    centers = pickle.load(infile)
first_tr = centers[0]
first_tr.save_pdb('1.pdb')

Continue like this for the other centroids.
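
The manual step above could also be wrapped in a loop so that every centroid is written in one go. This is a rough sketch under the same assumption as above, namely that the pickle holds a list of objects with a save_pdb method (as mdtraj trajectories have). The helper names load_centers and save_centers_as_pdbs and the file-name pattern are invented for illustration.

```python
import pickle


def load_centers(pickle_path):
    """Load the pickled list of cluster-center structures."""
    with open(pickle_path, "rb") as f:
        return pickle.load(f)


def save_centers_as_pdbs(centers, prefix=""):
    """Save each center to '<prefix><index>.pdb', numbering from 1."""
    paths = []
    for i, center in enumerate(centers, start=1):
        path = f"{prefix}{i}.pdb"
        # Each center is written separately, so no joint topology is needed.
        center.save_pdb(path)
        paths.append(path)
    return paths


# Possible usage (not executed here):
#   centers = load_centers('fs-khybrid-clusters0020-centers.pickle')
#   save_centers_as_pdbs(centers)   # writes 1.pdb, 2.pdb, ...
```

Because each structure is saved on its own, this route avoids md.join and therefore the topology check that raised the errors above.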