Acellera / htmd

HTMD: Programming Environment for Molecular Discovery
https://software.acellera.com/docs/latest/htmd/index.html
Other
253 stars 58 forks source link

IndexError: arrays used as indices must be of integer (or boolean) type #1079

Closed vas2201 closed 4 months ago

vas2201 commented 4 months ago

Hello,

I encountered an error while running the code provided below. Could you please advise on how to resolve this issue?

mol = Molecule('2LEB')

mol.filter("protein")

mol.set('segid', 'A', sel='protein')

mol_op = systemPrepare(mol) mol_seg = autoSegment(mol_op) mol_solv = solvate (mol_seg, pad=10) mol_amber = amber.build(mol_solv, saltconc = 0.15, outdir='build-amber')

---- Molecule chain report ---- Chain A: First residue: MET 1
Final residue: SER 101
Chain B: First residue: U 102
Final residue: U 107
---- End of chain report ----

2024-02-26 01:48:02,253 - moleculekit.tools.preparation - INFO - Modified residue HIS 63 A to HIE 2024-02-26 01:48:02,254 - moleculekit.tools.preparation - INFO - Modified residue HIS 99 A to HIE 2024-02-26 01:48:02,254 - moleculekit.tools.preparation - INFO - Modified residue HIS 100 A to HIE 2024-02-26 01:48:02,254 - moleculekit.tools.preparation - INFO - Modified residue U 102 B to RU5 2024-02-26 01:48:02,255 - moleculekit.tools.preparation - INFO - Modified residue C 103 B to RC 2024-02-26 01:48:02,255 - moleculekit.tools.preparation - INFO - Modified residue C 104 B to RC 2024-02-26 01:48:02,256 - moleculekit.tools.preparation - INFO - Modified residue A 105 B to RA 2024-02-26 01:48:02,256 - moleculekit.tools.preparation - INFO - Modified residue G 106 B to RG 2024-02-26 01:48:02,257 - moleculekit.tools.preparation - INFO - Modified residue U 107 B to RU3 2024-02-26 01:48:02,269 - htmd.builder.solvate - INFO - Using water pdb file at: /home/nmrbox/spenumutchu/anaconda3/envs/htmd2/lib/python3.10/site-packages/htmd/share/solvate/wat.pdb 2024-02-26 01:48:02,850 - htmd.builder.solvate - INFO - Replicating 4 water segments, 2 by 1 by 2 Solvating: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 4/4 [00:01<00:00, 3.06it/s] 2024-02-26 01:48:04,163 - htmd.builder.solvate - INFO - 5427 water molecules were added to the system. 2024-02-26 01:48:04,442 - htmd.builder.builder - WARNING - Segments ['P0'] contain both protein and non-protein atoms. Please assign separate segments to them or the build procedure might fail. 2024-02-26 01:48:04,942 - py.warnings - WARNING - /home/nmrbox/spenumutchu/anaconda3/envs/htmd2/lib/python3.10/site-packages/moleculekit/align.py:52: RuntimeWarning: Mean of empty slice. centroidP = P.mean(axis=0)

2024-02-26 01:48:04,942 - py.warnings - WARNING - /home/nmrbox/spenumutchu/anaconda3/envs/htmd2/lib/python3.10/site-packages/numpy/core/_methods.py:184: RuntimeWarning: invalid value encountered in divide ret = um.true_divide(

2024-02-26 01:48:04,943 - py.warnings - WARNING - /home/nmrbox/spenumutchu/anaconda3/envs/htmd2/lib/python3.10/site-packages/moleculekit/align.py:53: RuntimeWarning: Mean of empty slice. centroidQ = Q.mean(axis=0)

2024-02-26 01:48:04,943 - py.warnings - WARNING - /home/nmrbox/spenumutchu/anaconda3/envs/htmd2/lib/python3.10/site-packages/moleculekit/align.py:27: RuntimeWarning: invalid value encountered in scalar divide RMSD = np.sqrt(np.abs(RMSD / P.shape[0]))


IndexError Traceback (most recent call last) Cell In[35], line 7 5 mol_seg = autoSegment(mol_op) 6 mol_solv = solvate (mol_seg, pad=10) ----> 7 mol_amber = amber.build(mol_solv, saltconc = 0.15, outdir='build-amber')

File ~/anaconda3/envs/htmd2/lib/python3.10/site-packages/htmd/builder/amber.py:703, in build(mol, ff, topo, param, prefix, outdir, caps, ionize, saltconc, saltanion, saltcation, disulfide, teleap, teleapimports, execute, atomtypes, offlibraries, gbsa, igb) 697 if all(ff19tip3p): 698 logger.warning( 699 "CAUTION: AMBER Forcefield ff19SB is NOT compatible with TIP3P water model." 700 " Consider using ff14SB instead or using the OPC water model." 701 ) --> 703 disulfide = _prepareMolecule(mol, caps, disulfide) 704 _detect_cofactors_ncaa_ptm(mol, param, topo) 706 if ionize:

File ~/anaconda3/envs/htmd2/lib/python3.10/site-packages/htmd/builder/amber.py:523, in _prepareMolecule(mol, caps, disulfide) 520 logger.info(f"Found cyclic segment {cc}. Disabling capping on it.") 521 del caps[cc] --> 523 _add_caps(mol, caps) 525 # Before modifying the resids copy the molecule to map back the disulfide bonds 526 mol_orig_resid = mol.copy()

File ~/anaconda3/envs/htmd2/lib/python3.10/site-packages/htmd/builder/amber.py:1099, in _add_caps(mol, caps) 1097 capmol.resid[:] += terminals[term]["resid"] 1098 capmol.chain[:] = mol.chain[mask][0] -> 1099 capmol.remove(cap_idx, _logger=False) # Remove atoms which were aligned 1101 # Remove the atoms defined in remove_atoms to clean up the termini 1102 removed = mol.remove( 1103 np.isin(mol.name, remove_atoms[cap]) & mask, _logger=False 1104 )

File ~/anaconda3/envs/htmd2/lib/python3.10/site-packages/moleculekit/molecule.py:608, in Molecule.remove(self, selection, _logger) 606 self._updateBondsAnglesDihedrals(sel) 607 for k in self._atom_and_coord_fields: --> 608 self.dict[k] = np.delete(self.dict[k], sel, axis=0) 609 if k == "coords": 610 self.dict[k] = np.atleast_3d(self.dict[k])

File <__array_function__ internals>:200, in delete(*args, **kwargs)

File ~/anaconda3/envs/htmd2/lib/python3.10/site-packages/numpy/lib/function_base.py:5235, in delete(arr, obj, axis) 5233 else: 5234 keep = ones(N, dtype=bool) -> 5235 keep[obj,] = False 5237 slobj[axis] = keep 5238 new = arr[tuple(slobj)]

IndexError: arrays used as indices must be of integer (or boolean) type

stefdoerr commented 4 months ago

Hi, and thanks for finding this bug! Once this job here completes https://github.com/Acellera/moleculekit/actions/runs/8045964937 there will be a new moleculekit release available, version 1.8.22. Upgrade to that one and it will work (sort of because it will throw a new error about a unknown residue called RU3 in your input structure). The RNA will require some digging around to find parameters/topologies for it's residues.

stefdoerr commented 4 months ago

Hit an issue. Going for 1.8.24 now https://github.com/Acellera/moleculekit/actions/runs/8046401654

vas2201 commented 4 months ago

updated the moleculekit as you requested. However, It is still have issue, the input file doesn't have RU3, it is terminal residue as U and end up with END, ENDMDL. if possible, Could you please test "2LEB" from pdb database?

---- Molecule chain report ---- Chain A: First residue: MET 1
Final residue: SER 101
Chain B: First residue: U 102
Final residue: RU3 107
---- End of chain report ----

2024-02-26 15:52:39,149 - moleculekit.tools.preparation - INFO - Modified residue HIS 63 A to HIE 2024-02-26 15:52:39,149 - moleculekit.tools.preparation - INFO - Modified residue HIS 99 A to HIE 2024-02-26 15:52:39,150 - moleculekit.tools.preparation - INFO - Modified residue HIS 100 A to HIE 2024-02-26 15:52:39,150 - moleculekit.tools.preparation - INFO - Modified residue U 102 B to RU5 2024-02-26 15:52:39,151 - moleculekit.tools.preparation - INFO - Modified residue C 103 B to RC 2024-02-26 15:52:39,151 - moleculekit.tools.preparation - INFO - Modified residue C 104 B to RC 2024-02-26 15:52:39,151 - moleculekit.tools.preparation - INFO - Modified residue A 105 B to RA 2024-02-26 15:52:39,152 - moleculekit.tools.preparation - INFO - Modified residue G 106 B to RG 2024-02-26 15:52:39,168 - htmd.builder.solvate - INFO - Using water pdb file at: /home/nmrbox/spenumutchu/anaconda3/envs/htmd2/lib/python3.10/site-packages/htmd/share/solvate/wat.pdb 2024-02-26 15:52:39,553 - htmd.builder.solvate - INFO - Replicating 4 water segments, 2 by 1 by 2 Solvating: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 4/4 [00:01<00:00, 2.69it/s] 2024-02-26 15:52:41,046 - htmd.builder.solvate - INFO - 5427 water molecules were added to the system. 2024-02-26 15:52:41,204 - htmd.builder.builder - WARNING - Segments ['P0'] contain both protein and non-protein atoms. Please assign separate segments to them or the build procedure might fail. 2024-02-26 15:52:41,764 - py.warnings - WARNING - /home/nmrbox/spenumutchu/anaconda3/envs/htmd2/lib/python3.10/site-packages/moleculekit/align.py:52: RuntimeWarning: Mean of empty slice. centroidP = P.mean(axis=0)

2024-02-26 15:52:41,765 - py.warnings - WARNING - /home/nmrbox/spenumutchu/anaconda3/envs/htmd2/lib/python3.10/site-packages/numpy/core/_methods.py:184: RuntimeWarning: invalid value encountered in divide ret = um.true_divide(

2024-02-26 15:52:41,765 - py.warnings - WARNING - /home/nmrbox/spenumutchu/anaconda3/envs/htmd2/lib/python3.10/site-packages/moleculekit/align.py:53: RuntimeWarning: Mean of empty slice. centroidQ = Q.mean(axis=0)

2024-02-26 15:52:41,766 - py.warnings - WARNING - /home/nmrbox/spenumutchu/anaconda3/envs/htmd2/lib/python3.10/site-packages/moleculekit/align.py:27: RuntimeWarning: invalid value encountered in scalar divide RMSD = np.sqrt(np.abs(RMSD / P.shape[0]))

2024-02-26 15:52:41,814 - py.warnings - WARNING - /home/nmrbox/spenumutchu/anaconda3/envs/htmd2/lib/python3.10/site-packages/moleculekit/bondguesser.py:218: RuntimeWarning: invalid value encountered in cast xyz_boxes = np.floor(xyzrange / pairdist).astype(np.uint32) + 1

2024-02-26 15:52:41,815 - py.warnings - WARNING - /home/nmrbox/spenumutchu/anaconda3/envs/htmd2/lib/python3.10/site-packages/moleculekit/bondguesser.py:235: RuntimeWarning: invalid value encountered in cast box_idx = np.floor((coords - min_c) / pairdist).astype(int)

2024-02-26 15:52:43,268 - htmd.builder.amber - INFO - Detecting disulfide bonds. 2024-02-26 15:52:44,372 - htmd.builder.amber - INFO - Starting the build.


BuildError Traceback (most recent call last) Cell In[12], line 7 5 mol_seg = autoSegment(mol_op) 6 mol_solv = solvate (mol_seg, pad=10) ----> 7 mol_amber = amber.build(mol_solv, saltconc = 0.15, outdir='build-amber')

File ~/anaconda3/envs/htmd2/lib/python3.10/site-packages/htmd/builder/amber.py:707, in build(mol, ff, topo, param, prefix, outdir, caps, ionize, saltconc, saltanion, saltcation, disulfide, teleap, teleapimports, execute, atomtypes, offlibraries, gbsa, igb) 704 _detect_cofactors_ncaa_ptm(mol, param, topo) 706 if ionize: --> 707 molbuilt = _build( 708 mol, 709 ff=ff, 710 topo=topo, 711 param=param, 712 prefix=prefix, 713 outdir=outdir, 714 disulfide=disulfide, 715 teleap=teleap, 716 teleapimports=teleapimports, 717 execute=execute, 718 atomtypes=atomtypes, 719 offlibraries=offlibraries, 720 gbsa=gbsa, 721 igb=igb, 722 ) 723 shutil.move( 724 os.path.join(outdir, "structure.crd"), 725 os.path.join(outdir, "structure.noions.crd"), 726 ) 727 shutil.move( 728 os.path.join(outdir, "structure.prmtop"), 729 os.path.join(outdir, "structure.noions.prmtop"), 730 )

File ~/anaconda3/envs/htmd2/lib/python3.10/site-packages/htmd/builder/amber.py:956, in _build(mol, ff, topo, param, prefix, outdir, disulfide, teleap, teleapimports, execute, atomtypes, offlibraries, gbsa, igb) 954 os.chdir(currdir) 955 if errors: --> 956 raise BuildError( 957 errors 958 + [f"Check {logpath} for further information on errors in building."], 959 errors, 960 ) 961 logger.info("Finished building.") 963 if ( 964 os.path.exists(os.path.join(outdir, "structure.crd")) 965 and os.path.getsize(os.path.join(outdir, "structure.crd")) != 0 966 and os.path.getsize(os.path.join(outdir, "structure.prmtop")) != 0 967 ):

BuildError: MissingResidueError("Unknown residue(s) ['RU3'] found in the input structure. You are either missing a topology definition for the residue or you need to rename it to the correct residue name", array(['RU3'], dtype='<U3')) MissingAtomTypeError('Missing atom type for [\'.R<RU3 108>.A<P 1>\', \'.R<RU3 108>.A<O1P 2>\', \'.R<RU3 108>.A<O2P 3>\', ".R<RU3 108>.A<O5\' 4>", ".R<RU3 108>.A<C5\' 5>", ".R<RU3 108>.A<C4\' 6>", ".R<RU3 108>.A<O4\' 7>", ".R<RU3 108>.A<C3\' 8>", ".R<RU3 108>.A<O3\' 9>", ".R<RU3 108>.A<C2\' 10>", ".R<RU3 108>.A<O2\' 11>", ".R<RU3 108>.A<C1\' 12>", \'.R<RU3 108>.A<N1 13>\', \'.R<RU3 108>.A<C2 14>\', \'.R<RU3 108>.A<O2 15>\', \'.R<RU3 108>.A<N3 16>\', \'.R<RU3 108>.A<C4 17>\', \'.R<RU3 108>.A<O4 18>\', \'.R<RU3 108>.A<C5 19>\', \'.R<RU3 108>.A<C6 20>\', ".R<RU3 108>.A<H5\'1 21>", ".R<RU3 108>.A<H5\'2 22>", ".R<RU3 108>.A<H4\' 23>", ".R<RU3 108>.A<H1\' 24>", \'.R<RU3 108>.A<H6 25>\', \'.R<RU3 108>.A<H5 26>\', \'.R<RU3 108>.A<H3 27>\', ".R<RU3 108>.A<H3\' 28>", ".R<RU3 108>.A<H2\'1 29>", ".R<RU3 108>.A<HO\'2 30>", \'.R<RU3 108>.A<H3T 31>\', \'.R<RU3 108>.A<N 32>\', \'.R<RU3 108>.A<CA 33>\', \'.R<RU3 108>.A<C 34>\', \'.R<RU3 108>.A<O 35>\']', ['.R<RU3 108>.A<P 1>', '.R<RU3 108>.A<O1P 2>', '.R<RU3 108>.A<O2P 3>', ".R<RU3 108>.A<O5' 4>", ".R<RU3 108>.A<C5' 5>", ".R<RU3 108>.A<C4' 6>", ".R<RU3 108>.A<O4' 7>", ".R<RU3 108>.A<C3' 8>", ".R<RU3 108>.A<O3' 9>", ".R<RU3 108>.A<C2' 10>", ".R<RU3 108>.A<O2' 11>", ".R<RU3 108>.A<C1' 12>", '.R<RU3 108>.A<N1 13>', '.R<RU3 108>.A<C2 14>', '.R<RU3 108>.A<O2 15>', '.R<RU3 108>.A<N3 16>', '.R<RU3 108>.A<C4 17>', '.R<RU3 108>.A<O4 18>', '.R<RU3 108>.A<C5 19>', '.R<RU3 108>.A<C6 20>', ".R<RU3 108>.A<H5'1 21>", ".R<RU3 108>.A<H5'2 22>", ".R<RU3 108>.A<H4' 23>", ".R<RU3 108>.A<H1' 24>", '.R<RU3 108>.A<H6 25>', '.R<RU3 108>.A<H5 26>', '.R<RU3 108>.A<H3 27>', ".R<RU3 108>.A<H3' 28>", ".R<RU3 108>.A<H2'1 29>", ".R<RU3 108>.A<HO'2 30>", '.R<RU3 108>.A<H3T 31>', '.R<RU3 108>.A<N 32>', '.R<RU3 108>.A<CA 33>', '.R<RU3 108>.A<C 34>', '.R<RU3 108>.A<O 35>']) Check /home/nmrbox/spenumutchu/projects/htmd/SRSF2/SRSF2_WT_UCCAGU_1/build-amber/log.txt for further information on errors in building.

stefdoerr commented 4 months ago

The problem is the segmentation. It's messing up the building.

mol = Molecule('2LEB')
mol_op = systemPrepare(mol)
mol_solv = solvate(mol_op, pad=10)
mol_amber = amber.build(mol_solv, saltconc = 0.15, outdir='build-amber')

This works fine

vas2201 commented 4 months ago

Dear Stefdoerr and HTMD Team,

I wanted to extend my sincere gratitude for your invaluable assistance over the past few years. Your expertise and support have been instrumental in our research endeavors.

I am pleased to inform you that we have recently published a paper in Science Advances, where we utilized and cited HTMD as a crucial component of our study on viral mRNA components. You can find the publication at the following link: [Paper Link] (https://www.science.org/doi/full/10.1126/sciadv.adg3060).

Once again, thank you for your unwavering support. It has truly made a difference in our work.

Regards

Vas

stefdoerr commented 4 months ago

Congratulations on the publication! Nice to hear that :) Next time cite HTMD paper please https://pubs.acs.org/doi/abs/10.1021/acs.jctc.6b00049 , the paper seems to be missing the citation.