EricBoittier / GlycoTorch-Vina

2 stars 2 forks source link

Quick quesh regarding GlycoTorch-Online / Carbohydrate to PDBQT #1

Open pguillem opened 4 weeks ago

pguillem commented 4 weeks ago

Hey Eric! Pedro and Gloria here, from TU-Dresden next door :-)

We were hoping to give GlycoTorch a quick try with some glycans, and we realized the file Carbohydrate_to_PDBQT.py is no longer available in your repo.

I was wondering if you could share a copy, if you still happen to have it around?. We are using the latest ADT (2022) to build the inputs, I don't know if it's still necesary.

Thank you so much in advance! Pedro!

EricBoittier commented 3 weeks ago

Hi Pedro and Gloria, thanks for getting in touch. The file you are looking for is here: https://github.com/EricBoittier/GlycoTorch_Online/blob/master/glycotorch/Carbohydrate_to_PDBQT.py I remember this script as probably the worse code I have ever written! It is quite hacky and I can't promise it will work with every carbohydrate residue, and it is designed specifically for AutoDock Vina and its derivatives (vinacarb, gtv). Depending on which docking engine you want to use, you might find you get better results with ADT. Back in the day, I found ADT wasn't so good at finding the correct rotatable groups or atom types for glycans (in the context of Vina and the atom typing scheme it uses for polar hydrogens). Let me know if you run in to any issues.

pguillem commented 3 weeks ago

Hi Eric.

We are trying it out with our own GAGs, which are built using our own saccharide unit libraries.

We noticed that there is a step where "assign_ring_positions()" is called, and it loops through atoms in rings to set whether each is C1, C2, C3, C4, C5 or O5. You did a collection of functions called isC1, isC2, etc in Carbohydrate.py, to check each case.

Usually the atom position is indicated within the PDB field itself (i.e: C3) (74, 'ATOM 74 C3 XXX 3 9.796 36.678 11.583 0.00 0.00 C ')

However, the function (isC3, for example) tries to determine the C position by checking neighbors, instead of just taking C3 from the PDB file.

   def isC3(self, atomID):
        # CALLS C5 and C2
        if self.atoms[atomID].isAtom("C")  and self.isAtomInRing(atomID)  and not self.isC5(atomID):
            for connections in self.connections[atomID]:
                if self.isC2(connections):
                    return True
        return False

We noticed that when we run the program on a GAG PDB, this error pops up:

Something went wrong with assigning ring positions for (74) in ring [68, 74, 77, 80, 81, 65], it was not able to be assigned to a ring position.

Note the array shows position 65 (C1) at the end, and position 68 (C2) at the beginning. Is the order of elements in the array relevant for working with your algorithm?. For reference, 81 is the O5 atom. Would it help to re-order the array?

I still traced back the error, and the first position (68) of the array (which is a C2) is assigned C5 for some reason. (68, 'ATOM 68 C2 XXX 3 9.964 37.702 10.473 0.00 0.00 C')

Then, position 74, which is a C3, is not assigned anything and, it fails with the error. (74, 'ATOM 74 C3 XXX 3 9.796 36.678 11.583 0.00 0.00 C')

I can make the program assume the ring position from the PDB atom position field. Would this be coherent with the rest of your workflow?. How could we solve this to avoid problems, so that the CHI energy functions later are properly applied?

I also fixed some parsing exceptions in Geometry.py, when the element in the last column in the PDB file has more than 1 letter (usually happens when created from .mol2 files). I will take note on the corrections and gladly share them :+1:

Thanks in advance! Pedro

EricBoittier commented 3 weeks ago

Hi Pedro, yes that routine is extremely cursed and was only used as a work around for older PDB files where the ring atom names did not match the Glycam style... The code in Vina is just checking for the atom names to assign the CP parameters to work out the ring conformation. The glycosidic oxygen also needs the same name as in Glycam iirc. I agree it is better to assign the ring positions from the atom names, and if that fails run the previously mentioned cursed algorithm. If you want to make a pull request feel free, otherwise I can modify the code as you suggested. Please let me know. Cheers, Eric

EricBoittier commented 3 weeks ago

also thanks, please share any corrections or make a pull request as you see fit 😊 Do you happen to know if the issue is also the same for molecules generated by the glycam server?

pguillem commented 3 weeks ago

Thanks Eric!!!

Super, thanks for those pointers. Ouff..I hear you man... keeping up with everybody messing up the PDB "standard" however they want for 35 years is not easy.

Doing a distance matrix with vdw radii was a great idea btw... nevermind the cursed code hahah.. you are essentially the only scientist on the planet who figured how to compute these pdbqt files properly, so kudos to you!.

So I modified Atom.py and added variable "atom_nomenclature" to hold column 14-16 of the pdb line. Then I used that value to map the appropriate CX and O5 in Carbohydrate.py 👍 . Worked out of the box!. I'll commit it.

I'll check with the Glycam server and see if it happens with those too.

pguillem commented 3 weeks ago

Here is another interesting one: Our GAG has 3 more rings that are not sugars. One of them is a cyclopentane.

Offcourse, those also enter the "assign_ring_positions()" function, which is taylored for mapping sugar only. So I'm currently escaping the function when it finds a non-sugar ring (less than 6 atoms, or not having an O5).

Will the rest of the code treat those non-sugar atoms properly?

Best! Pedro

EricBoittier commented 3 weeks ago

Thanks a lot! Looking forward to merging the commit when it is ready.

Yes, that is an interesting one. I had a look through the code and I think it would take some effort to get it working for cyclopentane, but if you were able to leave the O5 variable as false, skip the assignment of CP parameters, etc. in Ring.py, then the Carbohydrate_to_PDBQT,py script may be able to process the file into a valid PDBQT file, assuming there are no other functional groups in the ring. If you send me an example pdb file I can give it a try.