google-code-export / nmrrestrntsgrid

Automatically exported from code.google.com/p/nmrrestrntsgrid

format converter misaligning mmcif and pdb res numbering #248

Closed GoogleCodeExporter closed 9 years ago

GoogleCodeExporter commented 9 years ago
The PDB file's residue numbering goes resid -5, -4, -3, -2, 0 (-1 is skipped).
The mmCIF file starts at resid 1. The arithmetic mapping should be
straightforward: [ ['A', ' ', 1, -6], ['A', ' ', 5, -5] ]. 2KUA.cif and the
joinedCoord.str file both contain a correct mapping between the author's
nomenclature and the mmCIF nomenclature.
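
To make the arithmetic explicit, here is a minimal sketch of how those two mappings would translate mmCIF residue numbers into author numbers. It assumes each entry reads [chain code, chain code, first mmCIF residue number, offset] and that an entry applies from its first residue until the next entry starts. `author_seq_id` is a hypothetical helper written for this illustration, not part of the format converter.

```python
# Hypothetical helper: apply offset-style chain mappings to an mmCIF residue number.
# Assumption: each entry is [chain, chain, first_seq_id, offset], and the entry with
# the largest first_seq_id that does not exceed seq_id is the one that applies.
FORCE_CHAIN_MAPPINGS = [
    ['A', ' ', 1, -6],
    ['A', ' ', 5, -5],
]

def author_seq_id(chain, seq_id, mappings=FORCE_CHAIN_MAPPINGS):
    """Return the author (PDB) residue number for an mmCIF residue number."""
    applicable = [m for m in mappings if m[0] == chain and m[2] <= seq_id]
    if not applicable:
        raise KeyError("no mapping for chain %r, residue %r" % (chain, seq_id))
    first_seq_id, offset = max(applicable, key=lambda m: m[2])[2:4]
    return seq_id + offset

print([author_seq_id('A', i) for i in range(1, 8)])
# [-5, -4, -3, -2, 0, 1, 2]
```

Under these assumptions mmCIF residue 7 comes out as author residue 2, which is the 7 ALA to 2 ALA correspondence expected for this entry.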

However, the atom_site loop in the star file always ends up with this mapping:
        1     1 1 1   7 ALA C    C -1.439   0.219 -14.433 1.00 . A A .   1 MET C    1 1 
        1     2 1 1   7 ALA CA   C -1.952   0.544 -13.033 1.00 . A A .   1 MET CA   1 1 
where 7 ALA should actually map to 2 ALA. This happens regardless of what I put
into forceChainMappings.

I know forceChainMappings is being read, because I am able to make the distance
violations worse. I have not succeeded in making them better.

It looks like format converter is throwing out everything from resid -5 to
resid 0 and then misaligning the rest.

I've attached the latest log file.

Original issue reported on code.google.com by schulte....@gmail.com on 8 Apr 2010 at 5:45

Attachments:

GoogleCodeExporter commented 9 years ago
From where can I download the joinedCoord.str file? The most recent location I 
have:

http://sunfish.bmrb.wisc.edu/nmrRestrGrid/

doesn't have the file.

Original comment by wfvran...@gmail.com on 12 Apr 2010 at 3:43

GoogleCodeExporter commented 9 years ago
Sorry. Using http://sunfish.bmrb.wisc.edu/nmrRestrGrid/ was a temporary fix
until we could get everyone access to grunt. It is not actively updated. Here
is the file.

Grunt is mounted on tang, and you should be able to log onto that. If not,
Dimitri should be back this week and we can work something out.

The joinedCoord.str files are in
/grunt/docr/ccpn_tmp/data/archives/bmrb/nmrRestrGrid/

Original comment by schulte....@gmail.com on 12 Apr 2010 at 3:55

Attachments:

GoogleCodeExporter commented 9 years ago
OK thanks. I'd prefer some kind of http access - makes it easier to
automatically download files instead of having to go through ssh.

Original comment by wfvran...@gmail.com on 12 Apr 2010 at 4:01

GoogleCodeExporter commented 9 years ago
I'll see what I can do for a more long term solution.

Original comment by schulte....@gmail.com on 12 Apr 2010 at 4:22

GoogleCodeExporter commented 9 years ago
I've updated http://sunfish.bmrb.wisc.edu/nmrRestrGrid/ and am adding a step in
the processing procedure to keep this up to date.

Original comment by schulte....@gmail.com on 12 Apr 2010 at 6:50

GoogleCodeExporter commented 9 years ago
Thanks Chris!

As for the problem, 't was a complicated one. I was basically always resetting
the chain mappings for coordinates, as I'm depending on the CIF info, but that
did not work here because of the missing code in this case, so I've made it
customisable again.

I've updated the code. You need to check out the CCPN SF code; updating only
the directory

python/ccpnmr/format/

should do the trick.

Then I applied (NOTE: use *readProject* instead of readCoordinates!!!):

'2kua': {
  'authors': ['Wim Vranken'],
  'readProject': {
    'keywds': {
      'forceChainMappings': [
        ['A', 'A', 1, -6],
        ['A', 'A', 5, -5],
      ],
    },
  },
},

And got the attached file. The info in there seems fine.

Original comment by wfvran...@gmail.com on 13 Apr 2010 at 10:48

Attachments:

GoogleCodeExporter commented 9 years ago
Thanks, Wim.

That seemed to do the trick, but that little trick is going to be tough to work
into a standard operating procedure. Are there any rules of thumb for when we
should use readProject rather than readCoordinates?

Original comment by schulte....@gmail.com on 13 Apr 2010 at 1:47

GoogleCodeExporter commented 9 years ago
Apologies - I was going to elaborate on that but forgot.

In the new setup (i.e. when reading coords from joinedCoord.str), it should
always be

'readProject'

From looking at the code this afternoon, I don't think 'readCoordinates' in
presetDict.py still works, but I haven't tried it. I thought it wasn't
necessary any more...
I can map it through to readProject if it's there, if that makes things easier
(probably does!).

Original comment by wfvran...@gmail.com on 13 Apr 2010 at 3:58

GoogleCodeExporter commented 9 years ago
When would you use 'readProject' instead of 'linkResonances'?   

Original comment by schulte....@gmail.com on 13 Apr 2010 at 5:43

GoogleCodeExporter commented 9 years ago
When you're customising something to do with reading in the NMR-STAR file
itself - the coordinates are handled on input, the NMR info only later (with
linkResonances).

Original comment by wfvran...@gmail.com on 14 Apr 2010 at 7:32

GoogleCodeExporter commented 9 years ago
Is that somewhat new? So far, this is the only case of 'readProject' in the
presetDict. I've been using linkResonances almost exclusively, but I see Aart
Nederveen uses linkResonances and readCoordinates quite a lot.

There have been other times when the presetDict didn't seem to be enforcing
what it should (using linkResonances). This might have been the reason. In
some, maybe most, cases, it wouldn't matter where you made the mapping, but
sometimes it would matter. Or not.

Original comment by schulte....@gmail.com on 14 Apr 2010 at 1:28

GoogleCodeExporter commented 9 years ago
So readProject is only relevant for the coordinates, not the NMR information
(restraints in this case). Apologies - I only noticed that this was not working
because of the current problem, otherwise I would've let you know earlier.

In any case, shall I adapt the code so that readCoordinates settings are
automatically 'forwarded' to readProject? In that case we probably have to
double-check the entries in presetDict.py that have readCoordinates settings.
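
A minimal sketch of what such forwarding could look like on the preset dictionary itself. `forward_read_coordinates` is a hypothetical function written for illustration, not the actual CCPN change, and it assumes that settings already present under 'readProject' take precedence over forwarded ones.

```python
import copy

def forward_read_coordinates(preset_dict):
    """Return a copy of preset_dict with 'readCoordinates' settings moved to 'readProject'.

    Assumption: keywords already set under 'readProject' win over forwarded ones.
    """
    result = copy.deepcopy(preset_dict)
    for entry_code, settings in result.items():
        old = settings.pop('readCoordinates', None)
        if old is None:
            continue  # nothing to forward for this entry
        keywds = settings.setdefault('readProject', {}).setdefault('keywds', {})
        for key, value in old.get('keywds', {}).items():
            keywds.setdefault(key, value)
    return result

presets = {
    '2kua': {
        'readCoordinates': {
            'keywds': {'forceChainMappings': [['A', 'A', 1, -6], ['A', 'A', 5, -5]]},
        },
    },
}
forwarded = forward_read_coordinates(presets)
print(sorted(forwarded['2kua']))  # ['readProject']
```

Because the function works on a deep copy, the entries that still carry readCoordinates settings stay available for the double-check.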

Original comment by wfvran...@gmail.com on 14 Apr 2010 at 1:32

GoogleCodeExporter commented 9 years ago
I think I see? Since this was an issue between the PDB/restraints mapping and
the mmCIF file, you need a readProject. Where in the ccpn code does this happen?

Original comment by schulte....@gmail.com on 14 Apr 2010 at 1:39

GoogleCodeExporter commented 9 years ago
No, this was an issue with the coordinates, nothing to do with the restraints.
In CCPN, you have:

1. The definition of the molecular system (residues, atoms). This is based on
the sequence in the NMR-STAR file (and happens at the self.readNmrStarFile()
stage in linkNmrStarData.py).

2. The coordinates. These are linked to the molecular system when reading the
NMR-STAR file (this also happens during self.readNmrStarFile()).

3. The restraints (and other NMR info). These are linked to the molecular
system in a special procedure (this happens during the
self.runLinkResonances() stage of linkNmrStarData.py).

Hope that clarifies things...
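
As a rule-of-thumb summary, the three points above can be written down as a small lookup. The stage names come from linkNmrStarData.py as quoted above; the table and the `preset_section_for` helper are illustrative only and are not CCPN code.

```python
# Which preset section customises which stage, as described in this thread.
# Each row: (stage in linkNmrStarData.py, what gets linked there, preset section).
STAGES = [
    ('readNmrStarFile', ['molecular system', 'coordinates'], 'readProject'),
    ('runLinkResonances', ['restraints and other NMR info'], 'linkResonances'),
]

def preset_section_for(data_kind):
    """Return the presetDict section that affects the given kind of data."""
    for stage, kinds, section in STAGES:
        if data_kind in kinds:
            return section
    raise KeyError(data_kind)

print(preset_section_for('coordinates'))  # readProject
```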

Original comment by wfvran...@gmail.com on 14 Apr 2010 at 1:51

GoogleCodeExporter commented 9 years ago
I think 2rqf is another one with a problem. The mapping seems to work, but our
distances are way off. I checked a big violation in PyMOL and the author is
correct, but we are wrong. 2kua has this problem also, which the author
politely pointed out.

The PDB nomenclature starts with -2,-1,1,2,... But like I said, the mapping
seems to be fine, while the calculated distances are bad. Should this fork into
another issue?

I've updated the joinedCoord.str archive on
http://sunfish.bmrb.wisc.edu/nmrRestrGrid/

Original comment by schulte....@gmail.com on 18 May 2010 at 2:33

GoogleCodeExporter commented 9 years ago
I've tried a number of things with 'readProject' but nothing seems to make a
change - not even in a bad way. I'll try again later, when I've had time to
think about it a little more. I'll take over ownership for now.

Original comment by schulte....@gmail.com on 20 May 2010 at 6:05

GoogleCodeExporter commented 9 years ago
OK, this was an issue with the mapping between the atom names in the import
file and the CCPN project. This is dealt with differently at the coordinate and
NMR info levels, and that created problems with some settings for NMR-STAR
files (where things are very complicated because what's being used depends on
the situation).

Anyway, it's fixed. Update:

ccpnmr/format/converters/DataFormat.py
ccpnmr/format/converters/NmrStarFormat.py
ccpnmr/format/general/TopShared.py

and this one should work... does for me!

Original comment by wfvran...@gmail.com on 5 Aug 2010 at 4:52

GoogleCodeExporter commented 9 years ago
Great! This issue came up again today for 2l0s. I did the update and reran both 
2l0s and 2rqf. They are fine now.

I'm setting this to fixed.

Original comment by schulte....@gmail.com on 5 Aug 2010 at 7:32

GoogleCodeExporter commented 9 years ago
2rqf still needs work. I'm able to get quite good conversion, but the 
assignments and filtering don't do so well. 

Original comment by schulte....@gmail.com on 25 Aug 2010 at 5:59

GoogleCodeExporter commented 9 years ago
It looks like we are filtering everything out in doSurplus. The number of NOEs
goes from 11531 to 3055 between 2rqf_assign.str and 2rqf_nonsurplus.str. Does
this look realistic to you, Jurgen, or could something have gone wrong?

Here is the summary:

SUMMARY:
Found number of todo constraints:                       11899
Found number of exceptional constraints:                0
Found number of constraints to be double with others:   5280
Found number of impossible constraints     :            3145
Found number of fixed constraints          :            0
Found number of redundant constraints      :            235
Found number of non-redundant constraints:              3239
Found number of constraints to be surplus (E+C+D+I+F+R):8660
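
A quick consistency check on the numbers in this summary, as a sketch. The dictionary keys are shorthand chosen for this example, not Wattos names. There is no separate line for the C term, and the five listed categories already add up to the surplus total.

```python
# Numbers copied from the summary above; key names are shorthand for this example.
summary = {
    'todo': 11899,
    'exceptional': 0,
    'double': 5280,
    'impossible': 3145,
    'fixed': 0,
    'redundant': 235,
    'non_redundant': 3239,
    'surplus': 8660,
}

listed = sum(summary[k] for k in
             ('exceptional', 'double', 'impossible', 'fixed', 'redundant'))
print(listed == summary['surplus'])                                      # True
print(summary['todo'] - summary['surplus'] == summary['non_redundant'])  # True
```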

Original comment by schulte....@gmail.com on 13 Sep 2010 at 3:54

GoogleCodeExporter commented 9 years ago
The 2rqf restraints aren't parsed correctly by my code in Wattos. It's issue 4.
E.g.
207 ARG  HB2   207 ARG  HE      4.67            #peak     7
207 ARG  HB3   207 ARG  HE      0.00 
are probably parsed separately whereas they should be considered together.
Fixed this issue, right? ;-)
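
A minimal sketch of what "considered together" could mean for the two lines above. It assumes a line whose distance is 0.00 is a continuation of the previous restraint, and that the columns are residue number, residue name and atom name for each side, followed by the distance. `group_restraints` is a hypothetical function, not the Wattos parser.

```python
def group_restraints(lines):
    """Group restraint lines; a line with distance 0.00 joins the previous restraint.

    Assumed columns: resnum, resname, atom, resnum, resname, atom, distance.
    Anything after '#' is treated as a comment.
    """
    groups = []
    for line in lines:
        fields = line.split('#', 1)[0].split()
        if len(fields) < 7:
            continue  # skip blank or malformed lines
        pair = ((int(fields[0]), fields[1], fields[2]),
                (int(fields[3]), fields[4], fields[5]))
        distance = float(fields[6])
        if distance == 0.0 and groups:
            # Assumed continuation: another atom pair for the previous restraint.
            groups[-1]['pairs'].append(pair)
        else:
            groups.append({'distance': distance, 'pairs': [pair]})
    return groups

example = [
    "207 ARG  HB2   207 ARG  HE      4.67            #peak     7",
    "207 ARG  HB3   207 ARG  HE      0.00 ",
]
groups = group_restraints(example)
print(len(groups), len(groups[0]['pairs']))  # 1 2
```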

Original comment by jurge...@gmail.com on 13 Sep 2010 at 6:42

GoogleCodeExporter commented 9 years ago
Yup. PDB still has this set to "To be Published." I would like to know more 
about how they calculated the structure.

Original comment by schulte....@gmail.com on 13 Sep 2010 at 6:54