google-code-export / nmrrestrntsgrid

Automatically exported from code.google.com/p/nmrrestrntsgrid

New PDB tags and PDBx information in coordinate/restraints #236

Closed GoogleCodeExporter closed 9 years ago

GoogleCodeExporter commented 9 years ago
This issue is to check whether the new NMR-STAR files contain all
information necessary for public release as remediated restraint files at
the PDB. Note that the coordinates are still in here; they will be removed.

For testing purposes, two lists are attached: PDB ID codes whose PDB chain
codes differ from the PDBx codes (chainDiff.txt), and PDB ID codes where
insertion codes are used (insertCode.txt). Both were generated by Monica.

I generated NMR-STAR files with PDB_ tags for 2otk (chain codes different)
and 1tut (insertion code present), also both attached (gzipped).

To reproduce the PDBx information then, use the 'Asym_ID' chain identifier
in the '_Entity_assembly' loop in the 'assembly' saveframe. For the residue
number, just use the corresponding 'Label_comp_index_ID' tag in the
coordinate/restraint saveframes.
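
As a rough sketch of that lookup in Python, with the loops reduced to plain dictionaries: the 'ID' key on the '_Entity_assembly' rows, the link through 'Label_entity_assembly_ID', and all sample values are assumptions made for illustration, not the contents of the attached files.

```python
# Hypothetical rows from the '_Entity_assembly' loop in the 'assembly' saveframe.
# Only 'Asym_ID' is named in this issue; the 'ID' key and the values are assumed.
entity_assembly_rows = [
    {"ID": "1", "Asym_ID": "A"},
    {"ID": "2", "Asym_ID": "B"},
]

# Hypothetical row from a coordinate or restraint loop.
atom_row = {"Label_entity_assembly_ID": "2", "Label_comp_index_ID": "17"}


def pdbx_chain_and_residue(row, assembly_rows):
    """Return (PDBx chain code, PDBx residue number) for one atom row."""
    # Assumed link: the row's entity assembly ID points at '_Entity_assembly.ID'.
    asym_by_assembly = {r["ID"]: r["Asym_ID"] for r in assembly_rows}
    chain = asym_by_assembly[row["Label_entity_assembly_ID"]]
    residue_number = int(row["Label_comp_index_ID"])
    return chain, residue_number


print(pdbx_chain_and_residue(atom_row, entity_assembly_rows))
```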

Original issue reported on code.google.com by wfvran...@gmail.com on 16 Nov 2009 at 3:58

GoogleCodeExporter commented 9 years ago
I've got this working on tang. The tags are showing up, but the data in the columns aren't. Is there something else I should be doing?

Original comment by schulte....@gmail.com on 18 Nov 2009 at 3:29

GoogleCodeExporter commented 9 years ago
I had to modify the original NMR-STAR files to avoid having to hack in all kinds of exceptions in my code - have a look at the 1tut and 2otk original files. There are PDB tags in there that can come straight from the PDBx files - this might be an issue for Jurgen? Not sure how you generate the 'joint restraint coordinate' NMR-STAR files.

Original comment by wfvran...@gmail.com on 18 Nov 2009 at 4:29

GoogleCodeExporter commented 9 years ago
OK.

Then we need Jurgen on this list. I'll take a look at the code. 

Original comment by schulte....@gmail.com on 18 Nov 2009 at 4:41

GoogleCodeExporter commented 9 years ago
Ok, just echoing here for the sake of discussion what Chris mailed me.
"""
Basically, Jurgen, the list:

      _Atom_site.Label_asym_ID
      _Atom_site.Model_ID
      _Atom_site.ID
      _Atom_site.Label_entity_assembly_ID
      _Atom_site.Label_entity_ID
      _Atom_site.Label_comp_index_ID
      _Atom_site.Label_comp_ID
      _Atom_site.Label_atom_ID
      _Atom_site.Type_symbol
      _Atom_site.Cartn_x
      _Atom_site.Cartn_y
      _Atom_site.Cartn_z
      _Atom_site.Occupancy
      _Atom_site.Uncertainty
      _Atom_site.PDB_ins_code
      _Atom_site.Auth_asym_ID
      _Atom_site.Auth_seq_ID
      _Atom_site.Auth_comp_ID
      _Atom_site.Auth_atom_ID
      _Atom_site.Entry_ID
      _Atom_site.Conformer_family_coord_set_ID

needs to be expanded to include other things for the pdbx files:

     _Atom_site.Model_ID
     _Atom_site.Model_site_ID
     _Atom_site.ID
     _Atom_site.Assembly_atom_ID
     _Atom_site.Label_entity_assembly_ID
     _Atom_site.Label_entity_ID
     _Atom_site.Label_comp_index_ID
     _Atom_site.Label_comp_ID
     _Atom_site.Label_atom_ID
     _Atom_site.Type_symbol
     _Atom_site.Cartn_x
     _Atom_site.Cartn_y
     _Atom_site.Cartn_z
     _Atom_site.Cartn_x_esd
     _Atom_site.Cartn_y_esd
     _Atom_site.Cartn_z_esd
     _Atom_site.Occupancy
     _Atom_site.Occupancy_esd
     _Atom_site.Uncertainty
     _Atom_site.Ordered_flag
     _Atom_site.Footnote_ID
     _Atom_site.PDBX_label_asym_ID
     _Atom_site.PDBX_label_seq_ID
     _Atom_site.PDBX_label_comp_ID
     _Atom_site.PDBX_label_atom_ID
     _Atom_site.PDBX_formal_charge
     _Atom_site.PDBX_label_entity_ID
     _Atom_site.PDB_record_ID
     _Atom_site.PDB_model_num
     _Atom_site.PDB_strand_id
     _Atom_site.PDB_residue_no
     _Atom_site.PDB_ins_code
     _Atom_site.PDB_residue_name
     _Atom_site.PDB_atom_name
     _Atom_site.Auth_asym_ID
     _Atom_site.Auth_chain_ID
     _Atom_site.Auth_entity_assembly_ID
     _Atom_site.Auth_seq_ID
     _Atom_site.Auth_comp_ID
     _Atom_site.Auth_atom_ID
     _Atom_site.Auth_atom_name
     _Atom_site.Details
     _Atom_site.Entry_ID
     _Atom_site.Conformer_family_coord_set_ID

"""

We remove _Atom_site.Label_asym_ID, and I will need to add the tags below and only those. Please confirm.
Also, please confirm we don't need tag changes in the restraints, as the issue title might suggest.
Sorry, but I need detailed info like this in order to do it in one go.

_Atom_site.Assembly_atom_ID
_Atom_site.Auth_atom_name
_Atom_site.Auth_chain_ID
_Atom_site.Auth_entity_assembly_ID
_Atom_site.Cartn_x_esd
_Atom_site.Cartn_y_esd
_Atom_site.Cartn_z_esd
_Atom_site.Details
_Atom_site.Footnote_ID
_Atom_site.Model_site_ID
_Atom_site.Occupancy_esd
_Atom_site.Ordered_flag
_Atom_site.PDBX_formal_charge
_Atom_site.PDBX_label_asym_ID
_Atom_site.PDBX_label_atom_ID
_Atom_site.PDBX_label_comp_ID
_Atom_site.PDBX_label_entity_ID
_Atom_site.PDBX_label_seq_ID
_Atom_site.PDB_atom_name
_Atom_site.PDB_model_num
_Atom_site.PDB_record_ID
_Atom_site.PDB_residue_name
_Atom_site.PDB_residue_no
_Atom_site.PDB_strand_id
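
The two lists quoted above can be compared mechanically, which is a quick way to check the "remove" and "add" sets. A minimal sketch in Python, with the tag names copied from the lists and the '_Atom_site.' prefix dropped for brevity:

```python
# Current tag list (first list quoted above), without the '_Atom_site.' prefix.
current = """
Label_asym_ID Model_ID ID Label_entity_assembly_ID Label_entity_ID
Label_comp_index_ID Label_comp_ID Label_atom_ID Type_symbol
Cartn_x Cartn_y Cartn_z Occupancy Uncertainty PDB_ins_code
Auth_asym_ID Auth_seq_ID Auth_comp_ID Auth_atom_ID
Entry_ID Conformer_family_coord_set_ID
""".split()

# Expanded tag list (second list quoted above).
expanded = """
Model_ID Model_site_ID ID Assembly_atom_ID Label_entity_assembly_ID
Label_entity_ID Label_comp_index_ID Label_comp_ID Label_atom_ID Type_symbol
Cartn_x Cartn_y Cartn_z Cartn_x_esd Cartn_y_esd Cartn_z_esd
Occupancy Occupancy_esd Uncertainty Ordered_flag Footnote_ID
PDBX_label_asym_ID PDBX_label_seq_ID PDBX_label_comp_ID PDBX_label_atom_ID
PDBX_formal_charge PDBX_label_entity_ID
PDB_record_ID PDB_model_num PDB_strand_id PDB_residue_no PDB_ins_code
PDB_residue_name PDB_atom_name
Auth_asym_ID Auth_chain_ID Auth_entity_assembly_ID Auth_seq_ID Auth_comp_ID
Auth_atom_ID Auth_atom_name Details Entry_ID Conformer_family_coord_set_ID
""".split()

# Tags only in the current list must go; tags only in the expanded list are new.
to_remove = sorted(set(current) - set(expanded))
to_add = sorted(set(expanded) - set(current))

print("remove:", to_remove)
print("add (%d):" % len(to_add))
for tag in to_add:
    print("  _Atom_site." + tag)
```

The comparison is case-sensitive, so 'PDB_strand_id' and 'PDB_strand_ID' would count as different tags.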

Original comment by jurge...@gmail.com on 19 Nov 2009 at 9:40

GoogleCodeExporter commented 9 years ago
Eldon, please confirm so I can push this in.

Original comment by jurge...@gmail.com on 16 Dec 2009 at 11:03

GoogleCodeExporter commented 9 years ago
Jurgen,

I am sorry, I completely missed this request. I started to look at it, but the answer is more complicated than I expected. I will get back to you tonight or over the weekend.

Eldon

Original comment by webmas...@bmrb.wisc.edu on 17 Dec 2009 at 9:11

GoogleCodeExporter commented 9 years ago
Jurgen,

The goal is to extract the information from the 'auth' tags and the other tags, if they are populated in the pdbx files, and in the end associate this information with the 'PDB' tags in NMR-STAR. I believe what is needed in the first step is the mapping of the values from the pdbx tags to the corresponding NMR-STAR tags as shown in Table I below. The final step would be to move the data from the 'Auth' tags in NMR-STAR to the 'PDB' tags in NMR-STAR. Either Wim's software would do this final step when the restraint nomenclature and PDB nomenclature are made consistent, or your software would do it. The final mapping of the information from the pdbx files to the NMR-STAR files is shown in Table II below.

Chris and Wim's comments on this may be needed.

Eldon

Table I.
pdbx                                 NMR-STAR
_atom_site.auth_asym_id              _Atom_site.Auth_asym_ID
_atom_site.auth_atom_id              _Atom_site.Auth_atom_ID
_atom_site.auth_comp_id              _Atom_site.Auth_comp_ID
_atom_site.auth_seq_id               _Atom_site.Auth_seq_ID
_atom_site.pdbx_auth_alt_id          _Atom_site.Auth_alt_ID
_atom_site.pdbx_auth_atom_name       _Atom_site.Auth_atom_name
_atom_site.pdbx_PDB_atom_name        _Atom_site.PDB_atom_name
_atom_site.pdbx_PDB_ins_code         _Atom_site.PDB_ins_code
_atom_site.pdbx_PDB_model_num        _Atom_site.PDB_model_num
_atom_site.pdbx_PDB_residue_name     _Atom_site.PDB_residue_name
_atom_site.pdbx_PDB_residue_no       _Atom_site.PDB_residue_no
_atom_site.pdbx_PDB_strand_id        _Atom_site.PDB_strand_ID

Table II.
pdbx                                 NMR-STAR
_atom_site.auth_asym_id              _Atom_site.PDB_strand_ID
_atom_site.auth_atom_id              _Atom_site.PDB_atom_name
_atom_site.auth_comp_id              _Atom_site.PDB_residue_name
_atom_site.auth_seq_id               _Atom_site.PDB_residue_no
_atom_site.pdbx_auth_alt_id          _Atom_site.Auth_alt_ID
_atom_site.pdbx_auth_atom_name       _Atom_site.Auth_atom_name
_atom_site.pdbx_PDB_ins_code         _Atom_site.PDB_ins_code
_atom_site.pdbx_PDB_model_num        _Atom_site.PDB_model_num
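
Table II can be read as a simple lookup from pdbx item names to NMR-STAR tags. The following Python sketch shows one way to apply it to a single atom_site row; the function name, the row-as-dictionary layout, and the sample values are assumptions for illustration and are not taken from Wattos or the CCPN code.

```python
# Table II as a lookup: pdbx _atom_site item -> NMR-STAR _Atom_site tag.
TABLE_II = {
    "auth_asym_id": "PDB_strand_ID",
    "auth_atom_id": "PDB_atom_name",
    "auth_comp_id": "PDB_residue_name",
    "auth_seq_id": "PDB_residue_no",
    "pdbx_auth_alt_id": "Auth_alt_ID",
    "pdbx_auth_atom_name": "Auth_atom_name",
    "pdbx_PDB_ins_code": "PDB_ins_code",
    "pdbx_PDB_model_num": "PDB_model_num",
}


def to_nmr_star(pdbx_row):
    """Translate one pdbx atom_site row (item -> value) using Table II.

    Items not listed in Table II are skipped. The CIF placeholders '?' and
    '.' are treated as unpopulated and skipped as well.
    """
    star_row = {}
    for item, value in pdbx_row.items():
        tag = TABLE_II.get(item)
        if tag is not None and value not in ("?", "."):
            star_row["_Atom_site." + tag] = value
    return star_row


# Made-up values, for illustration only.
example = {"auth_asym_id": "A", "auth_seq_id": "12", "auth_comp_id": "GLY",
           "auth_atom_id": "CA", "pdbx_PDB_ins_code": "?"}
print(to_nmr_star(example))
```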

Original comment by webmas...@bmrb.wisc.edu on 17 Dec 2009 at 9:52

GoogleCodeExporter commented 9 years ago
For clarification: my code currently does the (final) mapping in Table II - I modified the original files to reflect this in the examples I generated.

I figured it should be easy enough to regenerate the input files (with PDBx coordinates and original restraint info), plus that way I can avoid having to put in hacks to deal with the Table I mappings.

Original comment by wfvran...@gmail.com on 4 Jan 2010 at 10:57

GoogleCodeExporter commented 9 years ago
A further note on comment 8: my code currently handles the NMR-STAR tags from Table II correctly, so these should be in the 'joined' coordinate/restraint file generated by Wattos!

Original comment by wfvran...@gmail.com on 4 Jan 2010 at 12:49

GoogleCodeExporter commented 9 years ago
Cool, I'll get on this.
Happy New Year everyone!

Original comment by jurge...@gmail.com on 4 Jan 2010 at 12:56

GoogleCodeExporter commented 9 years ago
OK, I'm remapping the 5 tag names and adding the 2 new ones according to the table below (tag names abbreviated to the part after the period):

pdbx                    NMR-STAR 3.1                            NMR-STAR 3.x
---------------------------------------------------------------------------------
                        Label_asym_ID
                        Model_ID                                PDB_model_num
                        ID
                        Label_entity_assembly_ID
                        Label_entity_ID
                        Label_comp_index_ID
                        Label_comp_ID
                        Label_atom_ID
                        Type_symbol
                        Cartn_x
                        Cartn_y
                        Cartn_z
                        Occupancy
                        Uncertainty
                        PDB_ins_code
                        Auth_asym_ID                            PDB_strand_ID
                        Auth_seq_ID                             PDB_residue_no
                        Auth_comp_ID                            PDB_residue_name
                        Auth_atom_ID                            PDB_atom_name
                        Entry_ID
                        Conformer_family_coord_set_ID
pdbx_auth_alt_id                                                Auth_alt_ID
pdbx_auth_atom_name                                             Auth_atom_name 
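
In code, the table as posted in this comment amounts to relabelling five columns of the Atom_site loop and adding two. A minimal Python sketch, assuming the loop header is available as a list of abbreviated tag names; the function name and the choice to append the new tags at the end are illustrative assumptions, not the Wattos implementation.

```python
# NMR-STAR 3.1 name -> NMR-STAR 3.x name, as listed in the table above.
RENAMES = {
    "Model_ID": "PDB_model_num",
    "Auth_asym_ID": "PDB_strand_ID",
    "Auth_seq_ID": "PDB_residue_no",
    "Auth_comp_ID": "PDB_residue_name",
    "Auth_atom_ID": "PDB_atom_name",
}

# Tags with no 3.1 counterpart, filled from the pdbx items of the same name.
NEW_TAGS = ["Auth_alt_ID", "Auth_atom_name"]


def remap_columns(columns):
    """Return the column names after renaming, with the new tags appended."""
    renamed = [RENAMES.get(name, name) for name in columns]
    return renamed + [tag for tag in NEW_TAGS if tag not in renamed]


print(remap_columns(["Model_ID", "ID", "Auth_asym_ID", "Entry_ID"]))
```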

However, I'm not finding the new tags (like pdbx_auth_alt_id) in the mmCIF files I get from
rsync.wwpdb.org::ftp/
Are these new tags still to come?
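
One way to check a downloaded file for such a tag is to collect the '_atom_site.' item names that occur in it. A small Python sketch under stated assumptions: the function name and the sample text are made up for illustration, and names are compared in lower case because CIF data names are case-insensitive.

```python
def atom_site_items(lines):
    """Collect the '_atom_site.' item names (lower-cased) found in mmCIF text."""
    items = set()
    for line in lines:
        parts = line.split(None, 1)
        if not parts:
            continue  # blank line
        token = parts[0].lower()
        # The trailing dot keeps out other categories such as _atom_site_anisotrop.
        if token.startswith("_atom_site."):
            items.add(token.split(".", 1)[1])
    return items


# Made-up fragment of an mmCIF atom_site loop header.
sample = """loop_
_atom_site.group_PDB
_atom_site.id
_atom_site.auth_asym_id
_atom_site.pdbx_PDB_model_num
""".splitlines()

found = atom_site_items(sample)
print("pdbx_auth_alt_id" in found)
```

For a real file, an open file handle can be passed in place of the sample lines.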

Original comment by jurge...@gmail.com on 4 Jan 2010 at 3:26

GoogleCodeExporter commented 9 years ago
OK, got this in but the sorting is slightly off in Wattos output as it uses Data/validict.20080404.1.str

Wim, can you check the attached output for 1brv? It doesn't contain the restraints but they are left untouched.

Code checked in under Wattos revision 130. I don't know how to link between these 2 projects though, sorry.

Original comment by jurge...@gmail.com on 4 Jan 2010 at 7:21

GoogleCodeExporter commented 9 years ago
Model_ID is now missing from Conformer_family_coord_set... it should be there as far as I'm aware.

Original comment by wfvran...@gmail.com on 5 Jan 2010 at 11:10

GoogleCodeExporter commented 9 years ago
I remapped it according to the table in my comment 11.
It's in Table II of comment 7.
I don't suppose we need to duplicate it, do we?

Original comment by jurge...@gmail.com on 5 Jan 2010 at 12:37

GoogleCodeExporter commented 9 years ago
From what I can tell it's an obligatory value in the NMR-STAR, so yes. Maybe the
PDB_model_num is not necessary then, not sure.

Original comment by wfvran...@gmail.com on 5 Jan 2010 at 12:46

GoogleCodeExporter commented 9 years ago
Eldon, can you clarify this point please?

Original comment by jurge...@gmail.com on 5 Jan 2010 at 12:48

GoogleCodeExporter commented 9 years ago
The Model_ID value is located in the '_Atom_site' table. '_Conformer_family_coord_set' describes the save frame for the 'family' and the save frame contains the coordinates for all models as listed in the 'Atom_site' table. There has never been a '_Conformer_family_coord_set.Model_ID' tag. If there were such a tag at the save frame level then there would have to be a 'Conformer_family_coord_set' save frame for every reported model.
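
To illustrate the layout described here: one save frame holds the rows for all models, and the per-row Model_ID is what separates them. A minimal Python sketch with made-up rows; the dictionary layout is an assumption for illustration only.

```python
from collections import defaultdict

# Hypothetical '_Atom_site' rows from a single conformer family save frame.
atom_site_rows = [
    {"Model_ID": "1", "Label_atom_ID": "N"},
    {"Model_ID": "1", "Label_atom_ID": "CA"},
    {"Model_ID": "2", "Label_atom_ID": "N"},
    {"Model_ID": "2", "Label_atom_ID": "CA"},
]

# Group the rows by model; no per-model save frame is needed for this.
atoms_by_model = defaultdict(list)
for row in atom_site_rows:
    atoms_by_model[row["Model_ID"]].append(row["Label_atom_ID"])

for model_id in sorted(atoms_by_model, key=int):
    print(model_id, atoms_by_model[model_id])
```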

I have corrected the errors in the dictionary (hopefully) and put the new 
version in svn.

Original comment by Eldon.Ul...@gmail.com on 5 Jan 2010 at 2:43

GoogleCodeExporter commented 9 years ago
>The Model_ID value is located in the '_Atom_site' table.

Are you asking me to keep it in and add another column _Atom_site.PDB_model_num with the same values?
That wouldn't make sense, right? Except for issue 168 they're the same...

Original comment by jurge...@gmail.com on 5 Jan 2010 at 3:08

GoogleCodeExporter commented 9 years ago
Ah, I did mean Atom_site in Comment 13... apologies for any confusion caused! Jurgen's last point still stands...

Original comment by wfvran...@gmail.com on 5 Jan 2010 at 3:31

GoogleCodeExporter commented 9 years ago
Wim, do you need the _Atom_site.PDB_model_num tag to be populated? Do you always use PDB model '1' when the atom nomenclature is made consistent? For completeness, I would go with putting in the _Atom_site.PDB_model_num tag and populating it with the redundant values. Overall, I feel this whole mess is redundant.

Original comment by Eldon.Ul...@gmail.com on 5 Jan 2010 at 4:49

GoogleCodeExporter commented 9 years ago
No I don't need (or particularly want) PDB_model_num, I'm using Model_ID at the
moment. Some model number indication is necessary.

Original comment by wfvran...@gmail.com on 5 Jan 2010 at 4:53

GoogleCodeExporter commented 9 years ago
It is fine with me, then, to leave out the PDB_model_num.

Original comment by Eldon.Ul...@gmail.com on 5 Jan 2010 at 5:33

GoogleCodeExporter commented 9 years ago
Cool, I'll keep the original on that one.

Can somebody point me to an example of an mmCIF with the _atom_site.pdbx_auth_alt_id tag? Is it safe to leave out for this iteration?

Original comment by jurge...@gmail.com on 5 Jan 2010 at 6:46

GoogleCodeExporter commented 9 years ago
I have never seen one. I think you can leave it out.

Original comment by Eldon.Ul...@gmail.com on 5 Jan 2010 at 7:17

GoogleCodeExporter commented 9 years ago
Attached is the new version. Code committed in Wattos revision 131.

Original comment by jurge...@gmail.com on 6 Jan 2010 at 8:46

GoogleCodeExporter commented 9 years ago

Original comment by jurge...@gmail.com on 6 Jan 2010 at 8:46

GoogleCodeExporter commented 9 years ago
Two more issues to resolve:

1. It is now 'PDB_strand_ID', where it used to be 'PDB_strand_id'? The casing makes a difference in my code... also did we want to follow the exact mmCIF naming?

2. The tag '_Atom_site.Label_asym_ID' (in Jurgen's file) does not exist as far as I can tell. Should this be '_Atom_site.PDBX_label_asym_ID' in the NMR-STAR?

Original comment by wfvran...@gmail.com on 6 Jan 2010 at 9:10

GoogleCodeExporter commented 9 years ago
Eldon, please resolve Wim's issues. I'm happy to change them any way you like.
Thanks

Original comment by jurge...@gmail.com on 6 Jan 2010 at 9:21

GoogleCodeExporter commented 9 years ago
Issues raised by Wim:

1. In NMR-STAR, the convention has been to capitalize 'ID' and I would like to stick to this. I do not feel there is a need to follow exact mmCIF naming. If their conventions were followed, some tags would become 'Atom_site.NMR-STAR_PDBX_PDB...'. I think all of this redundant nomenclature should be captured in a separate table and not in the Atom_site table, but that is another discussion.

2. Yes, the '_Atom_site.Label_asym_ID' tag should be changed to
'_Atom_site.PDBX_label_asym_ID'. Sorry, I missed this one in the past.
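
Since the casing matters to code that compares tag names exactly, one defensive option is to resolve incoming tag names against the dictionary spelling without regard to case. A small Python sketch; the three tags listed are examples in the casing discussed in this thread, and the helper name is hypothetical.

```python
# A few tag names in the casing discussed here (capital 'ID').
DICTIONARY_TAGS = [
    "_Atom_site.PDB_strand_ID",
    "_Atom_site.PDB_residue_no",
    "_Atom_site.PDBX_label_asym_ID",
]

# Lower-cased spelling -> dictionary spelling.
CANONICAL = {tag.lower(): tag for tag in DICTIONARY_TAGS}


def canonical_tag(tag):
    """Return the dictionary spelling of a tag, ignoring case; None if unknown."""
    return CANONICAL.get(tag.lower())


print(canonical_tag("_Atom_site.PDB_strand_id"))
print(canonical_tag("_Atom_site.Label_asym_ID"))
```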

Original comment by Eldon.Ul...@gmail.com on 6 Jan 2010 at 4:00

GoogleCodeExporter commented 9 years ago
_Atom_site.PDBX_label_asym_ID or:
_Atom_site.PDB_label_asym_ID without the X? It's the only one with an X now.

Original comment by jurge...@gmail.com on 6 Jan 2010 at 4:11

GoogleCodeExporter commented 9 years ago
I verified against:
http://www.bmrb.wisc.edu/dictionary/3.1html_frame/frame_AtomSite.html#_Atom_site.PDBX_label_asym_ID
and it seems to be with the X.

Committed in Wattos revision 132.

Original comment by jurge...@gmail.com on 6 Jan 2010 at 4:23

GoogleCodeExporter commented 9 years ago
Did I really mess up somewhere? Below are the tags used to record data from pdbx files and from PDB files. The asym_ID is a PDBX construct and does not exist in the PDB set. I think way back when, 'asym_ID' was useful in mapping to 'entity_assembly_ID'. The 'asym_ID' tag also maps to the '_Entity_assembly.Asym_ID' tag and so is probably useful to keep. The other 'PDBX' tags are equivalent to the NMR-STAR tags and I do not feel they need to be included. I could be wrong, but I think John Westbrook's request pertained to the 'PDB' tags, where 'PDB_strand_ID' is usually, but I am not sure always, equivalent to the 'asym_ID' value.

_Atom_site.PDBX_label_asym_ID
_Atom_site.PDBX_label_seq_ID
_Atom_site.PDBX_label_comp_ID
_Atom_site.PDBX_label_atom_ID
_Atom_site.PDBX_formal_charge
_Atom_site.PDBX_label_entity_ID

_Atom_site.PDB_record_ID
_Atom_site.PDB_model_num
_Atom_site.PDB_strand_ID
_Atom_site.PDB_ins_code
_Atom_site.PDB_residue_no
_Atom_site.PDB_residue_name
_Atom_site.PDB_atom_name

Original comment by Eldon.Ul...@gmail.com on 6 Jan 2010 at 4:28

GoogleCodeExporter commented 9 years ago
If all here are happy with the current file I would like to close the issue.

Original comment by jurge...@gmail.com on 6 Jan 2010 at 6:38

GoogleCodeExporter commented 9 years ago
Is it ready to be tested?

Original comment by schulte....@gmail.com on 6 Jan 2010 at 7:07

GoogleCodeExporter commented 9 years ago
As far as Wattos is concerned I believe so. The code is in the last revision.

Original comment by jurge...@gmail.com on 7 Jan 2010 at 8:41

GoogleCodeExporter commented 9 years ago
And Wattos revision 133 now makes use of the latest dictionary for sorting as well.
Wim, what do we need to update in FC to test?

Original comment by jurge...@gmail.com on 7 Jan 2010 at 8:49

GoogleCodeExporter commented 9 years ago
There are likely to be some problems - I was using the auth_ tags for mapping, and since they're now PDB_ I have to make sure my code still works correctly.

One question to Eldon: are the auth_ tags still necessary? If not (and they can go), then it's easy enough to update my stuff, otherwise it'll take a while.

Anyway, the following steps are necessary (1-2 can happen regardless of what I still have to do):

1. Remake all joined coordinate/restraint NMR-STAR files
2. Let me know where I can find them when done
3. I'll run some tests and update my code to make sure it's all working
4. I'll give you the update procedure when it's ready

Original comment by wfvran...@gmail.com on 7 Jan 2010 at 10:50

GoogleCodeExporter commented 9 years ago
Since Chris has the up-to-date versions, it might be best to do 1 and 2 in Madison. Chris, can you 'volunteer'?

Original comment by jurge...@gmail.com on 7 Jan 2010 at 10:53

GoogleCodeExporter commented 9 years ago
The 'Auth' tags would not be populated now in '_Atom_site'. Just to be clear, they are still needed and would be populated in the restraints tables, as I think they actually represent the original author nomenclature as extracted from the restraints files.

Original comment by Eldon.Ul...@gmail.com on 7 Jan 2010 at 2:19

GoogleCodeExporter commented 9 years ago
Yes. I'll start testing on tang. Is Wattos up to date there, or do I need to do an svn update?

Original comment by schulte....@gmail.com on 7 Jan 2010 at 2:21

GoogleCodeExporter commented 9 years ago
OK Eldon, in that case I'll wait for the new files and make sure everything still works.

Original comment by wfvran...@gmail.com on 7 Jan 2010 at 2:29

GoogleCodeExporter commented 9 years ago
Let me know when to update and get going.

Original comment by schulte....@gmail.com on 7 Jan 2010 at 2:47

GoogleCodeExporter commented 9 years ago
Chris can you 'volunteer'?
Everything seems to need updating.

Original comment by jurge...@gmail.com on 21 Jan 2010 at 2:40

GoogleCodeExporter commented 9 years ago
ccpn and recoord should be recent, but I can do another update if need be.

I will update wattos and the NRG on tang today and test it all.

Original comment by schulte....@gmail.com on 21 Jan 2010 at 3:37

GoogleCodeExporter commented 9 years ago
I've updated everything on tang and these are the latest versions of the test files we've been using.

Note: I had to comment out all chainMapping directives for 1brv in the presetDict in order to get it to convert.

Please confirm that these are what we want. I'm going to test some more on tang.

Original comment by schulte....@gmail.com on 21 Jan 2010 at 7:42

GoogleCodeExporter commented 9 years ago
I am not able to upload 3 files at once, so here they are, separately.

Original comment by schulte....@gmail.com on 21 Jan 2010 at 7:44

GoogleCodeExporter commented 9 years ago
Last one for now. This was too big to upload uncompressed. 

Original comment by schulte....@gmail.com on 21 Jan 2010 at 7:48

GoogleCodeExporter commented 9 years ago
These are the wrong files - I need the ones prior to the linking process (so what comes out of Wattos, see comment 37).

Original comment by wfvran...@gmail.com on 26 Jan 2010 at 10:58

GoogleCodeExporter commented 9 years ago
Sorry about that. All joined coordinates/restraints files for the new update are located in
/big/docr/ccpn_tmp/data/archives/bmrb/nmrRestrGrid
The number of files I was able to produce was limited by the fact that /big has run out of memory, but anything after Jan 21 was done with the update.

You will have to ssh onto lionfish@bmrb.wisc.edu, and then to tang.

If you need other files now, I can send them. Let me know if we need to get Dimitiri to fix anything to let you log on.

Original comment by schulte....@gmail.com on 26 Jan 2010 at 2:40

GoogleCodeExporter commented 9 years ago
OK, I have now tested and checked in the code; it is giving me the same results as before.

Update:

CCPN sourceForge repository, python/ directory.

Note that I have included a new bit of format/CCPN sequence alignment code in case the original code cannot find a mapping - this should only affect new entries.

There might be issues with format chain codes that are labelled '_' for mapping purposes - this was a mistake that crept in a while ago. I don't think it was a problem for you, but just so you know.

Original comment by wfvran...@gmail.com on 28 Jan 2010 at 11:22