VuisterLab / cing

Automated Validation of NMR Structures
http://nmr.le.ac.uk

Check CS count for Ala #308

Closed: GoogleCodeExporter closed this issue 9 years ago

GoogleCodeExporter commented 9 years ago
The CS count is a bit higher than expected for a few Ala residues in NRG-CING.
Could this be caused by multiple BMRB lists?

Which residues, Wim?

Original issue reported on code.google.com by jurge...@gmail.com on 19 Sep 2011 at 1:12

GoogleCodeExporter commented 9 years ago
Here's a list of tuples (residue_id, cs_count, noe_compl4); cs_count is where
my number comes from:

[(177719, 13, 35.7), (291022, 14, 64.3), (291024, 14, 31.3), (291070, 14, 50.0),
 (291073, 14, 47.1), (291074, 14, 26.7), (291076, 14, 53.8), (291077, 14, 42.9),
 (653506, 9, 43.8), (653512, 10, 45.0), (653515, 11, 36.8), (117899, 10, 12.5),
 (117926, 10, 16.7), (118014, 10, 18.2), (118039, 10, 18.8), (171705, 11, 41.7),
 (175471, 11, 41.2), (175473, 11, 33.3), (175505, 11, 36.4), (175507, 11, 28.6),
 (177023, 13, 44.4), (177322, 13, 27.8), (177387, 11, 35.0), (177388, 11, 38.1),
 (177402, 11, 52.9), (177403, 11, 28.1), (177485, 13, 53.8), (177542, 13, 25.0),
 (196061, 10, 55.6), (196029, 10, 21.1), (196034, 10, 36.1), (196039, 10, 41.2),
 (196074, 10, 60.0), (196075, 10, 44.0), (196161, 10, 53.8), (201936, 14, 47.1),
 (210214, 10, 18.2), (210218, 10, 37.0), (210223, 10, 41.7), (210253, 10, 30.8),
 (210273, 10, 50.0), (210276, 10, 44.4), (210430, 10, 47.1), (290989, 11, 75.0),
 (290992, 14, 47.1), (290997, 14, 23.5), (291007, 12, 25.0), (291010, 13, 41.2),
 (291015, 14, 42.1), (291016, 11, 75.0), (291021, 14, 58.3), (291026, 14, 27.8),
 (291038, 14, 50.0), (291041, 14, 43.8), (291046, 14, 38.9), (291047, 14, 0.0),
 (291049, 14, 33.3), (291028, 14, 50.0), (291030, 14, 38.5), (291032, 14, 35.7),
 (291040, 12, 33.3), (291044, 13, 60.0), (291048, 14, 56.3), (291053, 14, 50.0),
 (291056, 14, 37.5), (291059, 14, 50.0), (291068, 14, 53.8), (309594, 10, 22.7),
 (309597, 10, 35.9), (309600, 10, 37.5), (309618, 10, 52.6), (309626, 10, 60.0),
 (309627, 10, 34.5), (309682, 10, 43.8), (449716, 14, 39.4), (449727, 14, 47.1),
 (449758, 14, 22.2), (449947, 14, 16.0), (461593, 11, 62.5), (461722, 12, 61.3),
 (461723, 12, 58.3), (461759, 12, 43.3), (461815, 12, 65.0), (554151, 10, 0.0),
 (554192, 9, 0.0), (555779, 13, 55.3), (653132, 9, 28.6), (653229, 11, 61.5),
 (653239, 11, 57.1), (653252, 11, 50.0), (653274, 11, 64.7), (653323, 9, 40.7),
 (653337, 10, 25.0), (653351, 11, 52.4), (653372, 11, 50.0), (653499, 10, 62.5),
 (695238, 13, 81.8), (695272, 13, 57.9), (869207, 27, 41.7), (869208, 27, 50.0),
 (869213, 27, 33.3), (871700, 10, 47.1), (871701, 10, 50.0), (871709, 10, 28.6),
 (871729, 10, 36.4), (871732, 10, 54.5)]

Original comment by wfvran...@gmail.com on 20 Sep 2011 at 1:18

GoogleCodeExporter commented 9 years ago
Got them with:
SELECT e.name, r.cs_count,
       r.name AS r_name, r.number AS r_number, c.name AS c_name
FROM nrgcing.cingentry e,
     nrgcing.cingchain c,
     nrgcing.cingresidue r
WHERE r.entry_id = e.entry_id
  AND c.entry_id = e.entry_id
  AND r.name = 'ALA'
  AND r.cs_count > 13
ORDER BY r.cs_count DESC
LIMIT 100;

giving (columns: name, cs_count, r_name, r_number, c_name):

 1d0r       27 ALA          19 A
 1d0r       27 ALA          18 A
 1d0r       27 ALA           2 A
 1d0r       27 ALA          24 A
 1n5p       24 ALA          41 A
 1n5p       24 ALA          52 A
 1n5p       24 ALA          31 A
 1n5h       24 ALA          41 A
 1n5h       24 ALA          52 A
 1n5p       24 ALA          37 A
 1n5h       24 ALA          31 A
 1n5h       24 ALA          37 A
 1b4m       22 ALA          29 A
 1n5p       20 ALA          66 A
 1n5h       20 ALA          66 A
 1t8c       17 ALA          32 A
 1t8d       17 ALA          32 A
 1t8c       17 ALA         131 A
 1t8d       17 ALA          35 A
 1t8c       17 ALA          35 A
 1t8d       17 ALA         116 A
 1t8c       17 ALA         124 A
 1t8c       17 ALA         116 A
 1t8d       17 ALA         131 A
 1t8d       17 ALA         124 A
 1t8d       16 ALA          59 A
 1t8c       16 ALA          59 A
 1t8c       16 ALA          91 A
 1t8d       16 ALA          91 A
 1y03       14 ALA          26 A
 1y03       14 ALA          28 A
 1y03       14 ALA          29 A
 1y03       14 ALA          30 A
 1y03       14 ALA          31 A
 1y03       14 ALA          32 A
 1y03       14 ALA          34 A
 1y04       14 ALA           9 A
 1y04       14 ALA          10 A
 1y04       14 ALA          11 A
 1y04       14 ALA          17 A
 1y04       14 ALA          19 A
 1y04       14 ALA          20 A
 1y04       14 ALA          21 A
 1y04       14 ALA          25 A
 2gi4       14 ALA          18 A
 2gi4       14 ALA          29 A
 2gi4       14 ALA          40 A
 2gi4       14 ALA          60 A
 2gi4       14 ALA         141 A
 2k79       14 ALA         178 B
 2k79       14 ALA         178 A
 2k79       14 ALA         191 B
 2k79       14 ALA         191 A
 2rno       14 ALA          -1 A
 2rno       14 ALA           2 A
 2rno       14 ALA          13 A
 2rno       14 ALA          67 A
 2rno       14 ALA          69 A
 2rno       14 ALA          84 A
 2rno       14 ALA          88 A
 1b4c       14 ALA           6 B
 1b4c       14 ALA           6 A
 2rno       14 ALA          44 A
 1y03       14 ALA          19 A
 1y03       14 ALA          20 A
 1y04       14 ALA          26 A
 1y04       14 ALA          28 A
 1y04       14 ALA          29 A
 1y04       14 ALA          30 A
 1y04       14 ALA          31 A
 1y04       14 ALA          32 A
 1y04       14 ALA          34 A
 2rno       14 ALA         105 A
 1kkd       14 ALA          64 A
 1kkd       14 ALA           8 A
 1kkd       14 ALA          29 A
 1kkd       14 ALA          30 A
 1kkd       14 ALA          52 A
 1kkd       14 ALA          82 A
 1kkd       14 ALA          89 A
 1ssf       14 ALA         130 A
 1sym       14 ALA           6 B
 1sym       14 ALA           6 A
 1y03       14 ALA           7 A
 1y03       14 ALA           9 A
 1y03       14 ALA          10 A
 1y03       14 ALA          11 A
 1y03       14 ALA          17 A
 1y04       14 ALA           7 A
 1y03       14 ALA          21 A
 1y03       14 ALA          25 A

Now I'll look.

Original comment by jurge...@gmail.com on 20 Sep 2011 at 1:29

GoogleCodeExporter commented 9 years ago
In fact, that first residue only has three shifts in the RDB:

 1d0r   A      ALA          19 O          NULL
 1d0r   A      ALA          19 C          NULL
 1d0r   A      ALA          19 HB3        NULL
 1d0r   A      ALA          19 HB2        NULL
 1d0r   A      ALA          19 HB1        NULL
 1d0r   A      ALA          19 CB         NULL
 1d0r   A      ALA          19 MB     1.286496
 1d0r   A      ALA          19 HA     4.076496
 1d0r   A      ALA          19 CA         NULL
 1d0r   A      ALA          19 H      7.876496
 1d0r   A      ALA          19 N          NULL

The same three shifts also show up at:
http://nmr.cmbi.ru.nl/NRG-CING/data/d0/1d0r/1d0r.cing/1d0r/HTML/Molecule/atoms.html#_top

However, the entry has 9 CS lists, and at the residue level
(nrgcing.cingresidue#cs_count) they were all added together: 9 lists x 3 shifts
gives the count of 27 seen above. The lists form a series with varying
percentages of 2,2,2-trifluoroethanol-d3.
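A minimal Python sketch of that arithmetic, using the three non-NULL shifts from the RDB listing above. It assumes, for illustration only, that each of the 9 lists assigns the same three atoms of this residue; only the totals (9 lists, cs_count 27) come from this thread.

```python
# Non-NULL shifts for 1d0r, chain A, ALA 19, as shown in the RDB listing.
shifts = {"MB": 1.286496, "HA": 4.076496, "H": 7.876496}

# Illustrative assumption: each of the 9 CS lists (the TFE series)
# assigns the same three atoms of this residue.
n_lists = 9
cs_lists = [dict(shifts) for _ in range(n_lists)]

# Residue-level count as stored in nrgcing.cingresidue#cs_count:
# the per-list counts are added together.
summed_count = sum(len(cs_list) for cs_list in cs_lists)

# Atom-level count: every atom with at least one shift counts once.
distinct_count = len(set().union(*(cs_list.keys() for cs_list in cs_lists)))

print(summed_count, distinct_count)  # 27 3
```

Summing per-list counts makes cs_count scale with the number of lists in the entry, whereas counting distinct atoms with a shift does not; the latter is what the atom-level workaround measures.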

The same double counting also occurs at the project level, and personally I
would like to preserve this feature.

Workaround: use a slower GROUP BY SQL query that counts from the atom level:
SELECT r.residue_id, count(*) AS r_cs_count
FROM nrgcing.cingresidue r,
     nrgcing.cingatom a
WHERE a.residue_id = r.residue_id
  AND a.cs IS NOT NULL
GROUP BY r.residue_id
ORDER BY r_cs_count DESC
LIMIT 100;

With this query, the highest count belongs to a nucleic acid residue and is genuine:
 2koc A             4 A            27
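To try the atom-level workaround without access to the NRG-CING database, it can be run against a toy in-memory SQLite database. This is a sketch under stated assumptions: the real database engine is not SQLite, the table layouts below are hypothetical and hold only the columns the query touches, and residue_id 1 is made up. The atom rows are those of 1d0r A ALA 19 from the RDB listing above. Attaching a database under the name nrgcing lets the schema-qualified table names stay as they are.

```python
import sqlite3

# Toy stand-in for NRG-CING; hypothetical minimal tables, not the real schema.
conn = sqlite3.connect(":memory:")
conn.execute("ATTACH DATABASE ':memory:' AS nrgcing")
conn.execute("CREATE TABLE nrgcing.cingresidue "
             "(residue_id INTEGER PRIMARY KEY, name TEXT, number INTEGER, cs_count INTEGER)")
conn.execute("CREATE TABLE nrgcing.cingatom "
             "(atom_id INTEGER PRIMARY KEY, residue_id INTEGER, name TEXT, cs REAL)")

# 1d0r A ALA 19 (hypothetical residue_id 1): residue-level cs_count is 27
# because 9 lists were added together, but only 3 atoms carry a shift.
conn.execute("INSERT INTO nrgcing.cingresidue VALUES (1, 'ALA', 19, 27)")
atoms = [("O", None), ("C", None), ("HB3", None), ("HB2", None), ("HB1", None),
         ("CB", None), ("MB", 1.286496), ("HA", 4.076496), ("CA", None),
         ("H", 7.876496), ("N", None)]
conn.executemany("INSERT INTO nrgcing.cingatom (residue_id, name, cs) VALUES (1, ?, ?)",
                 atoms)

# Stored residue-level count (inflated by the multiple lists).
stored = conn.execute(
    "SELECT cs_count FROM nrgcing.cingresidue WHERE residue_id = 1").fetchone()[0]

# The atom-level workaround query from this comment.
rows = conn.execute("""
    SELECT r.residue_id, count(*) AS r_cs_count
    FROM nrgcing.cingresidue r, nrgcing.cingatom a
    WHERE a.residue_id = r.residue_id AND a.cs IS NOT NULL
    GROUP BY r.residue_id
    ORDER BY r_cs_count DESC
    LIMIT 100
""").fetchall()

print(stored, rows)  # 27 [(1, 3)]
```

The GROUP BY over the atom table has to touch every atom row, which is why it is slower than reading cs_count directly from cingresidue.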

Original comment by jurge...@gmail.com on 20 Sep 2011 at 1:50