AlphaBetaTest / cing

Automatically exported from code.google.com/p/cing

Chain identifiers are rather limited. #130

Closed GoogleCodeExporter closed 9 years ago

GoogleCodeExporter commented 9 years ago
What steps will reproduce the problem?
1. Read the PDB entry which has the most chains. More than CING can accommodate.

Currently PDB holds 3 entries with more than 49 chains.
E.g. 1otz with:
0,1,2,3,4,5,6,7,8,9,A,B,C,D,E,F,G,H,I,J,K,L,M,N,O,P,Q,R,S,T,U,V,W,X,Y,Z,a,b,c,d,e,f,g,h,i,j,k,l,m,n,o,p,q,r,s,t,u,v,w,x

So we do need to extend the list with numerals and lower case chain ids.

Will be fixed in next commit.
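
A minimal sketch of what such an extended identifier list could look like, assuming a pool of uppercase letters, then lowercase letters, then numerals. The names `CHAIN_ID_POOL`, `next_available_chain_id` and `assign_chain_ids` are hypothetical and the pool ordering is an assumption; this is not CING's actual code.

```python
import string

# Hypothetical pool: uppercase, then lowercase, then numerals (62 ids).
# The ordering is an assumption made for this sketch.
CHAIN_ID_POOL = string.ascii_uppercase + string.ascii_lowercase + string.digits


def _is_valid(chain_id, pool):
    """True for a single character that is part of the pool."""
    return len(chain_id) == 1 and chain_id in pool


def next_available_chain_id(used_ids, pool=CHAIN_ID_POOL):
    """Return the first id in pool that is not in used_ids, or None if exhausted."""
    for chain_id in pool:
        if chain_id not in used_ids:
            return chain_id
    return None


def assign_chain_ids(pdb_chain_ids, pool=CHAIN_ID_POOL):
    """Keep valid, unseen ids; give blank or duplicate ids the next free one."""
    # Ids that appear in the input are reserved so a blank never steals them.
    reserved = {c for c in pdb_chain_ids if _is_valid(c, pool)}
    used = set()
    result = []
    for c in pdb_chain_ids:
        if _is_valid(c, pool) and c not in used:
            new_id = c
        else:
            new_id = next_available_chain_id(used | reserved, pool)
            if new_id is None:
                raise ValueError("exhausted the available chain identifiers")
        used.add(new_id)
        result.append(new_id)
    return result
```

With the 60 identifiers listed above plus one blank chain id, this sketch hands the blank chain the first unused letter of the pool.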

Original issue reported on code.google.com by jurge...@gmail.com on 3 Feb 2009 at 9:28

GoogleCodeExporter commented 9 years ago
Note that there are 215 entries with more than 26 chains. That makes this a 
prominent issue, at least for the X-ray structures we also want to be able to read.

Original comment by jurge...@gmail.com on 3 Feb 2009 at 10:10

GoogleCodeExporter commented 9 years ago
Note that 1otz really does have numbers and lowercase chain ids.

Original comment by jurge...@gmail.com on 5 Feb 2009 at 8:03

GoogleCodeExporter commented 9 years ago
1otz has 61 chains, one of which is ' '; it will be mapped to 'y' in CING.

DEBUG: pdbParser.importCoordinates: <Molecule "pdb1otz" (C:61,R:8680,A:161320,M:1)>
.
----------------------------------------------------------------------
Ran 1 test in 584.430s

Original comment by jurge...@gmail.com on 5 Feb 2009 at 8:33

GoogleCodeExporter commented 9 years ago
Coded fine as shown above, but it still needs testing, which can only easily be 
done once restoring a project is possible, so I'll wait for issue 128 to be 
resolved first. Assuming this is fixed.

Original comment by jurge...@gmail.com on 5 Feb 2009 at 8:56

GoogleCodeExporter commented 9 years ago
The following stack is another problem that arises with a huge molecular system 
like this entry.

*** set a breakpoint in malloc_error_break to debug
sh: line 1: 31248 Bus error               /Users/jd/workspace34/cing/bin/shiftx 1Z model_000.pdb model_000_Z.out
This file contains more than 20 chains.  Please edit it to eliminate
some of the chains and try again.
CING started at : Mon Mar 30 22:50:14 2009
CING stopped at : Mon Mar 30 23:00:18 2009
CING took       : 603.961 s

Traceback (most recent call last):
  File "/Users/jd/workspace34/cing/python/cing/main.py", line 730, in <module>
    main()
  File "/Users/jd/workspace34/cing/python/cing/main.py", line 701, in main
    execfile(scriptFile, globals() )
  File "/Users/jd/workspace34/cing/python/cing/Scripts/doValidateiCing.py", line 40, in <module>
    project.runShiftx()
  File "/Users/jd/workspace34/cing/python/cing/PluginCode/shiftx.py", line 208, in runShiftx
    parseShiftxOutput( outputFile, project.molecule, chain.name )
  File "/Users/jd/workspace34/cing/python/cing/PluginCode/shiftx.py", line 74, in parseShiftxOutput
    for line in AwkLike( fileName, commentString = '#', minNF = 4 ):
  File "/Users/jd/workspace34/cing/python/cing/Libs/AwkLike.py", line 26, in __init__
    self.f = open(filename,'r')
IOError: [Errno 2] No such file or directory: 'pdb1otz.cing/pdb1otz/Shiftx/model_000_0.out'

Will not fix this until somebody else complains about it.
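
For the record, the traceback boils down to parsing an output file that shiftx never wrote. A minimal sketch of a guard, assuming a hypothetical helper `read_shiftx_rows` that skips comment lines and short lines in the spirit of the `commentString = '#', minNF = 4` arguments in the traceback; it is not the actual `AwkLike` or `parseShiftxOutput` code.

```python
import os


def read_shiftx_rows(file_name, comment_string="#", min_fields=4):
    """Return whitespace-split data rows, or None if the file does not exist.

    Hypothetical helper: returning None lets the caller skip a chain whose
    shiftx run failed instead of crashing with an IOError.
    """
    if not os.path.isfile(file_name):
        return None
    rows = []
    with open(file_name, "r") as handle:
        for line in handle:
            if line.startswith(comment_string):
                continue  # comment line
            fields = line.split()
            if len(fields) >= min_fields:
                rows.append(fields)
    return rows
```

A caller could then log a warning and continue with the next chain whenever the helper returns None.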

Original comment by jurge...@gmail.com on 30 Mar 2009 at 9:10

GoogleCodeExporter commented 9 years ago
For entry 2i7z I get a similar problem:

DEBUG: Molecule.addChain: got next available one: z
ERROR IN CODE: CING exhausted the available 64 chain identifiers; see issue 130 here:
http://code.google.com/p/cing/issues/detail?id=130
ERROR: Molecule.addChain: failed getNextAvailableChainId; skipping add.
ERROR: Failed to molecule.addChain(pdbOneLetterCode) for pdbOneLetterCode [ ]
WARNING: See also http://code.google.com/p/cing/issues/detail?id=244
 or http://code.google.com/p/cing/issues/detail?id=223

This probably has to do with the 120 water molecules in there being expanded by 
the Ccpn.py code.

        1  water             1 $water              0 . no . . . . . . rr_2i7z 1 
        2 "43 MER"           3 $43_MER             A . no . . . . . . rr_2i7z 1 
        3 "43 MER"           3 $43_MER             B . no . . . . . . rr_2i7z 1 
        4 "MANGANESE II ION" 2 $MANGANESE__II__ION C . no . . . . . . rr_2i7z 1 

etc.

Original comment by jurge...@gmail.com on 20 Sep 2010 at 8:49

GoogleCodeExporter commented 9 years ago
This is a CCPN issue. In the NMR-STAR input the water molecules are nicely 
combined. In Analysis the water molecules are split, each becoming its own mol 
system.

Original comment by jurge...@gmail.com on 20 Sep 2010 at 8:55

GoogleCodeExporter commented 9 years ago
This was reported to Wim at issue 199 and then moved to NRG issue 242.

Original comment by jurge...@gmail.com on 15 Nov 2010 at 10:25

GoogleCodeExporter commented 9 years ago
Just to have a record within CING I'm changing the status here to open for now.
We're waiting for sf.net to come back up after their crash. Wim already coded 
the fix.

Original comment by jurge...@gmail.com on 7 Feb 2011 at 10:00

GoogleCodeExporter commented 9 years ago
The chain problem is also present in entries:

2i7z
1l0r
1lcc
1lcd
1qch

Original comment by jurge...@gmail.com on 7 Feb 2011 at 10:01

GoogleCodeExporter commented 9 years ago
Chris, these 5 entries should also be reprocessed once you have had time to 
update from the CCPN CVS archive. You will not see any changes on your side. 
Can you take a look?

I checked the fix for 2i7z, and CCPN now indeed maintains only a few chains:

Created molecule 43_MER (molType RNA, 43 chemComps)
Created molecule MANGANESE__II__ION (molType other, 1 chemComps)
Created molecule water_M (molType other, 60 chemComps)
Created molecule water_N (molType other, 60 chemComps)
Created chain 'A', start seqCode 1, end seqCode 43, molecule '43_MER'...
Created chain 'B', start seqCode 44, end seqCode 86, molecule '43_MER'...
Created chain 'C', start seqCode 101, end seqCode 101, molecule 'MANGANESE__II__ION'...
Created chain 'D', start seqCode 102, end seqCode 102, molecule 'MANGANESE__II__ION'...
Created chain 'E', start seqCode 103, end seqCode 103, molecule 'MANGANESE__II__ION'...
Created chain 'F', start seqCode 104, end seqCode 104, molecule 'MANGANESE__II__ION'...
Created chain 'G', start seqCode 105, end seqCode 105, molecule 'MANGANESE__II__ION'...
Created chain 'H', start seqCode 106, end seqCode 106, molecule 'MANGANESE__II__ION'...
Created chain 'I', start seqCode 107, end seqCode 107, molecule 'MANGANESE__II__ION'...
Created chain 'J', start seqCode 108, end seqCode 108, molecule 'MANGANESE__II__ION'...
Created chain 'K', start seqCode 109, end seqCode 109, molecule 'MANGANESE__II__ION'...
Created chain 'L', start seqCode 110, end seqCode 110, molecule 'MANGANESE__II__ION'...
Created chain 'M', start seqCode 106, end seqCode 60, molecule 'water_M'...
Created chain 'N', start seqCode 1, end seqCode 140, molecule 'water_N'...

Original comment by jurge...@gmail.com on 28 Mar 2011 at 9:21

GoogleCodeExporter commented 9 years ago
I reprocessed 2i7z 1l0r 1lcc 1lcd 1qch with these results:

2i7z - no difference - wattos problem? most of these data are not being parsed.
1l0r - no longer has restraints (nothing in dumpzone)
1lcc - no difference 
1lcd - no difference
1qch - no longer has restraints

Original comment by schulte....@gmail.com on 31 Mar 2011 at 3:09

GoogleCodeExporter commented 9 years ago
You might not see the differences, but they are important to CING; e.g. 2i7z has 
nice waters now:

        1 "43 MER"           1 $43_MER             A . no . . . . . . rr_2i7z 1 
        2 "43 MER"           1 $43_MER             B . no . . . . . . rr_2i7z 1 
        3 "MANGANESE II ION" 2 $MANGANESE__II__ION C . no . . . . . . rr_2i7z 1 
        4 "MANGANESE II ION" 2 $MANGANESE__II__ION D . no . . . . . . rr_2i7z 1 
        5 "MANGANESE II ION" 2 $MANGANESE__II__ION E . no . . . . . . rr_2i7z 1 
        6 "MANGANESE II ION" 2 $MANGANESE__II__ION F . no . . . . . . rr_2i7z 1 
        7 "MANGANESE II ION" 2 $MANGANESE__II__ION G . no . . . . . . rr_2i7z 1 
        8 "MANGANESE II ION" 2 $MANGANESE__II__ION H . no . . . . . . rr_2i7z 1 
        9 "MANGANESE II ION" 2 $MANGANESE__II__ION I . no . . . . . . rr_2i7z 1 
       10 "MANGANESE II ION" 2 $MANGANESE__II__ION J . no . . . . . . rr_2i7z 1 
       11 "MANGANESE II ION" 2 $MANGANESE__II__ION K . no . . . . . . rr_2i7z 1 
       12 "MANGANESE II ION" 2 $MANGANESE__II__ION L . no . . . . . . rr_2i7z 1 
       13 "water M"          3 $water_M            M . no . . . . . . rr_2i7z 1 
       14 "water N"          4 $water_N            N . no . . . . . . rr_2i7z 1 

I'll close this issue when I'm done processing them. 

Thanks Chris!

Original comment by jurge...@gmail.com on 1 Apr 2011 at 8:21

GoogleCodeExporter commented 9 years ago
Checked ok.

Original comment by jurge...@gmail.com on 8 Apr 2011 at 2:11