google-code-export / nmrrestrntsgrid

Automatically exported from code.google.com/p/nmrrestrntsgrid

STAR formatting issue #254

Closed GoogleCodeExporter closed 9 years ago

GoogleCodeExporter commented 9 years ago
I ran 1ba5 through s2nmr and it didn't like it. It said there were parse errors. 
The file looks ok at NRG, but not on wwPDB_divided. I am going to remove it from 
there. 

[||] save_remove_some_restraint_saveframes
 Done
WARN: at '41': line 18591, character 18: There is a mismatch between the number 
of column names in the loop header versus the number of values in a row of 
text.  By counting terms, the value (41) should be the value for 
(_Gen_dist_constraint_comment_org.ID), starting a new row.  But it is not the 
first term on a line.  Check the preceeding values to see if they match the 
number of values in the loop header.  (Only the first such error is reported 
for a table, to avoid a flood of messages.)
 (star.y:807)
WARN: at '[not printed]': line 19783, character 17337: Possible runaway value 
from line 19534:
    (This suspicion is due to the 'stop_' on line 19543.) (star.y:501)
syntax error (line 19786, col 26), reading '')
converting 1ba5 for wwPDB ftp
removing existing ./wwPDB_divided/ba/1ba5
[||] save_remove_some_restraint_saveframes
 Done
WARN: at '41': line 1317, character 18: There is a mismatch between the number 
of column names in the loop header versus the number of values in a row of 
text.  By counting terms, the value (41) should be the value for 
(_Gen_dist_constraint_comment_org.ID), starting a new row.  But it is not the 
first term on a line.  Check the preceeding values to see if they match the 
number of values in the loop header.  (Only the first such error is reported 
for a table, to avoid a flood of messages.)
 (star.y:807)
WARN: at '[not printed]': line 2509, character 17337: Possible runaway value 
from line 2260:
    (This suspicion is due to the 'stop_' on line 2269.) (star.y:501)
syntax error (line 2512, col 26), reading '')

Original issue reported on code.google.com by schulte....@gmail.com on 28 Jun 2010 at 3:21

Attachments:

GoogleCodeExporter commented 9 years ago
Erm not sure what to do with this - can you highlight where the problems are in 
the file? The messages are not that understandable to me.

Original comment by wfvran...@gmail.com on 1 Jul 2010 at 1:38

GoogleCodeExporter commented 9 years ago
Sorry for the late response. I've been out of the country.

I'm guessing the problem is at the '78' (line 18591) - note: you need to scroll 
way to the right in the original file to get to some of the values. This is for 
the loop:

   loop_
      _Gen_dist_constraint_comment_org.ID
      _Gen_dist_constraint_comment_org.Comment_text
      _Gen_dist_constraint_comment_org.Comment_begin_line
      _Gen_dist_constraint_comment_org.Comment_begin_column
      _Gen_dist_constraint_comment_org.Comment_end_line
      _Gen_dist_constraint_comment_org.Comment_end_column
      _Gen_dist_constraint_comment_org.Entry_ID
      _Gen_dist_constraint_comment_org.Gen_dist_constraint_list_ID

    77   1.0                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        264   67   264   71   rr_1ba5     1
    78   
;1.0 ;17bnew
OTHERS 
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;;;;;;;;
;
                                                                                                                                                                                                                                                                                                                                                                                                                                                                           265   67   268   90   rr_1ba5     1
;

Original comment by schulte....@gmail.com on 20 Jul 2010 at 7:01

GoogleCodeExporter commented 9 years ago
(Steve Mading here.)

Chris tried showing this problem to me but when I try the attached file in my 
version of the starlib2 parser it accepts it.

Then I noticed that the example shown in Chris's comment does NOT match 
what's in the actual attachment file.  His comment #2 shows a file where on 
what would be about line 18596, there's a semicolon in the left-hand column all 
by itself - presumably the start of a value string that is cut off in the 
example.  The actual file 1ba5_linked.str in the attachment above does NOT look 
like this.

In the actual file, the relevant part looks like this, which parses just fine:

    78
;1.0 ;17bnew
OTHERS 
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;;;;;;;;
;

                                                               265   67   268   90   rr_1ba5     1
    79   20NEW

I see how a file looking like the snippet Chris posted would be a problem - but 
the attached file is not such an example.  The attached file works fine.

Please look into why there's a discrepancy between what Chris posted and what's 
in the attachment - it might have something to do with the problem.

Until I get a version of the file that has the actual problem, I can't diagnose 
this.

Original comment by Madi...@gmail.com on 20 Jul 2010 at 9:15

GoogleCodeExporter commented 9 years ago
It turns out that this was a problem with formatNMRSTAR3, the output from 
s2nmr, and the fact that EMBOSS uses semicolons as comment markers.

Thanks Steve Mading for fixing this one.

Original comment by schulte....@gmail.com on 21 Jul 2010 at 8:50

GoogleCodeExporter commented 9 years ago
Chris showed me the actual script and we deduced that the complaint was coming 
from further down in the script, after the file had already been altered before 
it was fed into the parser.  That is why the line numbers of the complaint 
didn't match the file when I looked at it.

The actual problem is with starlib2, but it's with the output, rather than the 
parsing of the input.  The error comes from this:

STAR semicolon values look ugly because the first line is offset from the rest 
due to the formatting rules, as shown here:
;ABCD
EFGH
IJKL
;
(The ABCD is shifted off one column from where it really is in the value itself.)
In order to satisfy needs at BMRB to have this made prettier, I always put out 
STAR semicolon strings with a leading blank line, and parsed them in ignoring 
that leading blank line, like so:
;
ABCD
EFGH
IJKL
;

The problem is that with EMBOSS files, the text chunks being put into these 
semicolon strings were sometimes EMBOSS comments, which start with semicolons 
themselves, leading to values like this in the STAR:

;;;This is an EMBOSS comment line;;;;
;

My un-parser was turning that into this on output:

;
;;This is an EMBOSS comment line;;;;
;

The problem then is that the first line of text data in the value starts with a 
semicolon itself, which, the next time this is parsed back in, is interpreted 
to mean that it terminates the semicolon string. That throws off everything 
from there on and creates an off-by-one error in counting the semicolon 
terminators of semicolon strings (everything inside semicolon strings is 
actually outside semicolon strings, and vice versa).
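
To make the off-by-one concrete, here is a rough Python sketch of the line-start rule. It is NOT the starlib2 parser, just a toy scanner I'm assuming for illustration (the name read_semicolon_values is made up), and it ignores everything else in STAR syntax, including any tokens that follow a closing semicolon on the same line:

```python
def read_semicolon_values(text):
    """Toy scanner: collect semicolon-delimited values by the line-start ';' rule.

    Returns (values, unterminated), where unterminated is True if the input
    ended while still inside a semicolon string.
    """
    values = []
    current = None  # None means we are outside a semicolon string
    for line in text.split("\n"):
        if line.startswith(";"):
            if current is None:
                # Opening delimiter: the rest of the line starts the value.
                current = [line[1:]]
            else:
                # Any ';' in column one closes the value.
                values.append("\n".join(current))
                current = None
        elif current is not None:
            current.append(line)
    return values, current is not None


# The value as it should be written: parses back to the EMBOSS comment.
good = ";;;This is an EMBOSS comment line;;;;\n;\n"
# The prettified form: the data line itself closes the string too early.
bad = ";\n;;This is an EMBOSS comment line;;;;\n;\n"

print(read_semicolon_values(good))
print(read_semicolon_values(bad))
```

With the prettified form, the scanner sees an empty value, and the real terminator is then taken as the start of a new semicolon string that never closes.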

To fix it, without messing up legacy applications that expect there to normally 
be prettily formatted semicolon strings, I made an exception that won't add the 
initial blank line when the first character of the value itself is a semicolon. 
 In that ONE case where it is incorrect syntax to make it look pretty, the 
prettiness is sacrificed.
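
In sketch form, the output rule now looks roughly like the Python below. This is only an illustration of the exception described above, not the actual starlib2 code; the function name write_semicolon_value is hypothetical, and it only handles the first-line case discussed here:

```python
def write_semicolon_value(value):
    """Emit a STAR semicolon string for value (illustrative sketch only)."""
    if value.startswith(";"):
        # Exception: a leading blank line would put the value's own ';'
        # in column one, where it would read as the terminator.
        # Keep the value on the delimiter line and give up the prettiness.
        return ";" + value + "\n;\n"
    # Normal case: pretty form, with the value starting on its own line.
    return ";\n" + value + "\n;\n"


print(write_semicolon_value("ABCD\nEFGH\nIJKL"))
print(write_semicolon_value(";;This is an EMBOSS comment line;;;;"))
```

The first call gives the pretty form with the leading blank line; the second keeps the EMBOSS comment on the delimiter line, so the first semicolon in column one after the opener is the real terminator.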

Original comment by Madi...@gmail.com on 21 Jul 2010 at 9:00