Closed GoogleCodeExporter closed 9 years ago
Erm, not sure what to do with this. Can you highlight where the problems are in
the file? The messages are not very understandable to me.
Original comment by wfvran...@gmail.com
on 1 Jul 2010 at 1:38
Sorry for the late response. I've been out of the country.
I'm guessing the problem is at the '78' (line 18591). Note: you need to scroll
way to the right in the original file to get to some of the values. This is for
the loop:
loop_
_Gen_dist_constraint_comment_org.ID
_Gen_dist_constraint_comment_org.Comment_text
_Gen_dist_constraint_comment_org.Comment_begin_line
_Gen_dist_constraint_comment_org.Comment_begin_column
_Gen_dist_constraint_comment_org.Comment_end_line
_Gen_dist_constraint_comment_org.Comment_end_column
_Gen_dist_constraint_comment_org.Entry_ID
_Gen_dist_constraint_comment_org.Gen_dist_constraint_list_ID
77 1.0 264 67 264 71 rr_1ba5 1
78
;1.0 ;17bnew
OTHERS
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;;;;;;;;
;
265 67 268 90 rr_1ba5 1
;
Original comment by schulte....@gmail.com
on 20 Jul 2010 at 7:01
(Steve Mading here.)
Chris tried showing this problem to me, but when I try the attached file in my
version of the starlib2 parser, it accepts it.
Then I noticed that the example shown in Chris's comment does NOT match
what's in the actual attached file. His comment #2 shows a file where, at
about line 18596, there's a semicolon in the left-hand column all by itself,
presumably the start of a value string that is cut off in the example. The
actual file 1ba5_linked.str in the attachment above does NOT look like this.
In the actual file, the relevant part looks like this, which parses just fine:
78
;1.0 ;17bnew
OTHERS
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;;;;;;;;
;
265 67 268 90 rr_1ba5 1
79 20NEW
I see how a file looking like the snippet Chris posted would be a problem, but
the attached file is not such an example. The attached file works fine.
Please look into why there's a discrepancy between what Chris posted and what's
in the attachment; it might have something to do with the problem.
Until I get a version of the file that has the actual problem, I can't diagnose
this.
Original comment by Madi...@gmail.com
on 20 Jul 2010 at 9:15
It turns out that this was a problem with formatNMRSTAR3, the output from
s2nmr, and the fact that EMBOSS uses semicolons as comment markers.
Thanks Steve Mading for fixing this one.
Original comment by schulte....@gmail.com
on 21 Jul 2010 at 8:50
Chris showed me the actual script, and we deduced that the complaint was coming
from further down in the script, after the file had already been altered
somewhat before being fed into the parser. That is why the line numbers in the
complaint didn't match the file when I looked at it.
The actual problem is with starlib2, but with its output rather than its
parsing of the input. The error comes from this:
STAR semicolon values look ugly because the first line is offset from the rest
due to the formatting rules, as shown here:
;ABCD
EFGH
IJKL
;
(The ABCD is shifted one column from where it really is in the value itself.)
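A minimal Python sketch of that plain layout, assuming a value that contains no line beginning with a semicolon. `format_semicolon_plain` is a hypothetical helper for illustration, not part of starlib2:

```python
def format_semicolon_plain(value: str) -> str:
    """Hypothetical helper: wrap a value in STAR semicolon delimiters.

    The opening ';' sits in column 1 and the value follows immediately,
    so the value's first line is pushed one column to the right of the
    lines after it. The closing ';' must start its own line.
    """
    return ";" + value + "\n;"


text = format_semicolon_plain("ABCD\nEFGH\nIJKL")
print(text)
```

Printing `text` reproduces the four-line example above, with ABCD sharing a line with the opening delimiter.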
To satisfy BMRB's need to have this made prettier, I always wrote STAR
semicolon strings with a leading blank line, and parsed them in ignoring that
leading blank line, like so:
;
ABCD
EFGH
IJKL
;
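The pretty convention can be sketched in Python as a writer that puts the opening semicolon on a line of its own and a reader that drops that leading blank line. Both functions are hypothetical illustrations of the behaviour described here, not starlib2's actual code, and the reader assumes the text has already been split into lines:

```python
def format_semicolon_pretty(value: str) -> str:
    # Opening ';' alone on its line, so every line of the value starts in column 1.
    return ";\n" + value + "\n;"


def parse_semicolon_string(lines: list[str], start: int) -> tuple[str, int]:
    """Read the semicolon string opened by the ';' in column 1 of lines[start].

    Returns the value and the index of the line just after the closing ';'.
    Any later line that begins with ';' is taken as the terminator.
    """
    first = lines[start][1:]              # text after the opening ';'
    body = [first] if first else []       # ignore the pretty leading blank line
    i = start + 1
    while i < len(lines) and not lines[i].startswith(";"):
        body.append(lines[i])
        i += 1
    if i == len(lines):
        raise ValueError("unterminated semicolon string")
    return "\n".join(body), i + 1


lines = format_semicolon_pretty("ABCD\nEFGH\nIJKL").split("\n")
print(lines)                              # [';', 'ABCD', 'EFGH', 'IJKL', ';']
print(parse_semicolon_string(lines, 0))   # ('ABCD\nEFGH\nIJKL', 5)
```

Because a non-empty first line is kept, the same reader also accepts the plain `;ABCD` layout.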
The problem is that with EMBOSS files, the text chunks being put into these
semicolon strings were sometimes EMBOSS comments, which start with semicolons
themselves, leading to values like this in the STAR:
;;;This is an EMBOSS comment line;;;;
;
My un-parser was turning that into this on output:
;
;;This is an EMBOSS comment line;;;;
;
The problem then is that the first line of text data in the value starts with a
semicolon itself, so the next time the file is parsed back in, that line is
interpreted as terminating the semicolon string. Everything from there on is
misread, and the count of semicolon-string delimiters goes off by one
(everything inside semicolon strings is treated as outside them, and vice
versa).
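The failure can be reproduced with a small round-trip sketch in Python. The writer and reader below are hypothetical stand-ins that follow the rules described in this thread (a leading blank line on output; any line starting with ';' ends the string on input); they are not starlib2 code:

```python
def format_semicolon_pretty(value: str) -> str:
    # Always emit the opening ';' on its own line (the pre-fix behaviour).
    return ";\n" + value + "\n;"


def parse_semicolon_string(lines: list[str], start: int) -> tuple[str, int]:
    # A line beginning with ';' ends the string; a blank first line is dropped.
    # Simplification: any text after a terminating ';' is ignored here.
    first = lines[start][1:]
    body = [first] if first else []
    i = start + 1
    while i < len(lines) and not lines[i].startswith(";"):
        body.append(lines[i])
        i += 1
    if i == len(lines):
        raise ValueError("unterminated semicolon string")
    return "\n".join(body), i + 1


comment = ";;This is an EMBOSS comment line;;;;"
lines = format_semicolon_pretty(comment).split("\n")
value, next_line = parse_semicolon_string(lines, 0)
print(repr(value))        # ''  (the EMBOSS line was taken as the terminator)
print(lines[next_line])   # ;   (the real closing delimiter now looks like an opener)
```

The value comes back empty, and the reader is left positioned on the intended closing ';', which it would now treat as the start of a new semicolon string.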
To fix it without messing up legacy applications that expect semicolon strings
to be prettily formatted, I made an exception: the initial blank line is not
added when the first character of the value itself is a semicolon. In that ONE
case, where making it look pretty would produce incorrect syntax, the
prettiness is sacrificed.
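The exception can be sketched in Python as follows. The helper names are hypothetical and this is not the starlib2 source; the reader treats any line starting with ';' as the terminator and drops a blank first line, as described in this thread:

```python
def format_semicolon_string(value: str) -> str:
    # Skip the cosmetic blank line when the value starts with ';', because
    # that ';' would otherwise land in column 1 and close the string early.
    if value.startswith(";"):
        return ";" + value + "\n;"
    return ";\n" + value + "\n;"


def parse_semicolon_string(lines: list[str], start: int) -> tuple[str, int]:
    # A line beginning with ';' ends the string; a blank first line is dropped.
    first = lines[start][1:]
    body = [first] if first else []
    i = start + 1
    while i < len(lines) and not lines[i].startswith(";"):
        body.append(lines[i])
        i += 1
    if i == len(lines):
        raise ValueError("unterminated semicolon string")
    return "\n".join(body), i + 1


for value in ["ABCD\nEFGH\nIJKL", ";;This is an EMBOSS comment line;;;;"]:
    lines = format_semicolon_string(value).split("\n")
    decoded, _ = parse_semicolon_string(lines, 0)
    assert decoded == value, (value, decoded)

print(format_semicolon_string(";;This is an EMBOSS comment line;;;;"))
```

Both values now survive the round trip, and the EMBOSS comment is written back in the compact `;;;This is an EMBOSS comment line;;;;` form. Like the fix described above, this only protects the first line: a value with a later line beginning with ';' would still end the string early in this sketch.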
Original comment by Madi...@gmail.com
on 21 Jul 2010 at 9:00
Original issue reported on code.google.com by
schulte....@gmail.com
on 28 Jun 2010 at 3:21