Closed mougino closed 8 years ago
I knew it was too simple to work flawlessly ;) it breaks goto/gosub labels. Give me some time to fix it...
Ok, I think I got it. I updated my first post with correct Java code and BASIC! test snippet.
Very nice! I am astonished that it is so little code.
My line numbers differ from yours. Did you pull my latest commit? That was SHA1 43b7b4... "add enableDecryption parameter to getBufferedReader" that we talked about several hours ago.
We talked about multiple-command lines in the forum, here. I was certain it would have many problems. but you and Luca demonstrated that it could be done.
I have not tested all of the cases I listed on the forum. That will take a little time.
I did test the error highlighting feature, and it works. If there is an error on a line with multiple commands, the entire line is highlighted.
The only command I can think of that uses a colon is SQL.Update. I think that should not work. But I ran f09_sql,bas, which uses SQL.Update with a colon, and it works. I do not know why. I will have to try to figure it out later.
It is 3:30 AM. My alarm clock will wake me in three hours. I suppose I had better be in bed when it does.
Ah, I know what you're going through, I created the issue at 4AM ;) my brain never works better than during an insomnia, nature can be ironic...
SQL.Update uses a colon in a quote-protected string, yes? Colons in quote-protected strings are kept untouched, that's why I put one in my snippet print "i is: ";i
.
But hold on, my code doesn't work if there is a label followed by 1 or more spaces (or tabulations) and then a comment. It works only if the %
follows the colon directly. I will have to re-edit my first post with a fix for that.
I pulled the code 3 hours ago from the official repo. So yes it includes your latest commit from 15 hours ago. Maybe your local repo has been modified since then?
I think I will need your help Marc (after your rest and day of work, of course). I have to use a regex but they are really not my cup of tea. I need to replace this line from above:
else if (line.charAt(i+1) == '%')
with something like this:
else if (line.substring(i+1).replaceAll(whitespace, "").charAt(1) == '%')
I'm not sure about the replaceAll part, if whitespace is a valid regex or if some of its characters need to be escaped...
Also, this is starting to look heavy and might slow down the loop. Hopefully it only occurs when meeting a colon that is not end of line, but still... For starters we might want to create a Pattern
instance of the regex before the loop, see http://stackoverflow.com/a/4430220
I found the answer!
else if (line.substring(i+1).matches("[" + whitespace + "]*%.*"))
means any number of whitespace (0 or more) followed by a '%' followed by any character (0 or more)
First post above is updated, with corresponding code snippet. We now need to test thtat against all the test cases...
See the results from the test bench at http://rfobasic.freeforums.org/post19282.html#p19282
SQL.Update uses a colon in a quote-protected string, yes?
No. That's why I'm puzzled.
my code doesn't work if there is a label followed by 1 or more spaces (or tabulations) and then a comment
Ah. Yes, this is the same code that removes the whitespace, so it hasn't been removed yet! Your regex built around the whitespace
string is a neat solution. I wonder if we can use that to remove the whitespace, too, to save some loop iterations?
What happens if you have a colon followed by a lot of whitespace and then end-of-line? Will it still be recognized as a label? or was there a trim
somewhere that got rid of the trailing whitespace?
You're right, I smell a rat... A colon followed by a lot of whitespace and then end-of-line will fall into the "multiple-command-per-line" scheme and the colon will be dropped. Since the colon is not the last character, and since it is not followed by a '%' (regex is not matched), a line will be added with the label without the colon (so not a label!) and another line full of whitespaces will then be treated (so... ignored, no problem there). I just did the test, well done, this throws an Undefined Label "fin" at: gotofin (you don't see the many spaces after the label "fin" but they're there...)
goto fin
print "ignored"
fin:
print "fin"
I'm currently writing the whole test bed, as well as a launcher. I will add this test case as "M14" ;) (Marc test case #14)
We could use the regex principle to trim the white spaces (both leading and trailing), this would solve de facto the label-with-colon-followed-by-a-lot-of-whitespaces-then-EOL issue. The API to use is there: http://developer.android.com/reference/java/lang/String.html#replaceAll%28java.lang.String,%20java.lang.String%29
But then again, I suffered to make the regex above ;) I'm not sure how to do for this one. I know how to detect a label followed by white spaces then EOL: line.matches(".*:[" + whitespace + "]*")
but after that I don't know how to remove the white spaces...
SQL.Update mystery explained. It is a problem for your single-colon command separator. The command is:
SQL.UPDATE DB_Ptr, tbname$, c3$, "94" : Where$
The intent is to update column c3$ (which is "grade_average") to the value 94. The Where$ variable is a string that specifies only one row of the data base.
The multi-command code breaks that up into two lines:
SQL.UPDATE DB_Ptr, tbname$, c3$, "94"
Where$
The manual doesn't say it, but the Where$ parameter is optional, just like it is for SQL.Query. If omitted, SQL.Update updates ALL rows of the database. If you run the program, you see the result: the next section prints the average of "grade_average" from all the rows, and it is 94.
The second line, Where$
alone on a line, would have been a syntax error in any version of BASIC! before v01.85. In the new version, this creates a variable called Where$
and initializes it to an empty string.
I'm sorry; I think this is a fatal error for the single-colon separator. In the post I linked earlier, you suggested using a double colon, "::". That may be necessary now.
:) a little bit dramatic! The StringBuilder sb contains the current line, we just have to add a test case to not split at colon if sb startWith("sql.update"), no?
Ouch... yes, you're right, that should work well. It's a kluge, but so was the colon in SQL.Update. It's certainly less bad than the horror that came from integrating the preprocessor line continuation (good idea implemented badly) with Array.load
and List.add
continuation (narrow-focused solution to symptoms of a wider problem).
In general, special cases are bad. (What ironic wording!) But when there are legacy mistakes that we can't change, we just have to work around them. BASIC! has surprisingly few of those.
StringBuilder
doesn't have a startsWIth
method, but you still have the original line, too.
The original line is not lowered case, and still contains the white spaces and stuff...
There is no StringBuilder.startsWith(String)
but there is a if (StringBuilder.indexOf("sql.update") == 0)
[edit: sorry, 0 not 1, in Java indexes start at 0...]
Anyway that's just one case that we need to fix. I don't know if you had time to have a look at the test bench results at http://rfobasic.freeforums.org/post19282.html#p19282 (and 2 following posts) but there are several other problematic cases. I'll work on them tomorrow.
I made some progress.
First I used a regex to trim the leading white spaces (not possible for trailing, but we can do without it).
line = line.replaceFirst("["+whitespace+"]*", "");
elegantly replaces the initial loop:
for (; i < linelen; ++i) { // skip leading spaces and tabs
char c = line.charAt(i);
if (whitespace.indexOf(c) == -1) { break; }
}
Second, I solved failed test case M14 (a label followed by one or more whitespaces followed by EOL). It is now recognized as a label.
Third (I know you'll like this one), I solved failed test case M15 (f09_sql.bas). The colon in the SQL.UPDATE command is now correctly kept as part of the command.
Fourth, I solved some bugs due to the interaction between a colon :
and a line continuation character ~
(test cases M12 and N4) but some other test cases are still failed (N1 to N3).
The rule I adopted is that anything after a ~
on the same line (except for a comment of course) throws a Syntax Error. I hope this is acceptable?
I still have test case L2 failed: the formatter formats only the part of a line before the first colon (but the fix would have to be in the formatter code itself, BTW where is it located?).
So, good progress I guess ;)
[edit] After I wrote 'The rule I adopted is that anything after a ~ on the same line (except for a comment of course) throws a Syntax Error. I hope this is acceptable?' I looked at official BASIC! v01.85 behaviour.
My bad, sometimes BASIC! throws a Syntax Error (e.g. ~print 1
), and sometimes it throws an Extranous characters (e.g. print 1~+2
)...
So I chose to stick with the existing rules. I added an additional condition, and now all tests are passed :smile: I think we have a good candidate! I have updated the first post above with the changes. Now I'll have a look at the only remaining failed test: the formatter...
Fantastic, Nicolas!
Certainly throwing an error on anything after ~ is acceptable. There is no value in line 1~ : line 2
. I suppose we could handle "~:" just be removing both, but it's not worth any effort.
I wish I could make the "Syntax Error vs. Extraneous Characters" more consistent, but that would require a complete lexical analysis.
The formatter is in Format.java
. I started making changes in Format a few days ago, to fix quotes, change   to space, and to share code between Run and Format. I have working code, but I have been too busy to finish testing it. Do you want me to commit my changes? Also, you can commit your code changes without fixing Format, if you want to stage the work. Format doesn't format everything, but it doesn't break anything.
I'm good for the moment. I modified Format.java and I'm able to format multiple commands per line, but there's still a small issue with whitespace preservation that I need to fix, and I have to change indents. I'll work some more tomorrow and share my result before I commit.
You can anchor a regex to the beginning of the line with ^
and end of the line with $
. In fact, that may be necessary. I have not tested this, but consider our old friend SQL.update, and another example:
SQL.UPDATE DB_Ptr, tbname$, c3$, "94" : Where$
fmtString$ = "xyz : "
Your line.replaceFirst(ignored_lead_regex, "")
will remove the colon from the update command, and strip out part of the quoted substring in the second example. But if you anchor the regex, you'll only get spaces and colons at the beginning of the line:
String ignored_lead_regex = "^[" + whitespace + ":]*";
A similar regex can get a mix of trailing spaces and colons:
String trailing_space = "[" + whitespace + ":]*$";
but I can't think of a way to include a comment in the pattern, so it may not be useful.
No, no, line.replaceFirst(ignored_lead_regex, "")
will absolutely not remove the colon from the update command, it will remove any leading whitespace or colon before "SQL.UPDATE"
Marc, I ran the test with f09_sql.bas and it prints the same results as with official v01.85.
Interestingly, your example SQL.UPDATE DB_Ptr, tbname$, c3$, "94" : Where$ fmtString$ = "xyz : "
will work without problem with my implementation, but a side effect is that SQL.UPDATE must be the last command on the line. It cannot be followed by a colon and another command.
SQL.UPDATE DB_Ptr, tbname$, c3$, "94" : Where$ fmtString$ = "xyz : "
works, as well as PRINT "Updating SQL:" : SQL.UPDATE DB_Ptr, tbname$, c3$, "94" : Where$ fmtString$ = "xyz : "
, but SQL.UPDATE DB_Ptr, tbname$, c3$, "94" : Where$ fmtString$ = "xyz : " : PRINT "Updated"
throws an Extraneous character in line: print"Updated"
Ah, I see. I thought that the regex would find only whitespace and colons, but the regex also matches the empty string! So replaceFirst()
will not look beyond the beginning of the line. I'm sorry to send you "chasing the wild goose".
But now you have confused me.
SQL.UPDATE DB_Ptr, tbname$, c3$, "94" : Where$ fmtString$ = "xyz : "
should not work. It should get a CheckEOL error after Where$
.
Yes ;) but technically the regex does not find, the regex validates a match. It does it character by character in the target string. The replaceFirst is indeed the key, it makes the replacement of whitespaces and colon stop as soon as the regex does not match (so at first non-whitespace-non-colon character)
And I'm sorry again for the last confusion, I hurried to answer before bed and copied-pasted your example literally, but in my test I adapted it (in f09_sql.bas) with SQL.UPDATE DB_Ptr, tbname$, c3$, "94" : "first_name = 'xyz : ' AND last_name = 'Washington' "
(after renaming Tamasin Washington as xyz : Washington at the beginning of the program)...
Marc, would you have the explanation for the // Nasty stuff to deal with Switch
in Format.java:ProcessIndents(String)
? Or better yet, a test case, one from Michael Camacho originally used to test the formatter?
I don't see why a sw.begin
shouldn't act as an fn.def
or an if
; and why a sw.end
shouldn't act as a fn.end
or an endif
...
I refactored Format.java
but there is something wrong somewhere, if I could have your help to understand why, that could be solved faster I think.
The formatter works perfectly well, both on single command per line (retro-compatible) and multi-commands per line.
It works well the first time I launch it.
If I launch it a second time (e.g. right after first formatting), all the colons disappear and the indentation is all wrong for that reason.
I sent you my Format.java
by e-mail. If you can have a look at it (after filling your tax forms, of course :wink:) that would be great, thanks.
And of course, just after posting, I do new tests that show it doesn't work so perfectly well :stuck_out_tongue_winking_eye:
Labels alone on a line get stripped off their colon, and single-lined IF cond THEN action
add an indentation at the next line, while they shouldn't...
Oh well, back to fixing then. But the file I sent you is a good start, particularly the architecture fits to both single and multi-commands per line (with the new private int CommandOccurrences(String line, String command)
).
Hi Nicolas. You have been working hard. I have, too, but not on BASIC!. I normally would have tried to help at lunch, but I was off-site for two hours today.
I was not aware Michael made changes in the Formatter. I see the comment at the top, but I don't know what parts he touched. If there are any test cases, they would be on Paul's server. I admit I have not looked.
I'm dodging your question about switch-indenting until I have time to learn the code. You have probably figured it out by now.
There are outstanding bugs in how Format handles single-line if statements. I have never tried to fix them, which leaves me ill-prepared to help you now. In your new tests, you may be seeing the results of old bugs.
I'll go fetch the code you sent me, but I will not be able to do any serious work until I get home. Send me what you have done when you quit for the night, and I will try to build on it.
To tell the truth, I am still catching up with your other single-line changes. It took all the time I had yesterday to catch up 100+ forum posts. There were 72 on Monday alone! To do any real development I have to stay out of the forum. When I return, catching up is very difficult. We have 60 or more posts four times in March, 50-59 five more times. I really cannot handle this volume.
But that is off-topic!
Yes the formatter is arty-crafty. I found that run
is not capitalized in if cond then run "prog.bas"
or that in the following example the indentation fails, and worse the content of the string is capitalized! (because it contains then
):
if a$="Athens"
print "Greece"
endif
This is a rudimentary implementation. I'll work on it some more today.
Nicolas, my workspaces are getting cluttered. I have committed and pushed changes to Run.java
and SUReader.java
. I think you have not changed those files, so you can do a pull from RFO-BASIC.
I also have changes in Format.java
that I have not committed. They do two things:
TestAndReplaceAll()
, the method used to capitalize built-in function names.The fixes use a new method, skipQuotedSubstring()
. Is there a regex that can identify a quoted substring, keeping any embedded quotation marks in the string?
I wanted to use regex in Java.com:getWord()
, but there I would need to apply a regex in the middle of a string. I don't know how to make it skip to a given starting position.
If you think my Format changes would help you, I will commit them. However, your Pattern
and Matcher
and your CommandOccurrences()
method may be better for TestAndReplaceAll()
than what I did. Too tired to analyze it.
When I was reading your code, I thought, "What if the parser ran a pattern-match for all of the words in the command line?" All of the rest of the parser would use the Matcher.find())
method to get the next word. Just gotta figure out how to get the stuff between words. The mind boggles!
Ok, please push your changes to Format.java
, I'll merge with my code, no worries.
I'll investigate on a regex to identify a quoted substring, but I think it is doable. When you say identify, you need get a 'match' boolean, or a little more, such as the length of the quoted substring?
Worse case scenario, we take the position of the first quote, and use substring() then do the regex.
I agree regexes make my mind boggle too. I should find some time to make progress today I hope.
I found the reason for the %!#$@ bug that made consecutive formattings drop the colon (and sometimes the white spaces), and I'm not happy about it :angry:
The reason was the Java method String.substring(int start, int end)
. As its name (and short description on Android developer) indicates, it "returns a string containing the given subsequence of this string". I took it for granted that it returned a substring of all characters between the position start
and the position end
included. Alas! after 24 hours of many failed attempts, I got back to Android developer, looked this time at the long description at the bottom of the page and saw this :scream:
Parameters start the start offset. end the end+1 offset.
This last line and the fact that someone somewhere decided to call a variable end
which actually has a value of end+1
made me doubt of humanity for a minute :astonished:
Hi Marc, I'd love to benefit from your modifications in Format.java
:
TestAndReplaceAll()
where the method used to capitalize built-in function names (BTW what do you mean by "built-in"?)skipQuotedSubstring()
which I specifically need to fix the if a$="Athens"
bug (which takes place in TestAndReplaceFirst()
this one) and I'd like not to reinvent the wheel :wink: I am waiting for your commit to synchronize my repo and finalize the single/multi-command formatter (at last!) thanks!
Ah, good catch. The class library is wonderful to have and sometimes very frustrating to use. I'm sorry I was not more helpful.
I have pushed my Format changes. I had to do a little clean-up. I hope they are useful to you and the merge is not too bad. And again sorry for not keeping up with you! You are really "on a roll" -- moving fast and getting a lot done.
"Built-in" functions are the math and string functions defined in the language. I use the term to exclude user-defined functions.
NB: skipQuotedSubstring()
returns the index of the closing quotation mark. It does NOT return the character after the quote, as you found with String.substring()
.
The new code in TestAndReplaceAll()
was very hard to write, but it is short and I hope it is not hard to read. It is like Paul's code, in that it uses indexOf()
to find instances of the keyword, but it also uses skipQuotedSubstring()
to find instances of quoted string. The logic has to move the indices forward, skipping keywords when their indices are inside quoted strings. I kept Paul's trick of looking at the character before the keyword to see if it is embedded in a variable name, but I used the new Run.isVarChar()
to check the character.
A Matcher
could find all the strings at once, then use find()
and start()
to step through them. I'd like to experiment with that, and measure its performance.
Nicolas, you probably already know the code I pushed is broken. I am sorry!
I hope it did not take you too long to find the error. In skipQuotedSubstring()
,
change for ( ; ci < size; --ci) {
to for ( ; ci < size; ++ci) {
I do not know how that happened. I am worried: if there is a mistake that silly, there may be others. This is clearly NOT the code I tested!
Speaking of testing, here is a case that fails with old code (like v01.85).
print "value of \"cos(cos)\" is "; cos(cos)
There is a more subtle error in the same method. In this line print "say \"me\""
it looks for the \
before the "
. It does not work with this line: print "say \\"; me
I'll try to fix the subtle bug before committing a change to fix the stupid bug.
Okay, got it. Fixed and committed. Again I forgot to put the item number in the commit comment.
I also uncluttered another workspace: this commit added a "Load and Run" menu option to the Editor.
This feature, especially the Format part, has gotten large enough to justify doing the development on a branch. Too late now, I think, but maybe we should consider it next for the next collaboration?
If not a branch, then perhaps I should commit my changes only to my fork; then you can pull the ones you want. I make too many mistakes when I hurry too much.
Wonderful work Marc ;) you've been busy as well.
Git won't let me pull your changes untill I commit mine (in AddProgramLine.java
and Format.java
) so I'll have to manually merge your last changes...
I hope to be able to push both finalized files tonight French time (I'm AFK for the week-end).
Question: how do one push to his local repo in command-line? If I do a simple git push
it pushes directly to RFO-BASIC/Basic
instead of mougino/Basic
...
And now that we can do Load and Run in the Editor, have you planned the same feature as a command? E.g. the equivalent for QuickBasic CHAIN "prog.bas"
?
(sorry I wanted to comment this elsewhere but I didn't find a dedicated issue for Load and Run)
Simple git push commits to whatever repo you originally cloned. I clone my fork, not RFO-BASIC. Then I set up RFO-BASIC as a different remote. I took it right out of the github documentation, I think.
clone https://github.com/jMarcS/Basic.git
get remote set-url upstream https://github.com/RFO-BASIC/Basic.git
To refer to my fork, I use git push
and git pull
.
When I want to refer to the main repo, I use git push upstream master
. upstream
is the alias of the repo (from the remote set-url command). master
is the name of the branch. (I don't set a branch name. master
is the default name.)
Typically, I push to my fork first, then to upstream master. When I pull, git pull
pulls from my fork, and then I push it up to upstream master. When you push to RFO-BASIC, I do git pull upstream master
to get your changes, and then push it to my fork with simple git push
.
When you have changes that block a fetch, you can use git stash
commands to temporarily hide your changes, then apply them to the code you fetched.
If I ever knew there was a request for a CHAIN, I forgot it. That's why I have lists, I guess. How is it different from RUN?
Sorry, I forgot one bit, although you've probably figured it out already. GitHub recommends setting your fork as the default, and the main repo as the "upstream". You don't have to. Since you have set RFO-BASIC as your default, you can set your own fork as the upstream. The names don't mean anything to git.
Run
doesn't load the program in the Editor. So if there's a Syntax Error for example, it is reported but back in Editor you can't see it. Or if your program naturally ends, what you see after is the previous program that ran it.
This is not a bad feature per se, that's what allows me to go through all the test suite with a single launcher: every time a test case finishes, I'm back to Editor with the launcher in it, and I just have to hit Run to select the next test case.
But I guess developers could be offered the choice.
Marc, do you have an idea why in v01.85, if i then run "prog.bas"
is correctly formatted to IF i THEN RUN "prog.bas"
but if cond then run "prog.bas"
is incorrectly formatted to IF cond THEN run "prog.bas"
? (run
not capitalized in second case)
This is just for curiosity, I don't really need the answer since in my latest code the formatter doesn't capitalize run
in both cases (so far)... It actually is the latest bug (to my knowledge) that I need to fix and I think we'll have a pretty great formatter compared to the old one ;)
Also, I hope you don't mind, since TestAndReplaceFirst()
and TestAndReplaceAll()
do 99% of the same common work, after I started to duplicate what you did in All
to do the same in First
I changed my mind and renamed your method + added an additional parameter: I made it TestAndReplace(boolean stopAtFirstKw, String kw, String lcLine, String actualLine)
.
The only alteration to the inside of the method is after the actual sb.replace(k, k+kl, KW);
(in if (!embedded) { ... }
) I added a simple if (stopAtFirstKw) return sb.toString();
Now both TestAndReplaceFirst()
and TestAndReplaceAll()
call the same TestAndReplace()
, first one with parameter stopAtFirstKw
at true
, the other one at false
.
Ok, I got it. Now all the BASIC! keywords are correctly capitalized including those following a THEN
.
I made a specific test in ProcessKeyWords()
after the capitalization of the leading keyword (=after StartOfLineKW
): if the line starts with IF
(matches the regex "["+Basic.whitespace+"]*IF.*"
) and contains THEN
(as is, i.e. capitalized) then I do a TestAndReplaceAll()
of all the Run.BasicKeyWords[]
, in a loop (the same way it is done for the Run.MathFunctions[]
and Run.StringFunctions[]
right after).
Also, appart from the formatter, as you have seen in my regex above I shared the String whitespace
that we've been using in AddProgramLine.java
and in Format.java
, I made it a public static String
inside Basic.java
so it can be referenced anywhere with Basic.whitespace
.
One last fix and I'll be done: I found that your new TestAndReplaceAll()
doesn't stop when it encounters an end-of-line comment marker... So any keyword after a %
will be capitalized. I'll fix it, it should be easy enough.
I did some more refactoring that I am really happy with. Going further than the TestAndReplace()
shared by TestAndReplaceFirst()
and TestAndReplaceAll()
, I rather put the intelligence of finding a non-quoted, non-commented, non-variable-name-embedded keyword into a method private static int FindKeyWord(String kw, String line, int start)
, it returns -1 if no valid keyword is found, or its index if found.
I call this method from TestAndReplaceFirst()
with a simple if FindKeyWord() >= 0
(to treat only first instance) ; and in TestAndReplaceAll()
with a more complete while
loop (to treate all valid instances). But the good part is I can call it from ProcessIndents()
too now! This way only 100% valid keywords (not in a quote, not after a %
, not part of a variable name) influence the indentation. Neat!
TestAndReplaceAll()
with comments. Thank you for catching the error. My mantra is, "If it ain't been tested, it don't work!" (Yes, it's bad grammar, intentionally - that makes it just a little more likely that people will remember it.)This is great stuff, Nicolas. You have fixed problems from bug reports that are years old. I think you have fixed bugs nobody has found yet!
Beyond bugfixing, you have "modernized" the code. I am always looking for ways to make BASIC! faster and more efficient, but also more understandable and maintainable. Using Java library classes, sharing code, reducing dependencies, using classes and OO principles.
Some day, I will almost certainly make the BIG change that Michael Camacho was asking about a few weeks ago. Break up Run.java. Maybe run the interpreter as a service. Make BASIC! a true Android app. None of that will be possible without first making these infrastructure changes.
I was able to implement multiple commands per line using a colon as a separator.
Modifications only take place in
AddProgramLine.AddLine(String)
The changes are as follow:The BASIC! test cases are zipped into a test-suite + launcher at http://laughton.com/basic/programs/beta-test/multiple-commands-per-line-test-suite.zip (to be unzipped in rfo-basic/source), all tests except for the formatter are "passed".