mougino closed this issue 8 years ago.
Thanks for posting your test results. (And for telling me how to do tables!) The results look pretty consistent.
There must be some misinformation floating around out there. I went to the authoritative source, developer.android.com/reference, and it looks like we're okay. This is a Good Thing; I didn't test on a Froyo emulator before committing.
There are nine flavors of BinarySearch in the Arrays class. Each of those has two variants. The full-list variants have been around since API 1. The sublist variants showed up in API 9. From the Arrays page:
public static int binarySearch (double[] array, double value) Added in API level 1
public static int binarySearch (double[] array, int startIndex, int endIndex, double value) Added in API level 9
In the Collections class, there are only two signatures, and they've both been there since the beginning. From the Collections page:
public static int binarySearch (List<? extends Comparable<? super T>> list, T object) Added in API level 1
public static int binarySearch (List<? extends T> list, T object, Comparator<? super T> comparator) Added in API level 1
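To make the difference between the two Arrays variants and the Collections route concrete, here is a small sketch using the standard JDK signatures (Android's reference calls the range bounds startIndex/endIndex; the JDK calls them fromIndex/toIndex). The class and method names are only for illustration. Note that the range variant returns an index into the whole array, while searching a subList view returns an index relative to the start of the sublist.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

class BinarySearchVariants {
    // Full-array variant: searches every element (Android API level 1).
    static int fullSearch(double[] sorted, double key) {
        return Arrays.binarySearch(sorted, key);
    }

    // Range variant: searches only [from, to) (Android API level 9).
    // The result is an index into the whole array, not relative to 'from'.
    static int rangeSearch(double[] sorted, int from, int to, double key) {
        return Arrays.binarySearch(sorted, from, to, key);
    }

    // Collections has no range overload; searching a subList view has the same
    // effect, but the result is relative to the start of the sublist.
    static int sublistSearch(List<String> list, int from, String key) {
        return Collections.binarySearch(list.subList(from, list.size()), key);
    }

    public static void main(String[] args) {
        double[] values = {1.0, 3.0, 5.0, 7.0, 9.0};
        System.out.println(fullSearch(values, 7.0));         // 3
        System.out.println(rangeSearch(values, 2, 5, 7.0));  // 3
        System.out.println(rangeSearch(values, 0, 2, 7.0));  // -3: not in range, insertion point 2
        // Only the sublist [a, c, d] needs to be sorted.
        List<String> names = Arrays.asList("x", "y", "a", "c", "d");
        System.out.println(sublistSearch(names, 2, "c"));    // 1
    }
}
```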
Yes, I got Arrays and Collections mixed up. Putting aside the b.search, another theory I have could be the effect of 'getter/setter' inline optimizations in the Dalvik engine after Froyo. http://stackoverflow.com/questions/4912695/what-optimizations-can-i-expect-from-dalvik-and-the-android-toolchain/4930538#4930538 This might make a linear search faster after Froyo, although I don't know if you can optimize VarNames.get(j). Anyway, I think I'm getting off topic now.
That link made interesting reading. Thank you.
There are two linear searches. No, there were two linear searches -- Nicolas's change makes variable name searches binary. That's because a variable name can be extracted from the command line as a discrete token, then the token can be used in a name table lookup. I had tried a hash table, but I needed a new table for each level of function call. Nicolas found that BinarySearch can be applied to a sublist, and that has made all the difference.
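Why a sublist search replaces a per-call hash table can be sketched in a few lines. This is a hypothetical illustration, not the interpreter's code (lookup, define, and searchStart are invented names): all names live in one list, each function call's names form a sorted tail that begins where the caller's names end, and a binary search over subList(searchStart, size) sees only the current namespace.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical sketch: one flat list of names in which each function call's
// namespace is the sorted tail starting at searchStart.
class NamespaceLookup {
    // Absolute index of name within names[searchStart..end), or -1 if absent.
    static int lookup(List<String> names, int searchStart, String name) {
        int i = Collections.binarySearch(names.subList(searchStart, names.size()), name);
        return (i >= 0) ? searchStart + i : -1;
    }

    // Adds name to the namespace, keeping that sublist sorted.
    // Returns the absolute index of the new or existing entry.
    static int define(List<String> names, int searchStart, String name) {
        int i = Collections.binarySearch(names.subList(searchStart, names.size()), name);
        if (i >= 0) return searchStart + i;     // already defined in this namespace
        int at = searchStart + (-i - 1);        // insertion point, made absolute
        names.add(at, name);
        return at;
    }

    public static void main(String[] args) {
        List<String> names = new ArrayList<>();
        define(names, 0, "b");
        define(names, 0, "a");                  // main namespace: [a, b]
        int fnStart = names.size();             // a called function starts here
        define(names, fnStart, "b");            // [a, b, b]: same name, new namespace
        System.out.println(lookup(names, fnStart, "a"));  // -1: caller's "a" is not visible
        System.out.println(lookup(names, fnStart, "b"));  // 2
    }
}
```

A single list with a moving start index means no new table has to be built for each call level; the cost is that the list must stay sorted within every namespace.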
The command keyword search is still linear. That's because the keyword can't be isolated as a token. Instead, the beginning of each line is compared to every entry in the command keyword table (stops when a match is found). (I know, I've said this before, but it's a necessary preface to what comes next).
That compare looks like this in pseudocode:
for each keyword in the table
    if command line starts with the keyword
        table hit
The Java method is String.startsWith(...). And it is fast. Blindingly fast. Faster than it has any right to be. So fast that everything I've tried to make it faster makes it slower!
So I quit trying. Instead, I changed the data structure that holds the commands so the keyword search happens only once per line. Originally, each line was just a Java String, and it got fully parsed every time the interpreter saw it.
Now I have a table of command functors, one for each command keyword. The first time the interpreter sees a line, it identifies the keyword and stores a reference to the functor. Each command line is an object that holds the original String, a functor reference, and the length of the keyword. When the interpreter sees the command again, it doesn't parse anything. It grabs the functor reference, skips ahead as far as the length field says it should, and starts the command.
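The once-per-line keyword search can be sketched like this. It is a hypothetical reduction, not the interpreter's classes: the two-entry keyword table, the Command interface, and the string-returning functors are invented for illustration. The first execute() does the linear startsWith() scan described above and stops at the first hit; later calls skip straight to the cached functor.

```java
// Hypothetical sketch: the linear startsWith() scan runs only the first
// time a line executes; afterwards the functor and keyword length are cached.
class CommandCache {
    // A command functor receives the text that follows the keyword.
    interface Command { String run(String args); }

    static final String[] KEYWORDS = { "print", "dim" };
    static final Command[] FUNCTORS = {
        args -> "PRINT(" + args.trim() + ")",
        args -> "DIM(" + args.trim() + ")"
    };
    static int keywordSearches = 0;   // counts scans of the keyword table

    // One program line: original text plus a lazily filled functor and keyword length.
    static final class CommandLine {
        final String text;
        private Command command;      // null until the line first runs
        private int keywordLength;

        CommandLine(String text) { this.text = text; }

        String execute() {
            if (command == null) {
                keywordSearches++;
                for (int k = 0; k < KEYWORDS.length; k++) {   // linear scan
                    if (text.startsWith(KEYWORDS[k])) {       // table hit: stop
                        command = FUNCTORS[k];
                        keywordLength = KEYWORDS[k].length();
                        break;
                    }
                }
                if (command == null) throw new IllegalStateException("unknown command: " + text);
            }
            // No parsing on later visits: skip the keyword and run the functor.
            return command.run(text.substring(keywordLength));
        }
    }

    public static void main(String[] args) {
        CommandLine line = new CommandLine("print 1");
        System.out.println(line.execute());   // PRINT(1)
        System.out.println(line.execute());   // PRINT(1), without another scan
        System.out.println(keywordSearches);  // 1
    }
}
```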
I don't remember what speed-up that got. Also, I did it in two stages. The first stage did non-compound commands (no dot). The second stage (much harder!) handled compound commands (commands in groups, like GPS.latitude or GR.bitmap.load). I've never tried to measure the combined effect.
If I remember right, I was a little disappointed in the effect. Expression parsing is a bigger concern, and variable name lookup is the slowest part of that.
The first stage went into v01.85. The second went into v01.88. The BinarySearch change will be in v01.89. There are hundreds of other changes between v01.85 and now, but it would still be interesting to compare benchmarks.
And now that the "search" part of variable-name handling is faster, I'd like to do some profiling to see where the next hang-up is. Probably the string extraction. If that's the case, the next speed-up will be tough -- pre-parsing the line, saving the tokens, maybe the expression tree? Putting even more "compiler" into the "interpreter".
I have to wonder if the time would be better spent trying to find the massive memory leak. Or getting the interpreter out into a Service where it belongs.
But now I'm WAAAAAY off-topic!
Okay, back on-topic. Gentlebeings, we have a bug. f25_dir.bas fails. You're gonna love this (note the sarcasm, please).
UNDIM (same as ARRAY.DELETE) works by replacing the variable name with a space character. (searchVar does the same thing, to propagate the effect of UNDIM on an array passed into a function.) The intent was to write an invalid variable name so that future linear searches would skip it. This "removes" a variable without changing the index of any other variable. Then DIM (or something else) can make a new variable with the same name and a new index, and the linear search would find that.
With BinarySearch, writing over a valid variable name with a space is evil: the list is no longer in alphabetical order. Furthermore, multiple UNDIMs would create multiple variables with the same name (" "), which is not allowed with BinarySearch.
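A three-name list is enough to show the damage. This illustration (the class and helper names are invented) overwrites "b" with a space, as the old UNDIM did. Because " " sorts before every letter, the list is no longer sorted, and a binary search for "a" misses even though "a" is still at index 0.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Illustration only: what the old "overwrite the name with a space" trick
// does to a list that binarySearch expects to be sorted.
class UndimBreaksOrder {
    static List<String> undimBySpace(List<String> names, String victim) {
        List<String> copy = new ArrayList<>(names);
        copy.set(copy.indexOf(victim), " ");   // the old UNDIM behaviour
        return copy;
    }

    public static void main(String[] args) {
        List<String> names = Arrays.asList("a", "b", "c");
        System.out.println(Collections.binarySearch(names, "a"));  // 0
        List<String> after = undimBySpace(names, "b");
        // The search probes " " in the middle, decides "a" must be to the
        // right of it, and never looks at index 0.
        System.out.println(Collections.binarySearch(after, "a"));  // -3
    }
}
```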
On the other hand, with BinarySearch the variable indices can change every time we add a new variable. Nobody cares, because the index values are not directly accessible. Every time the variable is looked up, the search gets the current index.
I thought that might mean deleting variables is okay now. That could be a nice clean fix. But interrupt code can UNDIM an array in the wrong namespace, invalidating all of the VarSearchStart values saved in the call stack.
I thought we could "rename" the variable by cloning it with a new name, inserting it with BinarySearch, and deleting the old one. We'd have to create a naming scheme that would avoid duplicate invalid variable names. Besides, this is an opportunity to right an old wrong: adding unused variables to the list just slows everything down.
We could fix this by adding bookkeeping to the array descriptor. UNDIM would not change the variable, but any variable look-up that found an UNDIMed array would report that the variable does not exist. An attempt to create a new array with the same name would not change the variable either, except to point it at a newly-created array descriptor. It's a little complicated, but I haven't thought of a reason it would not work. And it removes the extra step in searchVar to propagate UNDIM out of function calls. And it might be movement in the direction of separating arrays from scalars -- not sure about that one, it's hard to think about.
Any other ideas? I'd like clean and simple, but I'm moving in the opposite direction.
I can sympathise, it's one of those programming moments that makes your stomach turn inside-out.
If I read you right, these are the possible scenarios for UNDIMed vars (uvars):
A) Pretend nothing happened.
Clearly does not work as replacing uvars with a 'space' screws up b.search.
Not an option.
B) Replace uvars with a naming scheme that moves uvars to either the top
or bottom of the list.
(without invalid duplicates)
Inventing a scheme, e.g. " myname123", can prove messy.
An easier way would be to just insert a space: " myname".
If the user or system tries to DIM "myname" again, then
" myname" can first be searched for, and " myname" replaced with "myname"
to be re-used. In a way it is similar to the BookKeeping method.
(with invalid duplicates)
I'm not yet convinced that inserting or replacing the name with a space
AND moving it to the top of the list would necessarily screw up a
b.search.
This method though does not 'right an old wrong',
you still end up with duplicates, albeit 'safe' duplicates.
For both cases (invalid or valid) I'm not sure how it affects undim
propagation in functions.
C) BookKeep with array-descriptor, i.e marking uvars, making uvars re-usable.
Simplest option ?
D) Really delete variables.
This option only works if VarNames, VarIndex and the various VarSearchStart's
are all synchronised properly.
This means that after deletion, UNDIM must somehow scan the call stack entries
and correct all the VarSearchStart's depending if a particular VarSearchStart
was before or after the deletion (can this be done?).
I'm on my smartphone, so I may have only skimmed Marc's post, however: how about removing the ArrayList element in VarNames and VarIndex (only) altogether, instead of filling it with a space? Wouldn't that work?
Ah right, I read a little more in detail... How about throwing an error "Cannot undim in an interrupt" then? It's a new limitation but would solve all of our problems :white_check_mark:
Yes, Nicolas, that's a very tempting suggestion.
Humpty, I think your recap is mostly right on. I looked at Option C a little, and it's worse than I thought. For Option B, as you suggest, there's probably a way to deal with the duplicate variables; true duplicates would have to be excluded from the sublist being searched.
It annoys me that we have to pretzel the code to account for array names! There's got to be a better way.
But -- I've got to get the test build out so users can check out both the speed-up and the new Paint features. Maybe the right answer is to do quick-and-dirty for now and fix it right when we have more time.
Quick-and-dirty would be Option D with a mouginish twist:
1. Delete the UNDIMed variable's entries from VarNames and VarIndex.
2. In UNDIM, if the index of the variable being deleted is smaller than interruptVarSearchStart, throw an "undim in interrupt" error. (This will have to be cleaned up a little; I was sloppy with interruptVarSearchStart.)
This is really easy.
If we do it, it will lean us toward deletion as the permanent solution. Walking the call stack and decrementing each saved VarSearchStart is easy, but it feels awfully kludgy. I want to make the code less fragile, not more!
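Here is a minimal sketch of that quick-and-dirty deletion, including the call-stack walk, assuming VarNames and VarIndex are parallel lists and that the saved VarSearchStart values can be reached as a plain int array. The class, method, and parameter layout are hypothetical, not the interpreter's code. A saved start only moves if its namespace begins after the deleted entry.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of Option D. savedStarts stands in for the
// VarSearchStart values saved on the call stack. Pass
// interruptVarSearchStart = 0 when no interrupt routine is running,
// so the guard never fires.
class DeleteVariable {
    static void undim(List<String> varNames, List<Integer> varIndex,
                      int[] savedStarts, int interruptVarSearchStart, int i) {
        if (i < interruptVarSearchStart) {
            // The variable belongs to code that the interrupt interrupted.
            throw new IllegalStateException("undim in interrupt");
        }
        varNames.remove(i);   // remove(int): by position
        varIndex.remove(i);
        // Every namespace that starts after the deleted entry moves down by one.
        for (int k = 0; k < savedStarts.length; k++) {
            if (savedStarts[k] > i) savedStarts[k]--;
        }
    }

    public static void main(String[] args) {
        List<String> names = new ArrayList<>(Arrays.asList("a", "b", "x", "y"));
        List<Integer> index = new ArrayList<>(Arrays.asList(0, 1, 2, 3));
        int[] starts = { 0, 2 };              // main at 0, a called function at 2
        undim(names, index, starts, 0, 1);    // delete "b" from main
        System.out.println(names + " " + Arrays.toString(starts));  // [a, x, y] [0, 1]
    }
}
```

The loop is the part that feels kludgy: every deletion has to know about every saved start, which is exactly the coupling a per-call variable list would remove.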
If we could have different variable lists for each function call, the call stack would save a reference to the variable list instead of the start of the sublist. Can't do that until the variable list is a single list. But maybe we could leave the kludge in place until then?
Does UNDIM do the propagation? Or does it occur at FN.RTN/FN.END? i.e. after function2 UNDIMs, at which point does function1 know to UNDIM its own variable: after the exit of function2, or before?
Neither. UNDIM marks the array descriptor invalid and clears the variable name. The function returns and the interpreter goes on running. If it gets an instruction that references that array, the variable lookup finds the invalid array descriptor and clears that variable name, too. It's in SearchVar(), where it says if (VarIsArray).
I see. The bookkeeping is sort of already there. Then I think this would work. The only thing is that you can't undim a 'main' variable in an interrupt; there might be some negative feedback from users who have done this (can't be many). (By the way, I don't see the kludge as a 'kludge', as it ensures call stack integrity.)
An alternative quick fix is to 'always' re-insert a space at index 0 (Option B-b) if index < interruptVarSearchStart, but that leaves the problem of unused entries.
Nicolas reported in the forum that the GW-demo doesn't work with the v01.88.03 test build.
First look: we're running into trouble with all those global variables: theVarIndex, VarNumber, etc. For example, in getArrayValues():
int avn = VarNumber; boolean avt = VarIsNumeric;
do { evalNumericExpression(); [...] } while (isNext(','));
VarNumber = avn; VarIsNumeric = avt;
That is, evalNumericExpression() is going to change VarNumber, so grab a copy. Then, after evaluating the expression, restore VarNumber.
Yeah, that was fine when a variable's index in the variable lists never changed. Now creating a new variable can move any or all of the other variables. Saving/restoring the index doesn't work any more.
For example, this crashes:
a=1 : c=3
array.load b[],11,12,13
d=b[a + b] % d=b[a] does not crash
How many places are there like this in the code, where it assumes that a variable's index never changes? I don't know, but it looks like I'm going to have to find out.
That commit I made a few minutes ago says "Tiny fix in GetArrayValue()". I moved one line, so it gets the ArrayDescriptor before evaluating the index values. This way it doesn't care if VarNumber gets restored correctly, and my little code snippet doesn't crash.
Of course, VarNumber and VarIsNumeric still are not getting restored correctly. I suppose that's a problem, but I haven't got a small test case to prove it.
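The failure mode and the fix can be reduced to a toy model. This is a hypothetical illustration (StaleIndex, create, and indexOf are invented names) assuming parallel sorted lists of names and value objects. Saving a position, like saving VarNumber, goes stale as soon as a new name is inserted in front of it; saving a reference to the object, like fetching the ArrayDescriptor before evaluating the index expressions, stays correct.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical illustration: a saved position in a sorted name list goes stale
// when a new name is inserted in front of it, but a saved object reference does not.
class StaleIndex {
    static final class Value {
        double v;
        Value(double v) { this.v = v; }
    }

    final List<String> names = new ArrayList<>();
    final List<Value> values = new ArrayList<>();   // parallel to names

    // Creates a new variable, keeping names sorted (assumes the name is new).
    void create(String name, double v) {
        int at = -Collections.binarySearch(names, name) - 1;
        names.add(at, name);
        values.add(at, new Value(v));
    }

    int indexOf(String name) { return Collections.binarySearch(names, name); }

    public static void main(String[] args) {
        StaleIndex t = new StaleIndex();
        t.create("b", 2);
        t.create("c", 3);
        int savedIndex = t.indexOf("c");             // like saving VarNumber
        Value savedRef = t.values.get(savedIndex);   // like fetching the descriptor first
        t.create("a", 1);                            // e.g. created while evaluating an index
        System.out.println(t.names.get(savedIndex)); // b: the saved index is stale
        System.out.println(savedRef.v);              // 3.0: the reference is still right
    }
}
```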
The fix in GetArrayValue() does not fix the problem Nicolas found testing with GW Lib. Or maybe it does fix one problem, but GW_demo still crashes. The error in the stack trace is "Index out of bounds". It's trying to reference Vars, and the index it's using is the same as the size of Vars.
I tried taking out the UNDIM change, and it made no difference. The UNDIM change removes an item from VarIndex, not from Vars.
I could, and probably will, go on trying to find the cause of that indexing error. But in the meantime, I'm starting to prepare for a more stable fix. The first step is in that same commit: "Changed Var to Val and Vars to Vals". That's the class Var and the ArrayList<Var> Vars. The terminology is very confusing: things I've been calling vars in the code are really objects that hold the value of a scalar. The change looks very large: 544 additions, 537 deletions (counting the insignificant change in GetArrayValue()). It was mostly mechanical, global search and replace. It has no effect at all on the program logic, but it makes the thinking logic easier.
With that name change, I can add a new class called Var that will hold the name and type of the variables -- yes, I intend to replace VarNames and VarIndex with a single list of Var objects. VarIsNumeric, VarIsArray, VarIsFunction, and VarIsNew will all vanish, and good riddance. And each Var object will hold a reference to a Val (for scalars), an ArrayDescriptor, or a FunctionDefinition -- just the same as VarIndex does now, except it will hold references, not indices. That may (should?) mean we can delete scalars. And UNDIMed array space.
I haven't thought through what this does to binarySearch(), yet. Have to add a Comparator, and then see if that slows down the variable search. I don't want to undo Nicolas's gains!
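Searching a list of Var objects by name with a Comparator could look like the sketch below. The one-field Var and the find() helper are hypothetical. One cost to watch is visible in the code: Collections.binarySearch(list, key, comparator) needs a key of the element type, so each lookup wraps the name in a throwaway probe object.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

// Hypothetical sketch: searching a sorted List<Var> by name with a Comparator.
class VarComparatorSearch {
    static final class Var {
        final String name;
        Var(String name) { this.name = name; }
    }

    static final Comparator<Var> BY_NAME = Comparator.comparing((Var v) -> v.name);

    // binarySearch needs a key of the element type, so the name is wrapped in
    // a throwaway probe Var; that allocation is one of the costs to measure.
    static int find(List<Var> vars, int searchStart, String name) {
        List<Var> ns = vars.subList(searchStart, vars.size());
        int i = Collections.binarySearch(ns, new Var(name), BY_NAME);
        return (i >= 0) ? searchStart + i : -1;
    }

    public static void main(String[] args) {
        List<Var> vars = new ArrayList<>();
        for (String n : new String[] {"a", "c", "d"}) vars.add(new Var(n));
        System.out.println(find(vars, 0, "c"));   // 1
        System.out.println(find(vars, 1, "a"));   // -1: "a" is outside the sublist
    }
}
```

Keeping a parallel list of plain name strings avoids the probe object entirely, at the price of maintaining two lists.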
This is not going to look exactly like the "Grand Unified Variable" architecture I've been working toward. I just have to hope it's a little easier to implement than that would have been.
Some weeks ago Marc wrote about a database approach. Now I think Marc is looking more and more in this direction.
And now I know why the people at CA (Realizer BASIC) chose another approach.
To fence in the spaghetti code of GW-Basic, they made it simpler, as VBASIC did.
First, all variables are global, except those in function or procedure calls or those declared as local in them. (In BASIC! you could call this „GlobalsOn“.)
Functions or procedures have to be at the top of the code, except for commands like the RESET command.
So the Realizer approach needs only one reference table for (keywords, operators,) variables, functions and procedures, and no separate name spaces.
OK, hit me, in this example I use name extensions.
FUNC MyFunc (s) REM Only single declaration allowed.
LOCAL b, c
a = a + 5
c = 1
b = s + c + a
p = „Test“
RETURN b
END FUNC
s = 10.3
c = 6/4
TYPE a AS Integer REM Only single declaration allowed.
a = INT(2.0)
p = 99
PRINT MyFunc (33.0) → 41.0
PRINT s, c, a → 10.3 1.5 7
PRINT p → Test
Reference table records step by step: (The first item is the key.)
MyFunc | Function | line1 | ? | ?
MyFunc#s | ? | Free | ? REM Swap s with MyFunc#s.
MyFunc#b | ? | Free | ?
MyFunc#c | ? | Free | ?
a | ? | Free | ?
MyFunc#c | Integer | Free | 1
MyFunc#b | Integer | Free | ?
p | String | Free | „Test“
MyFunc#RETURN | ? | Free | ?
MyFunc | Function | line1 | line8 | ?
s | Double | Free | 10.3
c | Double | Free | 1.5
a | Integer | Fixed | ?
a | Integer | Fixed | 2
p | Integer | Free | 99
MyFunc | Function | line1 | line8 | used
MyFunc#s | Double | Free | 33.0
MyFunc#b | Double | Free | 41.0
p | String | Free | „Test“
MyFunc#RETURN | Double | Free | 41.0
The reference table could be:

| Key ((keyword, operator,) function name or variable name) | Type (function, keyword, integer, double, string, bundle, array of strings …) | Variable type fixed? | Address or Value (small values, pointers, branch destinations) |
|---|---|---|---|
This table is a case for binary search, or maybe for faster Java search methods.
With your own binary search, you get the point at which you can split the table if you want to insert a new keyword, so the table stays in the right order without starting a new sort. (See also my Binary Search enhancement on the rfo-basic forum: http://rfobasic.freeforums.org/post25410.html#p25410 )
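Gregor's forum code is not reproduced here, but the standard JDK search already reports the split point: when the key is missing, Collections.binarySearch returns -(insertionPoint) - 1. A minimal sketch (the class, method, and sample keywords are invented for illustration):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Sketch: insert into a sorted table without re-sorting. A failed
// Collections.binarySearch returns -(insertionPoint) - 1.
class SortedInsert {
    static boolean insertSorted(List<String> table, String key) {
        int i = Collections.binarySearch(table, key);
        if (i >= 0) return false;     // key already present; keep keys unique
        table.add(-i - 1, key);       // split the table at the insertion point
        return true;
    }

    public static void main(String[] args) {
        List<String> table = new ArrayList<>(Arrays.asList("array", "dim", "print"));
        insertSorted(table, "gosub");
        System.out.println(table);   // [array, dim, gosub, print]
    }
}
```

Each insert costs one O(log n) search plus an element shift in the ArrayList, instead of a full re-sort.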
Some remarks: Realizer uses bundles like a.Name, a.PostalCode, a.City, … . Most variables can change their types at runtime!
As a beginner you hate it, then you love it, and later you use the LOCAL command consistently.
Once the source code reached a larger size, and for every include file, Realizer generated object code on the first run and saved it to disk. I suggest *.bao as a file name extension. Ballast like unused functions or remarks could be thrown away. The reference table could be stored in this file, too.
I do not claim this approach is better, but it is different and well-considered.
On the other hand, some months ago I was looking at the Java subs. As I remember, all the variables are stored in Java as text. Am I right?
I hope this post gives some inspirations.
Gregor
„Make things as simple as possible, but not simpler.” (Albert Einstein)
I can't begin to describe what I've been doing for the last nine days. Some day I will have to do that, but not now. While testing those changes, I found a very small but very important error in the original "binary search" commit:
// Now that all new variables have been created in main name space,
// start the function name space with the function parameter names.
sVarNames = VarNames.size();
for (FunctionParameter parm : parms) {
// Get insertion point (-index - 1) so VarNames.subList will still be in alphabetical order.
int index = newVarIndex(parm.name(), VarNames, sVarNames);
That last line is the common function that uses binarySearch to find the insertion point of a new variable in the (sorted) variable list. The error is painfully obvious in hindsight: we're capturing the end of the caller's variable space in sVarNames, but we don't change VarSearchStart until a few lines later. Oops!
sVarNames = VarNames.size();
VarSearchStart = sVarNames;
for (FunctionParameter parm : parms) {
int index = newVarIndex(parm.name(), VarNames, sVarNames);
This fails in any program that uses a function parameter name that matches the name of a variable that already exists in the caller's name space. It's amazing that so many tests worked!
I'm committing a change that fixes this little oversight. It also re-enables hardware graphics acceleration in HTML mode, as Nicolas requested over in Issue #196.
Unfortunately, this commit still doesn't make the original binary search speed-up code work. GW_demo still gets an IndexOutOfBoundsException just off the end of the Vals list.
I have a candidate for check-in. It runs GW_demo.bas. It runs all of the Sample_Programs (except I didn't run f32 or f35). That's not nearly enough.
My changes touched almost every operation in the code. They affect every command. Ideally, every command must be tested, including tests for syntax errors. Needless to say, that isn't going to happen. It's next to impossible.
If I had any sense at all, I would have tabled Nicolas's binary search code a week ago. But I didn't, so now we have to make a choice.
I fixed the errors that were easy to find. Now it gets harder. If we could get some forum users to run their favorite programs, we can debug it well enough to release it.
I really want to release this code. I have put too much effort into it to drop it. But we must get v01.89 out, and there's not enough time to test the new code as it should be tested.
What do you think?
(BTW, I have not been in the forum since 12/4. I am waaaay behind. Anything you post there I will not see for a while.)
Wow, thanks Marc for your efforts! I will be interested to know what the fix was eventually. In the meantime I think the forum is a great place to have your fix tested. Many active users for many kinds of programs.
I don't know what your timeline for v01.89 is. If you really want to see it out asap, I would suggest getting rid of the binary search and releasing it on the Google store with the rest of the changes. Then we will have all the time to test a v01.89.01 with only the binary search.
Yeah, well, my timeline for v01.89 was mid-October. Didn't make it!
I think the only change I have not yet made that absolutely must go into v01.89 is the Save path bug.
Let me look at how to back out the binary search while still keeping the Paint and other recent changes.
I will attempt a brief explanation of the changes. Explaining them is not hard. Keeping it brief is hard!
The values list (Vals) is gone. The globals VarNumber, theValueIndex, VarIsNew, VarIsNumeric, VarIsArray, VarIsFunction, ArrayTable, ArrayValueStart, FnDef, and WatchVarIndex are gone. Function names are global, so I had to keep FunctionTable -- one step removed from having global variables!
VarIndex was a list of indices into other value lists (Vals, ArrayTable, FunctionTable). That's gone, too, replaced by ArrayList<Var> Vars. Vars is a list of variables. Each Var object is a complete variable: a scalar, an array, or a function definition. Var is an abstract class with concrete subclasses ScalarVar, ArrayVar, and FunctionVar.
Each Var object contains a Val object to hold its value. Val is an abstract class with concrete subclasses NumVal, StrVal, and ArrayVal.
A ScalarVar holds a NumVal or a StrVal. An ArrayVar holds both an ArrayDef and an ArrayVal. A FunctionVar holds a FnDef. A FunctionVar does not have a value (no Val subclass).
An ArrayDef holds a Java array. Yes, a real Java array, although for now it is still only one dimensional. UNDIM discards the ArrayVal, releasing all of the array storage. (Supposed to -- I haven't tested that yet.)
Var and Val and their subclasses, along with ArrayDef, FnDef, FunctionParameter, and the Type enum are all in the file Var.java. It's right around 500 lines, although Run.java is only about 300 lines shorter.
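The relationships above can be summarized as a skeleton. This is a heavily reduced, hypothetical sketch of the class shapes as described, not the contents of Var.java: the fields (value, offset, data, firstLine) and the val() accessor are invented for illustration, and FunctionParameter and the Type enum are left out. The classes are nested in VarSketch only so the sketch fits in one file.

```java
// Hypothetical, reduced sketch of the Var/Val class shapes described above.
class VarSketch {
    abstract static class Val { }
    static final class NumVal extends Val { double value; }
    static final class StrVal extends Val { String value = ""; }
    static final class ArrayVal extends Val { int offset; double value; }  // caches one array element

    static final class ArrayDef {
        final double[] data;                       // a real Java array
        ArrayDef(int size) { data = new double[size]; }
    }
    static final class FnDef {
        final int firstLine;
        FnDef(int firstLine) { this.firstLine = firstLine; }
    }

    abstract static class Var {
        final String name;
        Var(String name) { this.name = name; }
        abstract Val val();                        // null when there is no value
    }
    static final class ScalarVar extends Var {
        private final Val stored;                  // a NumVal or a StrVal
        ScalarVar(String name, Val stored) { super(name); this.stored = stored; }
        Val val() { return stored; }
    }
    static final class ArrayVar extends Var {
        ArrayDef def;                              // dimensions and data
        private final ArrayVal cache = new ArrayVal();
        ArrayVar(String name, ArrayDef def) { super(name); this.def = def; }
        Val val() { return cache; }
    }
    static final class FunctionVar extends Var {
        final FnDef def;
        FunctionVar(String name, FnDef def) { super(name); this.def = def; }
        Val val() { return null; }                 // a function has no Val
    }

    public static void main(String[] args) {
        Var s = new ScalarVar("s", new NumVal());
        Var f = new FunctionVar("f(", new FnDef(10));
        System.out.println((s.val() instanceof NumVal) + " " + (f.val() == null));  // true true
    }
}
```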
The symbol table is still in two parts. The variables are in Vars, but I kept VarNames, the list of name strings, because binarySearch uses a fast native (C++) search function for strings. (Maybe some day we could combine VarNames and Vars into a single HashMap<String, Var>.)
This touches all of the GetVar() derivatives and variants, but they really have not changed that much. The set of methods is the same, which makes the change a lot more manageable. Instead of passing around a String and setting a lot of globals, the methods pass around a Var that knows what kind of variable it is. That makes the whole system a lot more stable.
For performance, ParseVar() does not create new Var objects. It reuses a pool of three, one of each of the Var subclasses. This should mean a lot less garbage collection in idle loops. We have to be careful that the Var returned by ParseVar() doesn't get used as a real variable. SearchVar() uses it only for the variable name. If it finds an existing variable, it returns the existing Var from Vars, not the Var from ParseVar(). Anything else that calls ParseVar() either clones a new Var or uses the information from the Var without passing the Var itself.
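The pooling rule can be sketched as follows. Everything here is hypothetical (the Kind enum, the suffix test, the parallel names/vars lists); it only shows the discipline described above: the parser hands out one of three reusable scratch objects, the search uses the scratch only for its name, and anything that goes into the variable list is a copy.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical sketch of a three-object parse pool.
class ParsePoolSketch {
    enum Kind { SCALAR, ARRAY, FUNCTION }

    static final class Var {
        String name;
        Kind kind;
        Var(String name, Kind kind) { this.name = name; this.kind = kind; }
        Var copy() { return new Var(name, kind); }
    }

    // One scratch Var per kind, reused by every parse: no garbage per call.
    private final Var[] pool = {
        new Var("", Kind.SCALAR), new Var("", Kind.ARRAY), new Var("", Kind.FUNCTION)
    };
    final List<String> names = new ArrayList<>();
    final List<Var> vars = new ArrayList<>();      // parallel to names

    Var parseVar(String token) {
        Kind kind = token.endsWith("[") ? Kind.ARRAY
                  : token.endsWith("(") ? Kind.FUNCTION : Kind.SCALAR;
        Var scratch = pool[kind.ordinal()];
        scratch.name = token;                      // overwritten by the next parse
        return scratch;
    }

    // Uses the scratch Var only for its name and kind; never stores it.
    Var searchVar(String token) {
        Var scratch = parseVar(token);
        int i = Collections.binarySearch(names, scratch.name);
        if (i >= 0) return vars.get(i);            // the existing, stored variable
        Var fresh = scratch.copy();                // clone before it goes in the list
        names.add(-i - 1, fresh.name);
        vars.add(-i - 1, fresh);
        return fresh;
    }

    public static void main(String[] args) {
        ParsePoolSketch p = new ParsePoolSketch();
        Var a = p.searchVar("a");
        p.parseVar("zz");                          // reuses the scalar scratch object
        System.out.println(a.name + " " + (p.searchVar("a") == a));  // a true
    }
}
```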
The old getVarValue() worked by setting the global theVarIndex. All of the 100+ places that call getVar() or getNVar() or one of the other variations used theVarIndex (and VarIsNumeric and the other globals) to find the actual Val to read or write. Even GetArrayValue() set theVarIndex to an index of a value, because an array element was just another scalar in the list of scalar values. All of those methods return a boolean to say if they succeeded; if they returned true, the real "return value" was in theVarIndex (and VarNumber and VarIsNumeric and so on).
To reduce the size of the change, I kept that behavior. The methods still return a boolean. But now all those globals are replaced with a single Val reference, the global mVal. (Some day, that will go away, too, but that will be a pretty big change all by itself.)
Getting a Val reference was easy for scalar variables. It was much (much!) harder for arrays. In Paul's design, an array's data was a section of the scalar values list. (Or lists: in Paul's original design there was a list of String objects and a list of Double objects. Some time ago I combined them into one list of objects of a class that could be either.) An ArrayTable element described an array by keeping the index of the base, where the array started, and a length.
Now there is no ArrayTable, just individual ArrayVar instances. Each ArrayVar has an ArrayDef which holds the data in a real Java array. All of Paul's expression parsers rely on treating an array element exactly the same way as a scalar. In the new design, they have to have a Val.
So I created a new Val subclass, ArrayVal. Each ArrayVar has both an ArrayDef and an ArrayVal. The ArrayDef keeps the dimensions and all of the data. The ArrayVal is a temporary write-through cache for one element of the array.
If you have a = b[x,y], executeLet() calls the numeric expression parser, which calls getNVal() to get the ArrayVal that caches an element of the array b[]. The ArrayDef that holds the array data uses the index list to select an element of the array, then writes both the index and the array element value into the ArrayVal cache. The expression parser finds that ArrayVal in mVal and reads the value -- it is not reading the array directly, it is reading the cached value.
If you have b[x,y] = a, executeLET() calls getVar() directly to get the ArrayVal that caches the element of the array b[]. The process and the result are exactly the same: mVal is set to the ArrayVal that caches an element of the array in the ArrayDef attached to the ArrayVar of the array b[]. The cache holds the current value of the array element b[x,y] -- but nobody cares, because this code wants to write the array, not read it. executeLET() calls the numeric expression parser, which calls getNVal() to get the value of the scalar a. It writes the value of a to the ArrayVal cache. The ArrayVal is a write-through cache. It has the index and a reference to the ArrayDef, so it sets the correct element of the original array to the new value.
There is no cache invalidation mechanism. Because any BASIC! array read or write is done with a full set of indices every time, the cache is used only once. It can never be stale. That seems a waste, but it does simplify the design.
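The read and write paths through the single-element cache can be sketched in isolation. This is a hypothetical sketch, not the interpreter's ArrayDef or ArrayVal: it assumes numeric arrays only, 1-based BASIC indices, and a row-major layout in one flat Java array (the actual index order may differ). Every access re-selects the element with a full index list, which is why no invalidation is needed.

```java
// Hypothetical sketch of the ArrayDef / ArrayVal pair. Assumptions: numeric
// arrays only, 1-based BASIC indices, row-major layout in one flat Java array.
class ArrayCacheSketch {
    static final class ArrayDef {
        final int[] dims;
        final double[] data;

        ArrayDef(int... dims) {
            this.dims = dims.clone();
            int size = 1;
            for (int d : dims) size *= d;
            data = new double[size];
        }

        // Converts 1-based BASIC indices to an offset in the flat array.
        int offset(int... idx) {
            if (idx.length != dims.length) throw new IllegalArgumentException("wrong number of indices");
            int off = 0;
            for (int k = 0; k < dims.length; k++) {
                if (idx[k] < 1 || idx[k] > dims[k]) throw new IndexOutOfBoundsException("index " + idx[k]);
                off = off * dims[k] + (idx[k] - 1);
            }
            return off;
        }

        // Points the cache at one element and loads its current value.
        ArrayVal select(ArrayVal cache, int... idx) {
            cache.def = this;
            cache.offset = offset(idx);
            cache.value = data[cache.offset];
            return cache;
        }
    }

    // Write-through cache for a single element.
    static final class ArrayVal {
        ArrayDef def;
        int offset;
        double value;

        double get() { return value; }
        void set(double v) { value = v; def.data[offset] = v; }   // write through
    }

    public static void main(String[] args) {
        ArrayDef b = new ArrayDef(2, 3);           // like DIM b[2,3]
        ArrayVal cache = new ArrayVal();
        b.select(cache, 2, 3).set(7.5);            // b[2,3] = 7.5
        double a = b.select(cache, 2, 3).get();    // a = b[2,3]
        System.out.println(a);                     // 7.5
    }
}
```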
The third kind of Var is a FunctionVar. It has no value, just the FnDef. The problem here is that functions are global. You can't just put them in the Vars list because the variable search starts at VarSearchStart -- functions could not call functions. The FunctionVar still goes in Vars, but its FnDef also goes into a global FunctionTable, just as in the old code. However, the FunctionTable is now a HashMap. You get the name -- a token ending in '(' -- from the command line, then use that to directly look up the FnDef. GW_lib is hundreds of functions, so you should see some improvement in speed just because of the faster function look-up. (Look at isUserFunction().)
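A direct function look-up keyed on the "name(" token might look like this. It is a hypothetical sketch, not isUserFunction(): the set of characters accepted in a name (letters, digits, '_', '$') and the FnDef contents are assumptions. Because function names are global, one map serves every call level, and the look-up cost does not grow with the number of functions.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of a global function table keyed by the "name(" token.
class FunctionTableSketch {
    static final class FnDef {
        final int startLine;
        FnDef(int startLine) { this.startLine = startLine; }
    }

    // Function names are global, so one map serves every call level.
    final Map<String, FnDef> functionTable = new HashMap<>();

    // Extracts a "name(" token starting at pos, or returns null if there is none.
    // The accepted name characters are an assumption for this sketch.
    static String functionToken(String line, int pos) {
        int end = pos;
        while (end < line.length()) {
            char c = line.charAt(end);
            if (!Character.isLetterOrDigit(c) && c != '_' && c != '$') break;
            end++;
        }
        if (end == pos || end >= line.length() || line.charAt(end) != '(') return null;
        return line.substring(pos, end + 1);       // keep the '(' as part of the key
    }

    FnDef lookUp(String line, int pos) {
        String token = functionToken(line, pos);
        return (token == null) ? null : functionTable.get(token);
    }

    public static void main(String[] args) {
        FunctionTableSketch t = new FunctionTableSketch();
        t.functionTable.put("area(", new FnDef(10));
        System.out.println(t.lookUp("x = area(3, 4)", 4).startLine);   // 10
        System.out.println(t.lookUp("area + 1", 0));                   // null: not a call
    }
}
```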
The change in variable management affects how doUserFunction() handles function parameters, especially the pass-by-reference kind. Again, the structure of the code has not changed, only the details. The FunctionParameter list parms is built the same way as before, but now it holds the entire Var for each parameter.
There are some corollary changes. The Dialogs can't use a Bundle to pass parameters because I didn't make the Var class serializable. I added a new class DialogArgs for that purpose. The code for GR.Set.Pixels has to have the whole array now, not just pointers into the Vals list. And I don't have all of the Debug commands caught up with the new system. (I haven't debugged Debug.)
There were a few unnecessary changes, too. I moved the array index calculation from GetArrayValue() to the ArrayDef. With the new FunctionTable (now mFunctionTable) I could simplify ESE() -- it doesn't reparse the same characters as many times, so it's shorter and should be faster. I'd like to do the same kind of thing for ENE().
This is a huge change, but it could have been bigger. I held back almost everything I could. On top of that, we're going to find this new structure will let us do things in the future that we couldn't do before. For example, this code does not fix the problem of UNDIM in an ISR, but now it will be fairly simple to create separate variable lists for different contexts.
But first we have to get this code tested and fully functional again.
Thanks for that! :) Pretty dense but brilliant! I'll wait for the full Var.java and the modified Run.java to read and understand better how this works.
I suppose you also had global variables in mind when you re-architected all of that? (but that's a different topic, don't answer that now)
I'm impatient to test it against the GW lib and my RPG to see noticeable speed differences :)
I had no trouble backing out the binary search for v01.89, but I forgot again to put the Issue number in my commit comment.
The Save and Run bug was pretty simple, too, but I had lots of trouble getting the Editor Load/Save/Delete paths working better. I hope users find the new gimmick useful. But that's off-topic.
I pushed both commits a few minutes ago.
Nicolas, as you suggested, I will release v01.89 and then immediately put the new variable code in v01.89.01. If you would prefer not to wait that long, I can put the new code on GitHub in a branch, or send you the (two) files directly. There is no overlap with the Load/Save/Delete changes.
It has now been over two weeks since I was in the forum. I am terrified of what I will find there!
I can wait, no worries.
Yes people have started asking where you are ;) Good luck catching up!
sVarNames = VarNames.size();
VarSearchStart = sVarNames;
for (FunctionParameter parm : parms) {
int index = newVarIndex(parm.name(), VarNames, sVarNames);
This fails in any program that uses a function parameter name that matches the name of a variable that already exists in the caller's name space. It's amazing that so many tests worked!
But newVarIndex doesn't use VarSearchStart? It gets sublistStart from sVarNames. Where does VarSearchStart matter here?
Yiii! You're right! So my question is: why did changing VarSearchStart fix anything? Stay tuned...
D'oh! It's obvious. In the new code, doUserFunction() doesn't call newVarIndex(). It looks like this:
sVarNames = VarNames.size();
VarSearchStart = sVarNames;
for (Var.FunctionParameter parm : parms) {
createNewVar(parm.var());
}
And createNewVar() does use VarSearchStart.
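Why the order of those two assignments matters can be shown with a toy model. This is a hypothetical sketch, not doUserFunction() or createNewVar(): it assumes createNewVar() searches the sorted sublist that starts at VarSearchStart and refuses a name that already exists there, and it reports a clash with an exception so the effect is easy to see. A parameter that shares a name with a caller variable only collides if VarSearchStart still points at the caller's namespace.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical toy model of creating function parameters in a new namespace.
class ParameterNamespace {
    final List<String> varNames = new ArrayList<>();
    int varSearchStart = 0;

    // Inserts name into the sorted sublist starting at varSearchStart;
    // returns false if the name already exists in that sublist.
    boolean createNewVar(String name) {
        List<String> ns = varNames.subList(varSearchStart, varNames.size());
        int i = Collections.binarySearch(ns, name);
        if (i >= 0) return false;
        varNames.add(varSearchStart + (-i - 1), name);
        return true;
    }

    // The new namespace begins at the current end of the list.
    // moveStartFirst = false mimics setting varSearchStart too late.
    void enterFunction(List<String> params, boolean moveStartFirst) {
        int sVarNames = varNames.size();
        if (moveStartFirst) varSearchStart = sVarNames;
        for (String p : params) {
            if (!createNewVar(p)) throw new IllegalStateException("parameter clashes with caller's " + p);
        }
        varSearchStart = sVarNames;
    }

    public static void main(String[] args) {
        ParameterNamespace ns = new ParameterNamespace();
        ns.createNewVar("a");                          // caller has a variable "a"
        ns.enterFunction(Arrays.asList("a"), true);    // parameter also named "a"
        System.out.println(ns.varNames);               // [a, a]
    }
}
```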
So it was not an error in the original binarySearch()-based code. It was an error in the changes I made based on the new Var architecture.
That explains why it fixed my new code but it did not fix whatever is still wrong with the original var search speed-up code.
I went through the Bug Report forum today and found a post from @evolbug that gave me a simple test case for some problem in the original speed-up code. After v01.89 is released, I plan to take a look at that case.
I hope I'm on the right track. I have made a small drawing to visualize Marc's briefing.
The new RFO-Basic VAR-Concept.pdf
Edit: Cloud link deleted, file uploaded directly.
That is a good idea to make a drawing of the classes. After I get v01.89 out, I'll try to make one, too.
There are two parallel lists: VarNames and Vars. VarNames is redundant; it is there only for fast binary searches. Anything that has a name has a type. Every Var object in the Vars list knows its own name and its own type.
I have not made Var subclasses for bundles, lists, and stacks, because they do not have names.
In your drawing, I think your box Var types should be in blue, too. Is that correct?
You want a list of types to parallel the list of var names and the list of vars. Each TypeVar holds type information about another variable.
But the other variable already knows its own type. There is no need for a list of types.
In fact, I have just gone to a lot of trouble to eliminate all those other lists that were needed because a variable did not know its own type. So I do not understand why you want to put it back.
In your previous post, you used a structure like:
s | Double | Free | 10.3
That is what I just built. There is a ScalarVar whose name is "s". It holds a NumVal, which holds a double (not a Double) with the value 10.3. The Free field is irrelevant because all variable types are fixed by their names: the name is "s", not "s$" or "s[" or "s$[" or "s(" or "s$(", so it is a numeric scalar variable.
So I do not understand what you are suggesting. Can you clarify?
To clarify, BigInteger, BigDecimal, Byte, Short, Integer, Long, Float and Double are class names in my case :wink:
Marc, you wrote:
Var and Val and their subclasses, along with ArrayDef, FnDef, FunctionParameter, and the Type enum are all in the file Var.java. It's right around 500 lines, although Run.java is only about 300 lines shorter.
In Run.java there is a private enum VarType, so in my mind I decoded "Type enum" as a new array with a common index slider, and I changed the name to VarType. Now I know I was wrong. You mean the var type is declared by (, [, $, $(, $[ or nothing. These characters are also part of the var name, and in this way also part of the content of Var.java.
Sooner or later we will discuss the use of BigInteger, BigDecimal and Integer, too. http://stackoverflow.com/questions/3730019/why-not-use-double-or-float-to-represent-currency Is adding "!" for integer values, like s! (or s!! for BigInteger), the solution? I don't think so. In the future we will need a Type class, because in my opinion we will get a TYPE command for more numeric and more complex data types, like TYPE s AS BigInteger. In that case the (, [, $, $(, $[ characters in var names could become obsolete.
Type class examples:
Integer
BigInteger
BigDecimal
Func
FuncOverWrite 'Main function overwrites include function
Del 'Var record could be deleted in idle mode
The new RFO-Basic VAR-Concept 2015-12-26.pdf
To be reckless: the table and Type class could contain commands, too. The preprocessor would detect the command, and the matching record would point directly to the branch address.
BASIC! Code:
R = cos(2*PI) + 45 +5%Example
The object code might look like this:
LET#R#=#~0
cos#(#~1 OR executeMF_COS#(#~1
2#*#PI#~2
)#~1
+#50#~0
EDIT: Some Changes, Stackoverflow link and new drawing
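The idea of a table record that points straight at the code to run resembles a dispatch table. Here is a rough Java sketch of one reading of it, assuming a hypothetical table keyed by math-function keyword: the lookup happens once ("preprocessing") and execution then calls the stored handler directly. It does not model the ~N nesting markers or the apparent folding of 45 + 5 into 50 in the object-code example.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.DoubleUnaryOperator;

// Hypothetical dispatch-table sketch: keyword -> handler, resolved once.
public class DispatchSketch {
    static final Map<String, DoubleUnaryOperator> MATH_FUNCS = new HashMap<>();
    static {
        MATH_FUNCS.put("cos", Math::cos);
        MATH_FUNCS.put("sin", Math::sin);
    }

    // "Preprocess": look the keyword up once and keep the handler itself.
    static DoubleUnaryOperator resolve(String keyword) {
        DoubleUnaryOperator op = MATH_FUNCS.get(keyword);
        if (op == null) throw new IllegalArgumentException("unknown function: " + keyword);
        return op;
    }

    public static void main(String[] args) {
        DoubleUnaryOperator cos = resolve("cos");           // done once
        double r = cos.applyAsDouble(2 * Math.PI) + 45 + 5; // R = cos(2*PI) + 45 + 5
        System.out.println(r);
    }
}
```

Executing the stored handler skips the keyword search on every evaluation, which is the saving the proposal is after.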
Or we gracefully ignore Integer, BigInteger and BigDecimal type classes and add some new functions:
BigDecimalADD$ (a$, b$, D or D$)
BigDecimalSUB$ (a$, b$, D or D$)
BigDecimalMUL$ (a$, b$, D or D$)
BigDecimalDIV$ (a$, b$, D or D$)
D → digits after the point. If D = -1, all reasonable digits will be used.
D$ → Format string
/Gregor
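Here is a rough sketch of how the proposed string functions could sit on top of java.math.BigDecimal, and of the double-rounding problem from the StackOverflow link. The methods add/sub/mul/div stand in for BigDecimalADD$ and friends and are hypothetical. HALF_UP rounding and the 34-digit cap (MathContext.DECIMAL128) for D = -1 division are assumptions, and the D$ format-string variant is not shown.

```java
import java.math.BigDecimal;
import java.math.MathContext;
import java.math.RoundingMode;

// Hypothetical sketch of BigDecimalADD$/SUB$/MUL$/DIV$ with a digits argument D.
public class BigDecimalFns {
    // D >= 0: round to D digits after the point; D = -1: keep all digits
    // (trailing zeros stripped). HALF_UP rounding is an assumption.
    static String format(BigDecimal r, int d) {
        if (d < 0) return r.stripTrailingZeros().toPlainString();
        return r.setScale(d, RoundingMode.HALF_UP).toPlainString();
    }

    static String add(String a, String b, int d) { return format(new BigDecimal(a).add(new BigDecimal(b)), d); }
    static String sub(String a, String b, int d) { return format(new BigDecimal(a).subtract(new BigDecimal(b)), d); }
    static String mul(String a, String b, int d) { return format(new BigDecimal(a).multiply(new BigDecimal(b)), d); }

    static String div(String a, String b, int d) {
        BigDecimal x = new BigDecimal(a), y = new BigDecimal(b);
        // 10/3 never terminates, so "all digits" needs a cap; 34 digits is an assumption.
        if (d < 0) return format(x.divide(y, MathContext.DECIMAL128), d);
        return x.divide(y, d, RoundingMode.HALF_UP).toPlainString();
    }

    public static void main(String[] args) {
        System.out.println(0.1 + 0.2);             // 0.30000000000000004 (binary double)
        System.out.println(add("0.1", "0.2", -1)); // 0.3
        System.out.println(div("10", "3", 2));     // 3.33
    }
}
```

Passing values as strings keeps them out of double entirely, so no binary rounding creeps in before BigDecimal sees them.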
I have committed the new code and pushed it to GitHub. Woohoo!
marcs@Copperwall:~/basic/ws/BasicWIP4$ git commit
[master 43eacd3] Reinstate binarySearch. Near total rewrite of variable management.
3 files changed, 1538 insertions(+), 1313 deletions(-)
create mode 100644 src/com/rfo/basic/Var.java
marcs@Copperwall:~/basic/ws/BasicWIP4$
Nicolas, I know you've been waiting for me to get binarySearch() back in, and I apologize for taking so long.
Oddly enough, the thing I am most excited about in all this is the rework of arrays. This morning I rewrote the new Array.fill command using one of the Java Arrays.fill() methods. Nice.
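For reference, java.util.Arrays.fill has whole-array and range overloads for each primitive type and for Object[]. The sketch below assumes, for illustration only, that a BASIC! array's values live in a flat double[] or String[]. The helper names fillAll/fillRange are hypothetical, not the actual Array.fill implementation.

```java
import java.util.Arrays;

// Hypothetical helpers around the java.util.Arrays.fill overloads.
public class FillSketch {
    // Fill the whole array.
    static void fillAll(double[] a, double v) { Arrays.fill(a, v); }

    // Fill the half-open range [from, to); Arrays.fill throws on an invalid range.
    static void fillRange(double[] a, int from, int to, double v) { Arrays.fill(a, from, to, v); }

    // String arrays use the Object[] overload.
    static void fillAll(String[] a, String v) { Arrays.fill(a, v); }

    public static void main(String[] args) {
        double[] nums = new double[5];
        fillAll(nums, 7.0);
        fillRange(nums, 1, 3, -1.0);
        System.out.println(Arrays.toString(nums)); // [7.0, -1.0, -1.0, 7.0, 7.0]
    }
}
```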
I would like to make this v01.89.01 so we can get testing on real programs by forum users. I'll post about that in the forum if I can find time. Do we need to let v01.89 settle a few days first?
Woohoo indeed :) Both versions can coexist, one official and one for testers. Release it Marc! Make it free like an eaaagle :D
And I did! I released v01.89.01 on December 31. I forgot to put the issue number in the commit (again). The bug reports are coming in. The worst is brochi's array indexing bug: http://rfobasic.freeforums.org/post26295.html#p26295
Nicolas, I hope that your graphics bugs are actually instances of brochi's array indexing bug. There was one real dumb mistake in gr.text.skew, but that's all I've found in graphics so far.
The array indexing bug is like this:
a[i] = a[j] + x
The array a[ gets an ArrayVar with an ArrayDef and an ArrayVal.
The index i and the value of a[i] are cached in the ArrayVal.
The index j and the value of a[j] are cached, too, in the same ArrayVal. Oops.
Add in x and assign the value to the ArrayVal, which writes through to a[j], not a[i].
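Here is a small Java model of that aliasing. CachedElement is a hypothetical stand-in for ArrayVal: one object per array that remembers the most recent index and is reused for every indexing operation. The statement is evaluated left-hand side first, as in the description above.

```java
import java.util.Arrays;

// Simplified, hypothetical model of one shared per-array index cache.
public class SharedCacheBug {
    static final class CachedElement {
        private final double[] data;
        private int index;
        CachedElement(double[] data) { this.data = data; }
        CachedElement at(int i) { index = i; return this; } // returns the SAME object each time
        double get() { return data[index]; }
        void set(double v) { data[index] = v; }
    }

    // Evaluates a[i] = a[j] + x with one shared cache.
    static void assign(double[] a, int i, int j, double x) {
        CachedElement shared = new CachedElement(a);
        CachedElement lhs = shared.at(i);    // caches index i
        double rhs = shared.at(j).get() + x; // re-caches index j in the same object
        lhs.set(rhs);                        // so this writes a[j], not a[i]
    }

    public static void main(String[] args) {
        double[] a = {10, 20, 30};
        assign(a, 0, 2, 1);
        System.out.println(Arrays.toString(a)); // [10.0, 20.0, 31.0]; a[0] should have become 31.0
    }
}
```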
I looked at my code, but I don't think the problem is due to an array... This is really a GR.TEXT.ALIGN issue.
I center text at the very beginning, then for just a few elements I left-align, and immediately center again until the end of the program.
The second centering is not taken into account in v01.89.01...
Referring to the failing code snippet in the previous post, the fix is to keep separate Val objects for a[i] and a[j]. We don't really need separate copies unless one is for read and the other is for write. Unfortunately, it's hard to distinguish the cases at the time the caching is done.
My change is to always create a new ArrayVal every time the array is indexed. That means reading an array in a loop creates a new ArrayVal for every item, instead of re-using the same one repeatedly. I have not measured the effect. I hope it's not too bad, because I don't see another choice.
I tried to make the cost a little lower by simplifying the ArrayVal. It doesn't cache a value any more; in fact, it no longer has a member NumVal or StrVal. It caches only the index.
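The fix can be sketched in a simplified model: every indexing operation creates a fresh element reference that caches only the index, so a[i] and a[j] no longer share state. ElementRef and index() are hypothetical names, not the real ArrayVal API.

```java
import java.util.Arrays;

// Simplified, hypothetical model of "new ArrayVal per indexing, cache only the index".
public class PerIndexElement {
    static final class ElementRef {
        private final double[] data;
        private final int index; // the only thing cached
        ElementRef(double[] data, int index) { this.data = data; this.index = index; }
        double get() { return data[index]; }
        void set(double v) { data[index] = v; }
    }

    // A new object every time the array is indexed.
    static ElementRef index(double[] a, int i) { return new ElementRef(a, i); }

    // a[i] = a[j] + x
    static void assign(double[] a, int i, int j, double x) {
        ElementRef lhs = index(a, i);
        double rhs = index(a, j).get() + x;
        lhs.set(rhs);
    }

    public static void main(String[] args) {
        double[] a = {10, 20, 30};
        assign(a, 0, 2, 1);
        System.out.println(Arrays.toString(a)); // [31.0, 20.0, 30.0]
    }
}
```

The price is one small allocation per indexing operation, which matches the concern above about reading arrays in loops. Because the reference holds no cached value, reads always see the current array contents.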
I also reworked ArrayDef so it is an abstract parent of two subclasses, NumArrayDef and StrArrayDef, with a static factory method to create one or the other as needed. Probably not really necessary, and it does cause a two-line change in Run.BuildBasicArray().
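An abstract parent with a static factory might look roughly like the sketch below. The field and method names (dims, length(), create(), values) are illustrative assumptions, not taken from Var.java. Storage as a flat double[] or String[] sized by the product of the dimensions is also an assumption.

```java
import java.util.Arrays;

// Hypothetical sketch: abstract ArrayDef with numeric and string subclasses.
abstract class ArrayDef {
    protected final int[] dims;
    protected ArrayDef(int[] dims) { this.dims = dims.clone(); }

    // Total number of elements: product of the dimensions.
    int length() {
        int n = 1;
        for (int d : dims) n *= d;
        return n;
    }

    abstract boolean isNumeric();

    // Static factory: the caller says which kind it needs and gets the right subclass.
    static ArrayDef create(boolean numeric, int... dims) {
        return numeric ? new NumArrayDef(dims) : new StrArrayDef(dims);
    }
}

final class NumArrayDef extends ArrayDef {
    final double[] values;
    NumArrayDef(int[] dims) { super(dims); values = new double[length()]; }
    @Override boolean isNumeric() { return true; }
}

final class StrArrayDef extends ArrayDef {
    final String[] values;
    StrArrayDef(int[] dims) { super(dims); values = new String[length()]; Arrays.fill(values, ""); }
    @Override boolean isNumeric() { return false; }
}

public class ArrayDefDemo {
    public static void main(String[] args) {
        ArrayDef nums = ArrayDef.create(true, 3, 4);
        ArrayDef strs = ArrayDef.create(false, 5);
        System.out.println(nums.getClass().getSimpleName() + " " + nums.length());
        System.out.println(strs.getClass().getSimpleName() + " " + strs.length());
    }
}
```

This sketch mixes both of the styles Marc describes. isNumeric() supports an explicit type check, while the per-type storage lives in the subclasses.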
Most of these Var and Val objects have to be either numeric or string. Sometimes I have one class that checks the type explicitly to decide what to do. Other times I have an abstract class with a subclass for each type; there's still a type-based decision, but it's made polymorphically by the runtime. I suppose I ought to pick one and use it consistently.
Okay, Nicolas. I'll look again before cutting v01.89.02. It may be that the problem is not in the Align command but in the way text attributes are propagated with the Current Paint. That would probably affect other text attributes, too.
If you want to test with the changes I've made already, you can get them from GitHub. I pushed fixes for the GR.Text.Skew bug and the array assignment bug. That was the 496th commit, coming up on 500! (I left out the issue number again, but this time it was because the commit comment was already long.)
OK, I'll compile the up-to-date GitHub source tomorrow morning and report. I was thinking too that this may be a Paint problem. If my GR.TEXT.ALIGN issue is not fixed, I'll provide a code snippet you can use for testing.
After studying my code a little more...
I'm thinking maybe the problem is that GR.BITMAP.DRAWINTO.START now resets the current Paint??
I'm here to report: with the GitHub repo as of now, I still have the graphics text alignment problem.
I wasn't able to replicate it with smaller code; in particular, I wrote a snippet around GR.BITMAP.DRAWINTO.START, but it doesn't create (reset) a new Paint as I initially thought...
I'll continue my investigations.
I found another bug in the new Var management. This is not related to the binarySearch() speed-up. It's a problem in handling variable names in user-defined function parameters.
The f19_towers_of_hanoi.bas sample program fails. It's a recursive algorithm. The fn.def looks like this:
FN.DEF hanoi(disk,dest,source,other, peg[], parms[])
The first call is:
n = hanoi(disk_count, 3, 1, 2, peg[], parms[])
and it recurses like this:
n = hanoi(disk-1,other,source,dest, peg[], parms[])
The first time the hanoi function is called, its four scalar parameters are (4,3,1,2). The first recursive call should have parameters (3,2,1,3), but in fact it has (3,2,1,2).
The fn.def creates a list of Var objects, one for each parameter. doUserFunction can use these variables to hold values until it can move VarSearchStart. Then it adds references to those variables to the variable list, starting at the new VarSearchStart.
For the recursive call, doUserFunction does the same thing again: it uses the list of Var objects from fn.def to hold temporary values. The problem is that they're the same Var objects. It's a little mind-bending, but you can work through it CPU-style, one step at a time:
The fn.def list has four Var objects. Let's say they have Android object IDs 1, 2, 3, and 4. Var 1 has the name string disk, 2 is dest, 3 is source, and 4 is other.
The first doUserFunction call puts the value 4 in Var 1, 3 in Var 2, 1 in Var 3, and 2 in Var 4. Then it puts pointers to those four objects in the variable list.
The second doUserFunction call, one parameter at a time:
Var 1: evaluate the expression disk - 1, that is, get the caller's variable disk (that's Var 1) and subtract 1. Assign the result (3) to the first fn.def variable, still Var 1. Already a problem, but we get away with this one.
Var 2: evaluate the expression other, that is, get the caller's variable other (that's Var 4). Assign its value (2) to the second fn.def variable: that's Var 2, which is also the caller's variable dest. Yes, we just changed the value of a caller's variable.
Var 3: evaluate the caller's source (Var 3) and assign its value (1) to the third fn.def variable (Var 3). We get away with this one, too.
Var 4: evaluate the caller's dest. That's Var 2, which should be 3, but it got changed to 2 when evaluating the second parameter. Assign the value (2) to the fourth fn.def variable (Var 4).
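The trace can be reproduced with a tiny Java model. Var, bindBuggy(), and the static DISK/DEST/SOURCE/OTHER fields are hypothetical stand-ins, not BASIC!'s doUserFunction(). The model captures only the key fact: during the recursive call, the fn.def objects are both the callee's temporaries and the caller's variables. Argument expressions are modeled as DoubleSupplier lambdas evaluated in order.

```java
import java.util.Arrays;
import java.util.function.DoubleSupplier;

// Simplified, hypothetical model of binding arguments into shared fn.def Var objects.
public class SharedParamVars {
    static final class Var {
        final String name;
        double value;
        Var(String name) { this.name = name; }
    }

    // The four scalar parameters of hanoi(disk, dest, source, other, ...).
    static final Var DISK = new Var("disk"), DEST = new Var("dest"),
                     SOURCE = new Var("source"), OTHER = new Var("other");
    static final Var[] FN_DEF_PARMS = {DISK, DEST, SOURCE, OTHER};

    // Buggy binding: store each argument straight into the shared fn.def Var
    // before evaluating the next argument.
    static double[] bindBuggy(DoubleSupplier... args) {
        for (int k = 0; k < args.length; k++) {
            FN_DEF_PARMS[k].value = args[k].getAsDouble(); // may overwrite a caller variable
        }
        double[] out = new double[FN_DEF_PARMS.length];
        for (int k = 0; k < out.length; k++) out[k] = FN_DEF_PARMS[k].value;
        return out;
    }

    public static void main(String[] args) {
        // First call: hanoi(4, 3, 1, 2, ...).
        bindBuggy(() -> 4, () -> 3, () -> 1, () -> 2);
        // Recursive call: hanoi(disk-1, other, source, dest, ...), reading the
        // caller's variables, which are the same four objects.
        double[] got = bindBuggy(() -> DISK.value - 1, () -> OTHER.value,
                                 () -> SOURCE.value, () -> DEST.value);
        System.out.println(Arrays.toString(got)); // [3.0, 2.0, 1.0, 2.0]; should be [3.0, 2.0, 1.0, 3.0]
    }
}
```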
It's hard to believe this doesn't destroy programs all over the place. Maybe there are not many people using v01.89.01 or v01.89.02? But v01.89.02 runs Nicolas' very large "test" program! Programming never ceases to amaze me.
This bug was probably harder to describe than it will be to fix.
On the off-chance you're looking at the changes, you may see that I changed the customized clone() methods in Var.java to copy(). I think it was bad form to mess with the Java definition of "clone".
In doUserFunction(), the bugfix is to make a copy of each non-global ScalarVar parameter, create a new Val (because the copy is shallow, it uses the same Val as the original), put the new Val in the copy, and put the copy back in the FunctionParameter object.
… isNumeric() test. And it fixes the bug. It copies the parm for every non-global ScalarVar parameter of every function call, whether we need to or not. I don't know how to tell if we need to. The original Var in each such parm is already a copy, and that copy is copied without ever being used. I'm sure there's a better way, but I'm just not seeing it, so most function calls just got a little slower. But the bug is dead.
Released in v01.90, 2016/03/31.
While discussing globals on the forum, user Gikam asked for a binary search on variables instead of the current linear search.
Marc, you already investigated keyword lookup, but not variable lookup, and according to what you wrote back in 2012, variable lookup is the most time-consuming task of the parser.
I investigated this and just built a BASIC! test version following this trail. I wrote a benchmark that randomly reads 500 of 702 different variables: with BASIC! v01.88 it takes an average of 250 ms; on my modified BASIC! (based on v01.88) it takes an average of 52 ms (about 80% less time).
Changes take place in only two functions, Run.searchVar(String) and Run.createNewVar(String, int), as indicated here
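Here is a rough, stand-alone sketch of the kind of comparison described: 500 random lookups among 702 distinct names, once with a linear scan and once with Collections.binarySearch on a sorted copy. The name format, the fixed seed, and the single un-warmed timing pass are assumptions. It does not reproduce the 250 ms / 52 ms figures, which were measured inside BASIC! itself.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

// Hypothetical micro-comparison of linear vs. binary variable-name lookup.
public class VarLookupBench {
    static List<String> makeNames(int n) {
        List<String> names = new ArrayList<>();
        for (int k = 0; k < n; k++) names.add("var" + k);
        return names;
    }

    // Linear scan, as in the pre-change lookup described above.
    static int linear(List<String> names, String key) {
        for (int k = 0; k < names.size(); k++) if (names.get(k).equals(key)) return k;
        return -1;
    }

    public static void main(String[] args) {
        List<String> names = makeNames(702);
        List<String> sorted = new ArrayList<>(names);
        Collections.sort(sorted); // binarySearch requires sorted input
        Random rnd = new Random(42);
        String[] keys = new String[500];
        for (int k = 0; k < keys.length; k++) keys[k] = names.get(rnd.nextInt(names.size()));

        long t0 = System.nanoTime();
        int hitsLinear = 0;
        for (String key : keys) if (linear(names, key) >= 0) hitsLinear++;
        long t1 = System.nanoTime();
        int hitsBinary = 0;
        for (String key : keys) if (Collections.binarySearch(sorted, key) >= 0) hitsBinary++;
        long t2 = System.nanoTime();

        // Single pass, no JIT warm-up: treat the timings as indicative only.
        System.out.println("linear: " + hitsLinear + " hits, " + (t1 - t0) / 1000 + " us");
        System.out.println("binary: " + hitsBinary + " hits, " + (t2 - t1) / 1000 + " us");
    }
}
```

Each binary lookup takes about log2(702) ≈ 10 comparisons, versus about 351 on average for a successful linear scan. That ratio is why the gain grows with the number of variables.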