Accelerate variables lookup

RFO-BASIC / Basic

The repository for the files of the Basic project that creates the BASIC! APK for Google Play


Accelerate variables lookup #198

Closed mougino closed 8 years ago

mougino commented 9 years ago

While discussing globals on the forum user Gikam asked for binary search on variables instead of current linear search.

Marc, you already investigated keyword lookup, but not variable lookup, and according to you back in 2012, variable lookup is the most time-consuming task of the parser.

I investigated here and just built a BASIC! test version following this trail. I wrote a benchmark that randomly reads 500 out of 702 different variables: with BASIC! v01.88 it takes an average of 250ms. On my modified BASIC! (based on v01.88) it takes an average of 52ms (a gain of about 80%).

Changes only take place in 2 functions: Run.searchVar(String) and Run.createNewVar(String, int) as indicated here

jMarcS commented 8 years ago

Thanks for posting your test results. (And for telling me how to do tables!) The results look pretty consistent.

There must be some misinformation floating around out there. I went to the authoritative source, developer.android.com/reference, and it looks like we're okay. This is a Good Thing; I didn't test on a Froyo emulator before committing.

There are nine flavors of BinarySearch in the Arrays class. Each of those has two variants. The full-list variants have been around since API 1. The sublist variants showed up in API 9. From the Arrays page:

public static int binarySearch (double[] array, double value)                                      Added in API level 1
public static int binarySearch (double[] array, int startIndex, int endIndex, double value)        Added in API level 9

In the Collections class, there are only two signatures, and they've both been there since the beginning. From the Collections page:

public static int binarySearch (List<? extends Comparable<? super T>> list, T object)                   Added in API level 1
public static int binarySearch (List<? extends T> list, T object, Comparator<? super T> comparator)     Added in API level 1

humpity commented 8 years ago

Yes, I got Arrays and Collections mixed up. Putting aside the b.search, another theory I have could be the effect of 'getter/setter' inline optimizations in the Dalvik engine after Froyo. http://stackoverflow.com/questions/4912695/what-optimizations-can-i-expect-from-dalvik-and-the-android-toolchain/4930538#4930538 This might make a linear search faster after Froyo, although I don't know if you can optimize VarNames.get(j). Anyway, I think I'm getting off topic now.

jMarcS commented 8 years ago

That link made interesting reading. Thank you.

There are two linear searches. No, were two linear searches -- Nicolas's change makes variable name searches binary. That's because a variable name can be extracted from the command line as a discrete token, then the token can be used in a name table lookup. I had tried a hash table, but I needed a new table for each level of function call. Nicolas found that BinarySearch can be applied to a sublist, and that has made all the difference.
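A minimal sketch of the sublist idea, assuming one shared name list in which each function call owns the sorted tail that begins at its own start index. The class and method names are hypothetical and this is not the code in Run.java; it only shows how Collections.binarySearch on a subList view can serve both lookup and sorted insertion.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical sketch: one shared name list, where each function call owns
// the sorted tail that starts at searchStart.
class SublistLookup {
    // Returns the absolute index of name, or -1 if it is not in the current name space.
    static int find(List<String> names, int searchStart, String name) {
        List<String> scope = names.subList(searchStart, names.size());
        int i = Collections.binarySearch(scope, name);
        return (i >= 0) ? searchStart + i : -1;
    }

    // Inserts name so the current name space stays sorted; returns its absolute index.
    static int insert(List<String> names, int searchStart, String name) {
        List<String> scope = names.subList(searchStart, names.size());
        int i = Collections.binarySearch(scope, name);
        if (i >= 0) return searchStart + i;       // already present
        int at = searchStart + (-i - 1);          // decode the insertion point
        names.add(at, name);
        return at;
    }

    public static void main(String[] args) {
        List<String> names = new ArrayList<>();
        for (String n : new String[] {"total", "a", "count"}) insert(names, 0, n);
        int fnStart = names.size();               // a function call starts a new name space
        insert(names, fnStart, "a");              // a parameter that shadows the caller's "a"
        System.out.println(names + " a@" + find(names, fnStart, "a"));
    }
}
```

Only the tail from searchStart onward has to be sorted, so the caller's names can sit untouched below it.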

The command keyword search is still linear. That's because the keyword can't be isolated as a token. Instead, the beginning of each line is compared to every entry in the command keyword table (stops when a match is found). (I know, I've said this before, but it's a necessary preface to what comes next).

That compare looks like this in pseudocode:

for each keyword in the table
  if command line starts with the keyword
    table hit

The Java method is String.startsWith(...). And it is fast. Blindingly fast. Faster than it has any right to be. So fast that everything I've tried to make it faster makes it slower!
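In Java the pseudocode above comes out roughly as follows. This is only an illustration: the keyword table is made up, and the real table and its ordering are not shown in this thread.

```java
// Illustrative sketch; KEYWORDS is a made-up table, not BASIC!'s real one.
class KeywordScan {
    static final String[] KEYWORDS = {"print", "gr.bitmap.load", "gr.open", "let"};

    // Returns the index of the first keyword the line starts with, or -1.
    static int match(String line) {
        for (int i = 0; i < KEYWORDS.length; i++) {
            if (line.startsWith(KEYWORDS[i])) return i;   // table hit, stop scanning
        }
        return -1;
    }

    public static void main(String[] args) {
        System.out.println(match("gr.opena,b"));
        System.out.println(match("x=1"));
    }
}
```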

So I quit trying. Instead, I changed the data structure that holds the commands so the keyword search happens only once per line. Originally, each line was just a Java String, and it got fully parsed every time the interpreter saw it.

Now I have a table of command functors, one for each command keyword. The first time the interpreter sees a line, it identifies the keyword and stores a reference to the functor. Each command line is an object that holds the original String, a functor reference, and the length of the keyword. When the interpreter sees the command again, it doesn't parse anything. It grabs the functor reference, skips ahead as far as the length field says it should, and starts the command.
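The caching scheme described above could look something like this sketch. Command, Line, the two-entry table and the search counter are all hypothetical names chosen for illustration; the real interpreter classes are not shown in this thread.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch of caching the keyword search result on the line object.
class CachedDispatch {
    interface Command { String run(String args); }

    static final Map<String, Command> TABLE = new LinkedHashMap<>();
    static {
        TABLE.put("print", args -> "PRINT[" + args + "]");
        TABLE.put("let", args -> "LET[" + args + "]");
    }
    static int searches = 0;                      // counts full keyword scans

    static class Line {
        final String text;
        Command command;                          // null until the first execution
        int keywordLength;
        Line(String text) { this.text = text; }

        String execute() {
            if (command == null) {                // first time only: linear keyword search
                searches++;
                for (Map.Entry<String, Command> e : TABLE.entrySet()) {
                    if (text.startsWith(e.getKey())) {
                        command = e.getValue();
                        keywordLength = e.getKey().length();
                        break;
                    }
                }
                if (command == null) throw new IllegalStateException("unknown command: " + text);
            }
            // Later executions skip straight past the keyword.
            return command.run(text.substring(keywordLength));
        }
    }

    public static void main(String[] args) {
        Line line = new Line("printa+b");
        System.out.println(line.execute() + " " + line.execute() + " scans=" + searches);
    }
}
```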

I don't remember the speed-ups that got. Also, I did it in two stages. The first stage did non-compound commands (no dot). The second stage (much harder!) handled compound commands (commands in groups, like GPS.latitude or GR.bitmap.load). I've never tried to measure the combined effect.

If I remember right, I was a little disappointed in the effect. Expression parsing is a bigger concern, and variable name lookup is the slowest part of that.

The first stage went into v01.85. The second went into v01.88. The BinarySearch change will be in v01.89. There are hundreds of other changes between v01.85 and now, but it would still be interesting to compare benchmarks.

And now that the "search" part of variable-name handling is faster, I'd like to do some profiling to see where the next hang-up is. Probably the string extraction. If that's the case, the next speed-up will be tough -- pre-parsing the line, saving the tokens, maybe the expression tree? Putting even more "compiler" into the "interpreter".

I have to wonder if the time would be better-spent trying to find the massive memory leak. Or getting the interpreter out into a Service where it belongs.

But now I'm WAAAAAY off-topic!

jMarcS commented 8 years ago

Okay, back on-topic. Gentlebeings, we have a bug. f25_dir.bas fails. You're gonna love this (note the sarcasm, please).

UNDIM (same as ARRAY.DELETE) works by replacing the variable name with a space character. (searchVar does the same thing, to propagate the effect of UNDIM on an array passed into a function.) The intent was to write an invalid variable name so that future linear searches would skip it. This "removes" a variable without changing the index of any other variable. Then DIM (or something else) can make a new variable with the same name and a new index, and the linear search would find that.

With BinarySearch, writing over a valid variable name with a space is evil: the list is no longer in alphabetical order. Furthermore, multiple UNDIMs would create multiple variables with the same name (" "), which is not allowed with BinarySearch.
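A small demonstration of the ordering problem, using a plain textbook binary search written out by hand so the behaviour is fully defined (Collections.binarySearch gives undefined results on an unsorted list). The five-name list is made up.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustrative only: shows how blanking one name hides a different, valid name.
class BlankedName {
    static int search(List<String> names, String key) {
        int lo = 0, hi = names.size() - 1;
        while (lo <= hi) {
            int mid = (lo + hi) >>> 1;
            int cmp = names.get(mid).compareTo(key);
            if (cmp < 0) lo = mid + 1;
            else if (cmp > 0) hi = mid - 1;
            else return mid;
        }
        return -1;                                // not found
    }

    public static void main(String[] args) {
        List<String> names = new ArrayList<>(Arrays.asList("a", "b", "c", "d", "e"));
        System.out.println(search(names, "a"));   // found while the list is sorted
        names.set(2, " ");                        // "UNDIM" c by blanking its name
        System.out.println(search(names, "a"));   // misses: " " sorts before "a", so the search goes right
    }
}
```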

On the other hand, with BinarySearch the variable indices can change every time we add a new variable. Nobody cares, because the index values are not directly accessible. Every time the variable is looked-up, the search gets the current index.

I thought that might mean deleting variables is okay now. That could be a nice clean fix. But interrupt code can UNDIM an array in the wrong namespace, invalidating all of the VarSearchStart values saved in the call stack.

I thought we could "rename" the variable by cloning it with a new name, inserting it with BinarySearch, and deleting the old one. We'd have to create a naming scheme that would avoid duplicate invalid variable names. Besides, this is an opportunity to right an old wrong: adding unused variables to the list just slows everything down.

We could fix this by adding bookkeeping to the array descriptor. UNDIM would not change the variable, but any variable look-up that found an UNDIMed array would report the variable does not exist. An attempt to create a new array with the same name would not change the variable either, except to point it at a newly-created array descriptor. It's a little complicated, but I haven't thought of a reason it would not work. And it removes the extra step in searchVar to propagate UNDIM out of function calls. And it might be movement in the direction of separating arrays from scalars -- not sure about that one, it's hard to think about.

Any other ideas? I'd like clean and simple, but I'm moving in the opposite direction.

humpity commented 8 years ago

I can sympathise, it's one of those programming moments that makes your stomach turn inside-out.

If I read you right these are the following scenarios with UNDIMed vars (uvars);

A) Pretend nothing happened.
   Clearly does not work as replacing uvars with a 'space' screws up b.search.
   Not an option.

B) Replace uvars with a naming scheme that moves uvars to either the top
   or bottom of the list.

   (without invalid duplicates)
   Inventing a scheme e.g " myname123" can prove messy.

   An easier way, would be to just insert a space " myname"
   If the user or system tries to DIM "myname" again, then 
   " myname" can first be searched for and " myname" replaced with "myname"
   to be re-used. In a way it is similar to the BookKeeping method.

   (with invalid duplicates)
   I'm not yet convinced that Inserting or replacing the name with a space
   AND moving it to the top of the list would necessarily screw up a
   b.search.
   This method though does not 'right an old wrong',
   you still end up with duplicates, albeit 'safe' duplicates.

   For both cases (invalid or valid) I'm not sure how it affects undim
   propagation in functions.

C) BookKeep with array-descriptor, i.e marking uvars, making uvars re-usable.
   Simplest option ?

D) Really delete variables.

   This option only works if VarNames, VarIndex and the various VarSearchStart's
   are all synchronised properly.

   This means that after deletion, UNDIM must somehow scan the call stack entries 
   and correct all the VarSearchStart's depending if a particular VarSearchStart
   was before or after the deletion (can this be done?).

mougino commented 8 years ago

I'm on my smartphone, so I may have only skimmed Marc's post, however: how about removing the ArrayList element in VarNames and VarIndex (only) altogether, instead of filling it with a space? Wouldn't that work?

mougino commented 8 years ago

Ah right, I read a little more in detail... How about throwing an error "Cannot undim in an interrupt" then? It's a new limitation but would solve all of our problems :white_check_mark:

jMarcS commented 8 years ago

Yes, Nicolas, that's a very tempting suggestion.

Humpty, I think your recap is mostly right on. I looked at Option C a little, and it's worse than I thought. For Option B, as you suggest, there's probably a way to deal with the duplicate variables; true duplicates would have to be excluded from the sublist being searched.

It annoys me that we have to pretzel the code to account for array names! There's got to be a better way.

But -- I've got to get the test build out so users can check out both the speed-up and the new Paint features. Maybe the right answer is to do quick-and-dirty for now and fix it right when we have more time.

Quick-and-dirty would be Option D with a mouginish twist:

In both places that the variable name is blanked, delete the variable (both VarNames and VarIndex).
Do not delete the array descriptor.
In UNDIM, if the index of the variable being deleted is smaller than interruptVarSearchStart, throw an "undim in interrupt" error. (This will have to be cleaned up a little; I was sloppy with interruptVarSearchStart.)

This is really easy.

If we do it, it will lean us toward deletion as the permanent solution. Walking the call stack and decrementing each saved VarSearchStart is easy, but it feels awfully kludgy. I want to make the code less fragile, not more!
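For reference, the call stack walk could be as small as the sketch below. It assumes the saved VarSearchStart values are available as a simple list of integers, which is a simplification; the names and data layout here are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of deletion with fix-up of the saved sublist starts.
class DeleteAndFixUp {
    // Removes names[index] and shifts every saved start that lies above it.
    static void delete(List<String> names, List<Integer> savedStarts, int index) {
        names.remove(index);
        for (int i = 0; i < savedStarts.size(); i++) {
            int start = savedStarts.get(i);
            if (start > index) savedStarts.set(i, start - 1);
        }
    }

    public static void main(String[] args) {
        // main owns [0,2), the first function [2,4), the second function [4,5)
        List<String> names = new ArrayList<>(Arrays.asList("a", "b", "x", "y", "p"));
        List<Integer> starts = new ArrayList<>(Arrays.asList(2, 4));
        delete(names, starts, 1);                 // delete "b" from the main name space
        System.out.println(names + " " + starts);
    }
}
```

A start equal to the deleted index is left alone, because the next name in that sublist slides down into the same slot.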

If we could have different variable lists for each function call, the call stack would save a reference to the variable list instead of the start of the sublist. Can't do that until the variable list is a single list. But maybe we could leave the kludge in place until then?

humpity commented 8 years ago

Does UNDIM do the propagation? Or does it occur at FN.RTN/FN.END? I.e., after function2 UNDIMs, at which point does function1 know when to UNDIM its own variable? After the exit of function2 or before?

jMarcS commented 8 years ago

Neither. UNDIM marks the array descriptor invalid and clears the variable name. The function returns and the interpreter goes on running. If it gets an instruction that references that array, the variable lookup finds the invalid array descriptor and clears that variable name, too.

It's in SearchVar(), where it says if (VarIsArray).

humpity commented 8 years ago

I see. The bookkeeping is sort of already there. Then I think this would work. The only thing is that you can't undim a 'main' variable in an interrupt; there might be some negative feedback from users who have done this (can't be many). (btw I don't see the kludge as a 'kludge', as it ensures call stack integrity)

An alternative quick fix is to 'always' re-insert a space at index 0 (Option B-b) if index < interruptVarSearchStart, but that leaves the problem of unused entries.

jMarcS commented 8 years ago

Nicolas reported in the forum that the GW-demo doesn't work with the v01.88.03 test build.

First look: we're running into trouble with all those global variables: theVarIndex, VarNumber, etc. For example, in getArrayValues():

int avn = VarNumber; boolean avt = VarIsNumeric;
do { evalNumericExpression(); [...] } while (isNext(','));
VarNumber = avn; VarIsNumeric = avt;

That is, evalNumericExpression() is going to change VarNumber, so grab a copy. Then, after evaluating the expression, restore VarNumber.

Yeah, that was fine when a variable's index in the variable lists never changed. Now creating a new variable can move any or all of the other variables. Saving/restoring the index doesn't work any more.

For example, this crashes:

a=1 : c=3
array.load b[],11,12,13
d=b[a + b]   % d=b[a] does not crash

How many places are there like this in the code, where it assumes that a variable's index never changes? I don't know, but it looks like I'm going to have to find out.
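The failure mode can be reproduced outside the interpreter with a sorted list and a saved index. This is a sketch with made-up names, not the interpreter's code: it saves the index of one name, inserts another in sorted position, then reads through the saved index.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Illustrative sketch of why a saved index goes stale in a sorted list.
class StaleIndex {
    // Assumes 'saved' is already in the sorted list and 'added' is not.
    static String lookupAfterInsert(List<String> names, String saved, String added) {
        int savedIndex = Collections.binarySearch(names, saved);
        int at = -Collections.binarySearch(names, added) - 1;   // insertion point
        names.add(at, added);
        return names.get(savedIndex);             // may no longer be 'saved'
    }

    public static void main(String[] args) {
        List<String> names = new ArrayList<>(Arrays.asList("a", "c", "d"));
        // Creating "b" shifts "c" up by one, so the saved index now names "b".
        System.out.println(lookupAfterInsert(names, "c", "b"));
    }
}
```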

jMarcS commented 8 years ago

That commit I made a few minutes ago says "Tiny fix in GetArrayValue()". I moved one line, so it gets the ArrayDescriptor before evaluating the index values. This way it doesn't care if VarNumber gets restored correctly, and my little code snippet doesn't crash.

Of course, VarNumber and VarIsNumeric still are not getting restored correctly. I suppose that's a problem, but I haven't got a small test case to prove it.

The fix in GetArrayValue() does not fix the problem Nicolas found testing with GW Lib. Or maybe it does fix one problem, but GW_demo still crashes. The error in the stack trace is "Index out of bounds". It's trying to reference Vars, and the index it's using is the same as the size of Vars.

I tried taking out the UNDIM change, and it made no difference. The UNDIM change removes an item from VarIndex, not from Vars.

I could, and probably will, go on trying to find the cause of that indexing error. But in the meantime, I'm starting to prepare for a more stable fix. The first step is in that same commit: "Changed Var to Val and Vars to Vals". That's the class Var and the ArrayList<Var> Vars. The terminology is very confusing: things I've been calling vars in the code are really objects that hold the value of a scalar. The change looks very large: 544 additions, 537 deletions (counting the insignificant change in GetArrayValue()). It was mostly mechanical, global search and replace. It has no effect at all on the program logic, but it makes the thinking logic easier.

With that name change, I can add a new class called Var that will hold the name and type of the variables -- yes, I intend to replace VarNames and VarIndex with a single list of Var objects. VarIsNumeric, VarIsArray, VarIsFunction, and VarIsNew will all vanish, and good riddance. And each Var object will hold a reference to a Val (for scalars), an ArrayDescriptor, or a FunctionDefinition -- just the same as VarIndex does now, except it will hold references, not indices. That may (should?) mean we can delete scalars. And UNDIMed array space.

I haven't thought through what this does to binarySearch(), yet. Have to add a Comparator, and then see if that slows down the variable search. I don't want to undo Nicolas's gains!

This is not going to look exactly like the "Grand Unified Variable" architecture I've been working toward. I just have to hope it's a little easier to implement than that would have been.

olimaticer commented 8 years ago

Some weeks ago Marc wrote about a database approach. Now I'm thinking Marc looks more and more in this direction.

And now I know why. Now I know why the guys from CA (Realizer-Basic) chose another approach.

To fence the spaghetti code of GW-Basic they made it simpler as VBASIC.

First all variables are global except which are in function or procedure calls or are declared as local in them. (In Basic! you could call „GlobalsOn“.)

Functions or procedures have to be on top of the code except commands like the RESET command.

So the Realizer approach needs only one reference table for (keywords, operators,) variables, functions and procedures, but no different name spaces.
OK, hit me, in this example I use name extensions.

FUNC MyFunc (s) REM Only single declaration allowed. 
    LOCAL b, c
    a = a + 5
    c = 1
    b = s + c + a
    p = "Test"
    RETURN b
END FUNC
s = 10.3
c = 6/4
TYPE a AS Integer REM Only single declaration allowed. 
a = INT(2.0)
p = 99
PRINT MyFunc (33.0) → 41.0
PRINT s, c, a   → 10.3   1.5   7
PRINT p → Test

    Reference table records step by step: (The first item is the key.)

The reference table could be :

Key ((keyword, operator,) function name or variable name) | Type (function, keyword, integer, double, string, bundle, array of strings …) | Variable type fixed? | Address or Value (small values, pointers, branch destinations)

This table is a case for binary search or may be faster Java search methods.

With your own binary search you get the point at which you can split the table if you want to insert a new keyword. So the table is in the right order without starting a new sort process. (See also my Binary Search enhancement at the rfo-basic-forum. http://rfobasic.freeforums.org/post25410.html#p25410 )

Some remarks: Realizer uses bundles like a.Name, a.PostalCode, a.City, … . Most variables are able to change their types at runtime!

As a beginner you hate it, then you love it and later you use the LOCAL command consequently.

If the source code has reached a larger size, and for every include file, Realizer generated object code at first runtime and saved it on the disk. I suggest *.bao as a file name extension. Ballast like unused functions or remarks could be thrown away. The reference table could be stored in this file, too.

I do not claim this approach is better, but it is different and well-considered.

On the other hand, some months ago I was looking in the Java subs. I seem to remember that all the variables are stored in Java as text. Am I right?

I hope this post gives some inspirations.

Gregor

„Make things as simple as possible, but not simpler.” (Albert Einstein)

jMarcS commented 8 years ago

I can't begin to describe what I've been doing for the last nine days. Some day I will have to do that, but not now. While testing those changes, I found a very small but very important error in the original "binary search" commit:

// Now that all new variables have been created in main name space,
// start the function name space with the function parameter names.
sVarNames = VarNames.size();
for (FunctionParameter parm : parms) {
    // Get insertion point (-index - 1) so VarNames.subList will still be in alphabetical order.
    int index = newVarIndex(parm.name(), VarNames, sVarNames);

That last line is the common function that uses binarySearch to find the insertion point of a new variable in the (sorted) variable list. The error is painfully obvious in hindsight: we're capturing the end of the caller's variable space in sVarNames, but we don't change VarSearchStart until a few lines later. Oops!

sVarNames = VarNames.size();
VarSearchStart = sVarNames;
for (FunctionParameter parm : parms) {
    int index = newVarIndex(parm.name(), VarNames, sVarNames);

This fails in any program that uses a function parameter name that matches the name of a variable that already exists in the caller's name space. It's amazing that so many tests worked!

I'm committing a change that fixes this little oversight. It also re-enables hardware graphics acceleration in HTML mode, as Nicolas requested over in Issue #196.

Unfortunately, this commit still doesn't make the original binary search speed-up code work. GW_demo still gets an IndexOutOfBoundsException just off the end of the Vals list.

jMarcS commented 8 years ago

I have a candidate for check-in. It runs GW_demo.bas. It runs all of the Sample_Programs (except I didn't run f32 or f35). That's not nearly enough.

My changes touched almost every operation in the code. They affect every command. Ideally, every command must be tested, including tests for syntax errors. Needless to say, that isn't going to happen. It's next to impossible.

If I had any sense at all, I would have tabled Nicolas's binary search code a week ago. But I didn't, so now we have to make a choice.

Remove the binary search code, release v01.89 without it. Put it in when we have time to fix it.
Commit my new code. Build v01.88.04. Deal with the errors as they turn up.

I fixed the errors that were easy to find. Now it gets harder. If we could get some forum users to run their favorite programs, we can debug it well enough to release it.

I really want to release this code. I have put too much effort into it to drop it. But we must get v01.89 out, and there's not enough time to test the new code as it should be tested.

What do you think?

(BTW, I have not been in the forum since 12/4. I am waaaay behind. Anything you post there I will not see for a while.)

mougino commented 8 years ago

Wow, thanks Marc for your efforts! I will be interested to know what the fix was eventually. In the meantime I think the forum is a great place to have your fix tested. Many active users for many kinds of programs.

I don't know what is your timeline for v01.89, if you really want to see it out asap I would suggest getting rid of the binary search and releasing it on the Google store with the rest of the changes. Then we will have all the time to test a v01.89.01 with only the binary search.

jMarcS commented 8 years ago

Yeah, well, my timeline for v01.89 was mid-October. Didn't make it!

I think the only change I have not yet made that absolutely must go into v01.89 is the Save path bug.

Let me look at how to back out the binary search while still keeping the Paint and other recent changes.

jMarcS commented 8 years ago

I will attempt a brief explanation of the changes. Explaining them is not hard. Keeping it brief is hard!

The values list (Vals) is gone. The globals VarNumber, theValueIndex, VarIsNew, VarIsNumeric, VarIsArray, VarIsFunction, ArrayTable, ArrayValueStart, FnDef, WatchVarIndex are gone. Function names are global, so I had to keep FunctionTable -- one step removed from having global variables!

VarIndex was a list of indices into other value lists (Vals, ArrayTable, FunctionTable). That's gone, too, replaced by ArrayList<Var> Vars. Vars is a list of variables. Each Var object is a complete variable: a scalar, an array, or a function definition. Var is an abstract class with concrete subclasses ScalarVar, ArrayVar, and FunctionVar.

Each Var object contains a Val object to hold its value. Val is an abstract class with concrete subclasses NumVal, StrVal, and ArrayVal.

A ScalarVar holds a NumVal or a StrVal. An ArrayVar holds both an ArrayDef and an ArrayVal. A FunctionVar holds a FnDef. A FunctionVar does not have a value (no Val subclass).
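As a reading aid, the hierarchy can be outlined like this. Only the class names come from the description above; every field, constructor and method is an assumption, and the real Var.java may be organized quite differently.

```java
// Hypothetical outline; only the class names are taken from the description.
class VarOutline {
    static abstract class Val { }
    static class NumVal extends Val { double value; }
    static class StrVal extends Val { String value = ""; }
    static class ArrayVal extends Val { int index; }          // stands for one array element

    static class ArrayDef { double[] data = new double[0]; }  // stand-in for the array storage
    static class FnDef { String name; }                       // stand-in for a function definition

    static abstract class Var {
        final String name;
        Var(String name) { this.name = name; }
        abstract Val val();                                   // null when the Var has no value
    }
    static class ScalarVar extends Var {
        final Val val;                                        // a NumVal or a StrVal
        ScalarVar(String name, Val val) { super(name); this.val = val; }
        Val val() { return val; }
    }
    static class ArrayVar extends Var {
        final ArrayDef def = new ArrayDef();
        final ArrayVal cache = new ArrayVal();
        ArrayVar(String name) { super(name); }
        Val val() { return cache; }
    }
    static class FunctionVar extends Var {
        final FnDef def = new FnDef();
        FunctionVar(String name) { super(name); }
        Val val() { return null; }                            // functions have no Val
    }

    public static void main(String[] args) {
        Var[] vars = { new ScalarVar("a", new NumVal()), new ArrayVar("b["), new FunctionVar("f(") };
        for (Var v : vars) System.out.println(v.name + " -> " + v.getClass().getSimpleName());
    }
}
```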

An ArrayDef holds a Java array. Yes, a real Java array, although for now it is still only one dimensional. UNDIM discards the ArrayVal, releasing all of the array storage. (Supposed to -- I haven't tested that yet.)

Var and Val and their subclasses, along with ArrayDef, FnDef, FunctionParameter, and the Type enum are all in the file Var.java. It's right around 500 lines, although Run.java is only about 300 lines shorter.

The symbol table is still in two parts. The variables are in Vars, but I kept VarNames, the list of name strings, because binarySearch uses a fast native (C++) search function for strings. (Maybe some day we could combine VarNames and Vars into a single HashMap<String, Var>.)
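A sketch of the two-part table, assuming the only rule is that both lists are always modified at the same index. ParallelTable is a hypothetical name, and Object stands in for the Var class.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical sketch: a sorted name list searched with binarySearch, plus a
// parallel list of variables kept in exactly the same order.
class ParallelTable {
    final List<String> names = new ArrayList<>();
    final List<Object> vars = new ArrayList<>();   // Object stands in for Var

    Object find(String name) {
        int i = Collections.binarySearch(names, name);
        return (i >= 0) ? vars.get(i) : null;
    }

    void add(String name, Object var) {
        int i = Collections.binarySearch(names, name);
        if (i >= 0) { vars.set(i, var); return; }  // name exists: replace its variable
        int at = -i - 1;                           // insertion point keeps names sorted
        names.add(at, name);
        vars.add(at, var);                         // same index in both lists
    }

    public static void main(String[] args) {
        ParallelTable t = new ParallelTable();
        t.add("zeta", 1);
        t.add("alpha", 2);
        System.out.println(t.names + " " + t.vars + " " + t.find("alpha"));
    }
}
```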

This touches all of the GetVar() derivatives and variants, but they really have not changed that much. The set of methods is the same, which makes the change a lot more manageable. Instead of passing around a String and setting a lot of globals, the methods pass around a Var that knows what kind of variable it is. That makes the whole system a lot more stable.

For performance, ParseVar() does not create new Var objects. It reuses a pool of three, one of each of the Var subclasses. This should mean a lot less garbage collection in idle loops. We have to be careful that the Var returned by ParseVar() doesn't get used as a real variable. SearchVar() uses it only for the variable name. If it finds an existing variable, it returns the existing Var from Vars, not the Var from ParseVar(). Anything else that calls ParseVar() either clones a new Var or uses the information from the Var without passing the Var itself.

The old getVarValue() worked by setting the global theVarIndex. All of the 100+ places that call getVar() or getNVar() or one of the other variations used theVarIndex (and VarIsNumeric and the other globals) to find the actual Val to read or write. Even GetArrayValue() set theVarIndex to an index of a value, because an array element was just another scalar in the list of scalar values. All of those methods return a boolean to say if they succeeded; if they returned true the real "return value" was in theVarIndex (and VarNumber and VarIsNumeric and so on).

To reduce the size of the change, I kept that behavior. The methods still return a boolean. But now all those globals are replaced with a single Val reference, the global mVal. (Some day, that will go away, too, but that will be a pretty big change all by itself.)

Getting a Val reference was easy for scalar variables. It was much (much!) harder for arrays. In Paul's design, an array's data was a section of the scalar values list. (Or lists: in Paul's original design there was a list of String objects and a list of Double objects. Some time ago I combined them into one list of objects of a class that could be either.) An ArrayTable element described an array by keeping the index of the base, where the array started, and a length.

Now there is no ArrayTable, just individual ArrayVar instances. Each ArrayVar has an ArrayDef which holds the data in a real Java array. All of Paul's expression parsers rely on treating an array element exactly the same way as a scalar. In the new design, they have to have a Val.

So I created a new Val subclass, ArrayVal. Each ArrayVar has both an ArrayDef and an ArrayVal. The ArrayDef keeps the dimensions and all of the data. The ArrayVal is a temporary write-through cache for one element of the array.

If you have a = b[x,y], executeLet() calls the numeric expression parser, which calls getNVal() to get the ArrayVal that caches an element of the array b[]. The ArrayDef that holds the array data uses the index list to select an element of the array, then writes both the index and the array element value into the ArrayVal cache. The expression parser finds that ArrayVal in mVal and reads the value -- it is not reading the array directly, it is reading the cached value.

If you have b[x,y] = a, executeLET() calls getVar() directly to get the ArrayVal that caches the element of the array b[]. The process and the result are exactly the same: mVal is set to the ArrayVal that caches an element of the array in the ArrayDef attached to the ArrayVar of the array b[]. The cache holds the current value of the array element b[x,y] -- but nobody cares, because this code wants to write the array, not read it. executeLET() calls the numeric expression parser which calls getNVal() to get the value of the scalar a. It writes the value of a to the ArrayVal cache. The ArrayVal is a write-through cache. It has the index and a reference to the ArrayDef, so it writes the new value to the correct element of the original array.

There is no cache invalidation mechanism. Because any BASIC! array read or write is done with a full set of indices every time, the cache is used only once. It can never be stale. That seems a waste, but it does simplify the design.
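The write-through behaviour can be sketched as below. Field and method names are assumptions, and the array is one-dimensional and numeric only to keep the example short.

```java
// Hypothetical sketch of a one-element write-through cache in front of an array.
class WriteThrough {
    static class ArrayDef {
        final double[] data;
        ArrayDef(int length) { data = new double[length]; }

        // Loads element 'index' into the cache and records where it came from.
        void select(ArrayVal cache, int index) {
            cache.def = this;
            cache.index = index;
            cache.value = data[index];
        }
    }

    static class ArrayVal {
        ArrayDef def;
        int index;
        double value;

        double get() { return value; }            // readers see the cached copy
        void set(double v) {                      // writers update the cache and the array
            value = v;
            def.data[index] = v;
        }
    }

    public static void main(String[] args) {
        ArrayDef b = new ArrayDef(3);
        ArrayVal cache = new ArrayVal();
        b.select(cache, 1);                       // like resolving b[2] in BASIC! terms
        cache.set(12.0);                          // like b[2] = 12
        System.out.println(b.data[1]);
    }
}
```

Because every access selects the element again before reading or writing, the cached value is never reused, which is why no invalidation step is needed.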

The third kind of Var is a FunctionVar. It has no value, just the FnDef. The problem here is that functions are global. You can't just put them in the Vars list because the variable search starts at VarSearchStart -- functions could not call functions. The FunctionVar still goes in Vars, but its FnDef also goes into a global FunctionTable, just as in old code. However, the FunctionTable is now a HashMap. You get the name -- a token ending in '(' -- from the command line, then use that to directly look up the FnDef. GW_lib is hundreds of functions, so you should see some improvement in speed just because of the faster function look-up. (Look at isUserFunction()).
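A sketch of the direct function look-up, assuming names are stored with their trailing '(' as described. The String values stand in for FnDef objects, and lookup() is a hypothetical helper, not isUserFunction().

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: a global function table keyed by "name(".
class FunctionLookup {
    static final Map<String, String> FUNCTIONS = new HashMap<>();  // value stands in for a FnDef

    // Takes the token up to and including '(' and looks it up; null if there is no match.
    static String lookup(String text) {
        int paren = text.indexOf('(');
        if (paren < 0) return null;
        return FUNCTIONS.get(text.substring(0, paren + 1));
    }

    public static void main(String[] args) {
        FUNCTIONS.put("area(", "FnDef:area");
        System.out.println(lookup("area(3,4)"));
        System.out.println(lookup("x+1"));
    }
}
```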

The change in variable management affects how doUserFunction() handles function parameters, especially the pass-by-reference kind. Again, the structure of the code has not changed, only the details. The FunctionParameter list parms is built the same way as before, but now it holds the entire Var for each parameter.

There are some corollary changes. The Dialogs can't use a Bundle to pass parameters because I didn't make the Var class serializable. I added a new class DialogArgs for that purpose. The code for GR.Set.Pixels has to have the whole array now, not just pointers into the Vals list. And I don't have all of the Debug commands caught up with the new system. (I haven't debugged Debug.)

There were a few unnecessary changes, too. I moved the array index calculation from GetArrayValue() to the ArrayDef. With the new FunctionTable (now mFunctionTable) I could simplify ESE() -- it doesn't reparse the same characters as many times, so it's shorter and should be faster. I'd like to do the same kind of thing for ENE().

This is a huge change, but it could have been bigger. I held back almost everything I could. On top of that, we're going to find this new structure will let us do things in the future that we couldn't do before. For example, this code does not fix the problem of UNDIM in an ISR, but now it will be fairly simple to create separate variable lists for different contexts.

But first we have to get this code tested and fully functional again.

mougino commented 8 years ago

Thanks for that! :) pretty dense but brilliant! I'll wait for the full Var.java and the modified Run.java to read and understand better how this works.

I suppose you also had global variables in mind when you re-architected all of that? (but that's a different topic, don't answer that now)

I'm impatient to test it against the GW lib and my RPG to see noticeable speed differences :)

jMarcS commented 8 years ago

I had no trouble backing out the binary search for v01.89, but I forgot again to put the Issue number in my commit comment.

The Save and Run bug was pretty simple, too, but I had lots of trouble getting the Editor Load/Save/Delete paths working better. I hope users find the new gimmick useful. But that's off-topic.

I pushed both commits a few minutes ago.

Nicolas, as you suggested, I will release v01.89 and then immediately put the new variable code in v01.89.01. If you would prefer not to wait that long, I can put the new code on GitHub in a branch, or send you the (two) files directly. There is no overlap with the Load/Save/Delete changes.

It has now been over two weeks since I was in the forum. I am terrified of what I will find there!

mougino commented 8 years ago

I can wait no worries.

Yes people have started asking where you are ;) Good luck catching up!

humpity commented 8 years ago

sVarNames = VarNames.size();
VarSearchStart = sVarNames;
for (FunctionParameter parm : parms) {
    int index = newVarIndex(parm.name(), VarNames, sVarNames);

This fails in any program that uses a function parameter name that matches the name of a variable that already exists in the caller's name space. It's amazing that so many tests worked!

But newVarIndex doesn't use VarSearchStart ? It gets sublistStart from sVarNames. Where does VarSearchStart matter here?

jMarcS commented 8 years ago

Yiii! You're right! So my question is: why did changing VarSearchStart fix anything? Stay tuned...

jMarcS commented 8 years ago

D'oh! It's obvious. In the new code, doUserFunction() doesn't call newVarIndex(). It looks like this:

sVarNames = VarNames.size();
VarSearchStart = sVarNames;
for (Var.FunctionParameter parm : parms) {
    createNewVar(parm.var());
}

And createNewVar() does use VarSearchStart.

So it was not an error in the original binarySearch()-based code. It was an error in the changes I made based on the new Var architecture.

That explains why it fixed my new code but it did not fix whatever is still wrong with the original var search speed-up code.
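To make the role of VarSearchStart concrete, here is a minimal sketch, not the actual Run.java code. It assumes that each scope occupies its own sorted run at the end of the name list and that a function call starts a new run by moving the search start to the end. The class name, the enterScope()/exitScope() helpers, and the use of a plain Double as a stand-in for a Var are all hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class ScopedVarLookup {
    // Names of all live variables; the current scope is the sorted run
    // from varSearchStart to the end of the list.
    private final List<String> varNames = new ArrayList<>();
    // Parallel list of values (a stand-in for the real Var objects).
    private final List<Double> vars = new ArrayList<>();
    // Index where the current scope begins.
    private int varSearchStart = 0;

    /** Returns the index of name in the current scope, or -1 if absent. */
    public int searchVar(String name) {
        List<String> scope = varNames.subList(varSearchStart, varNames.size());
        int i = Collections.binarySearch(scope, name);
        return (i >= 0) ? varSearchStart + i : -1;
    }

    /** Inserts name into the current scope at its sorted position. */
    public int createNewVar(String name, double value) {
        List<String> scope = varNames.subList(varSearchStart, varNames.size());
        int i = Collections.binarySearch(scope, name);
        if (i >= 0) {
            throw new IllegalStateException("already defined in this scope: " + name);
        }
        int at = varSearchStart + (-i - 1); // decode the insertion point
        varNames.add(at, name);
        vars.add(at, value);
        return at;
    }

    /** Starts a new scope, as a function call would. Returns the old start. */
    public int enterScope() {
        int saved = varSearchStart;
        varSearchStart = varNames.size();
        return saved;
    }

    /** Drops the current scope's variables and restores the caller's scope. */
    public void exitScope(int savedStart) {
        varNames.subList(varSearchStart, varNames.size()).clear();
        vars.subList(varSearchStart, vars.size()).clear();
        varSearchStart = savedStart;
    }

    public double get(int index) {
        return vars.get(index);
    }
}
```

If the search start is not moved before the parameters are created, a parameter named like one of the caller's variables collides with it. In this sketch that shows up as the IllegalStateException.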

I went through the Bug Report forum today and found a post from @evolbug that gave me a simple test case for some problem in the original speed-up code. After v01.89 is released, I plan to take a look at that case.

olimaticer commented 8 years ago

I hope, I'm on the right track. I have made a small drawing to make Marc's briefing visible.

The new RFO-Basic VAR-Concept.pdf Edit: Cloud link deleted, file uploaded directly

jMarcS commented 8 years ago

That is a good idea to make a drawing of the classes. After I get v01.89 out, I'll try to make one, too.

There are two parallel lists: VarNames and Vars. VarNames is redundant; it is there only for fast binary searches. Anything that has a name has a type. Every Var object in the Vars list knows its own name and its own type.

I have not made Var subclasses for bundles, lists, and stacks, because they do not have names.

In your drawing, I think your box Var types should be in blue, too. Is that correct?

You want a list of types to parallel the list of var names and the list of vars. Each TypeVar holds type information about another variable.

But the other variable already knows its own type. There is no need for a list of types.

In fact, I have just gone to a lot of trouble to eliminate all those other lists that were needed because a variable did not know its own type. So I do not understand why you want to put it back.

In your previous post, you used a structure like:

s | Double | Free | 10.3

That is what I just built. There is a ScalarVar whose name is "s". It holds a NumVal, which holds a double (not a Double) with the value 10.3. The Free field is irrelevant because all variable types are fixed by their names - the name is "s", not "s$" or "s[" or "s$[" or "s(" or "s$(", so it is a numeric scalar variable.
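As an illustration of "variable types are fixed by their names", here is a small sketch that derives the type from the trailing characters of a name. It is not taken from Var.java: the class, the Kind enum, and the reading of a trailing "(" as a function name are assumptions made for the example, and names are assumed to be non-empty.

```java
public class VarNameType {
    public enum Kind { SCALAR, ARRAY, FUNCTION }

    /** True if the name marks a string variable, e.g. "s$", "s$[" or "s$(". */
    public static boolean isString(String name) {
        return base(name).endsWith("$");
    }

    /** Classifies a name by its last character: "[" array, "(" function, else scalar. */
    public static Kind kind(String name) {
        if (name.endsWith("[")) return Kind.ARRAY;
        if (name.endsWith("(")) return Kind.FUNCTION;
        return Kind.SCALAR;
    }

    // Strips a trailing "[" or "(" so the "$" marker, if any, is last.
    private static String base(String name) {
        char last = name.charAt(name.length() - 1);
        boolean bracket = (last == '[' || last == '(');
        return bracket ? name.substring(0, name.length() - 1) : name;
    }
}
```

With this reading, "s" is a numeric scalar and "s$[" is a string array, so no separate list of types is needed to tell them apart.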

So I do not understand what you are suggesting. Can you clarify?

olimaticer commented 8 years ago

To clarify, BigInteger, BigDecimal, Byte, Short, Integer, Long, Float and Double are in my case class names :wink:

Marc, you wrote:

Var and Val and their subclasses, along with ArrayDef, FnDef, FunctionParameter, and the Type enum are all in the file Var.java. It's right around 500 lines, although Run.java is only about 300 lines shorter.

In Run.java, VarType enum --> private enum VarType is a function. So I decoded in my mind that "Type enum" is a new array with a common index slider. I changed the name to VarType. Now I know I was wrong. You mean that the var type is declared with (, [, $, $(, $[ or nothing. These characters are also part of the var name, and in this way also part of the content of Var.java.

Sooner or later we will discuss the use of BigInteger, BigDecimal and Integer, too. http://stackoverflow.com/questions/3730019/why-not-use-double-or-float-to-represent-currency Is adding the “!” for integer values, like s!, or s!! for BigInteger, the solution? I don't think so. In the future we will need a Type class, because in my opinion we will get the TYPE command for more numeric and more complex data types, like TYPE s AS BigInteger. In this case the (, [, $, $(, $[ characters in var names could become obsolete.

Type class examples:
Integer
BigInteger
BigDecimal
Func
FuncOverWrite 'Main function overwrites include function
Del 'Var record could be deleted in idle mode.

The new RFO-Basic VAR-Concept 2015-12-26.pdf

To be reckless. The table and Type class could contain commands, too. The preprocessor detects the command and the right record points directly to the branch address.

BASIC! Code:
R =  cos(2*PI) + 45 +5%Example
Object Code may be like this:
LET#R#=#~0
cos#(#~1          OR            executeMF_COS#(#~1 
2#*#PI#~2
)#~1
+#50#~0

EDIT: Some Changes, Stackoverflow link and new drawing

olimaticer commented 8 years ago

Or we ignore gracefully Integer, BigInteger and BigDecimal type classes with some new functions:

BigDecimalADD$ (a$, b$, D or D$)
BigDecimalSUB$ (a$, b$, D or D$)
BigDecimalMUL$ (a$, b$, D or D$)
BigDecimalDIV$ (a$, b$, D or D$)

D → Digits after point    IF D = -1 all reasonable digits will be used.
D$ → Format string
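A rough sketch of how two of these functions could sit on top of java.math.BigDecimal. Only the numeric D form is shown. The class and method names are hypothetical, and reading D = -1 as "exact result for addition, 34 significant digits for division" is an assumption, not part of the proposal.

```java
import java.math.BigDecimal;
import java.math.MathContext;
import java.math.RoundingMode;

public class BigDecimalStrings {
    /** Adds two decimal strings; digits < 0 keeps the exact result. */
    public static String add(String a, String b, int digits) {
        BigDecimal sum = new BigDecimal(a).add(new BigDecimal(b));
        if (digits >= 0) {
            sum = sum.setScale(digits, RoundingMode.HALF_UP);
        }
        return sum.toPlainString();
    }

    /** Divides two decimal strings; digits < 0 falls back to 34 significant digits. */
    public static String div(String a, String b, int digits) {
        BigDecimal x = new BigDecimal(a);
        BigDecimal y = new BigDecimal(b);
        BigDecimal q = (digits < 0)
                ? x.divide(y, MathContext.DECIMAL128)
                : x.divide(y, digits, RoundingMode.HALF_UP);
        return q.toPlainString();
    }
}
```

Division needs an explicit limit because a quotient such as 1/3 has no finite decimal expansion, which is why the D = -1 case cannot simply mean "all digits" there.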

/Gregor

jMarcS commented 8 years ago

I have committed the new code and pushed it to GitHub. Woohoo!

marcs@Copperwall:~/basic/ws/BasicWIP4$ git commit
[master 43eacd3] Reinstate binarySearch. Near total rewrite of variable management.
 3 files changed, 1538 insertions(+), 1313 deletions(-)
 create mode 100644 src/com/rfo/basic/Var.java
marcs@Copperwall:~/basic/ws/BasicWIP4$

Nicolas, I know you've been waiting for me to get binarySearch() back in, and I apologize for taking so long.

Oddly enough, the thing I am most excited about in all this is the rework of arrays. This morning I rewrote the new Array.fill command using one of the Java Arrays.fill() methods. Nice.
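For readers who have not used it, this is the kind of call involved. The wrapper below is only a sketch with hypothetical names, and its zero-based start plus length arguments are an assumption about how a segment fill might be passed down, not the actual Run.java code.

```java
import java.util.Arrays;

public class ArrayFillSketch {
    /** Fills length elements of data, beginning at zero-based start, with value. */
    public static void fill(double[] data, int start, int length, double value) {
        // Arrays.fill takes a half-open range [fromIndex, toIndex).
        Arrays.fill(data, start, start + length, value);
    }
}
```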

I would like to make this v01.89.01 so we can get testing on real programs by forum users. I'll post about that in the forum if I can find time. Do we need to let v01.89 settle a few days first?

mougino commented 8 years ago

Woohoo indeed :) Both versions can coexist, one official and one for testers. Release it Marc! Make it free like an eaaagle :D

jMarcS commented 8 years ago

And I did! I released v01.89.01 on December 31. I forgot to put the issue number in the commit (again). The bug reports are coming in. The worst is brochi's array indexing bug: http://rfobasic.freeforums.org/post26295.html#p26295

Nicolas, I hope that your graphics bugs are actually instances of brochi's array indexing bug. There was one real dumb mistake in gr.text.skew, but that's all I've found in graphics so far.

The array indexing bug is like this: a[i] = a[j] + x

The array a[ gets an ArrayVar with an ArrayDef and an ArrayVal. The index i and the value of a[i] are cached in the ArrayVal. The index j and the value of a[j] are cached, too -- in the same ArrayVal. Oops. Add in x and assign the value to the ArrayVal, which writes through to a[j], not a[i].
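The failure can be reproduced in a few lines. This is a simplified model, not the real classes: the ArrayVal here caches only the index, the class names are reused for readability, and the assumption that the left side is indexed before the right side is evaluated is mine.

```java
public class SharedArrayValBug {
    /** A cursor into the array that remembers only the most recent index. */
    public static class ArrayVal {
        private final double[] data;
        private int index;
        ArrayVal(double[] data) { this.data = data; }
        public double get() { return data[index]; }
        public void set(double v) { data[index] = v; }
    }

    public static class ArrayVar {
        public final double[] data;
        private final ArrayVal cached;
        public ArrayVar(int n) {
            data = new double[n];
            cached = new ArrayVal(data);
        }
        /** Buggy: every indexing operation hands back the same ArrayVal. */
        public ArrayVal at(int i) {
            cached.index = i;
            return cached;
        }
    }

    /** Evaluates a[i] = a[j] + x the way the buggy model would. */
    public static void assign(ArrayVar a, int i, int j, double x) {
        ArrayVal target = a.at(i);       // left side: cached index is i
        double rhs = a.at(j).get() + x;  // right side: cached index becomes j
        target.set(rhs);                 // writes through to a[j], not a[i]
    }
}
```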

mougino commented 8 years ago

I looked at my code but I don't think the problem is due to an array... This is really a GR.TEXT.ALIGN issue. I start at the very beginning to center text, then for just a few elements I left-align, and immediately center again until the end of the program. The second centering is not taken into account in v01.89.01...

jMarcS commented 8 years ago

Referring to the failing code snippet in the previous post, the fix is to keep separate Val objects for a[i] and a[j]. We don't really need separate copies unless one is for read and the other is for write. Unfortunately, it's hard to distinguish the cases at the time the caching is done.

My change is to always create a new ArrayVal every time the array is indexed. That means reading an array in a loop creates a new ArrayVal for every item, instead of re-using the same one repeatedly. I have not measured the effect. I hope it's not too bad, because I don't see another choice.

I tried to make the cost a little less by simplifying the ArrayVal. It doesn't cache a value any more -- in fact, it no longer has a member NumVal or StrVal. It caches only the index.
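In sketch form, the change amounts to something like the following. This is a simplified model with reused class names, not the committed code: each indexing operation returns its own small ArrayVal that holds only the index, so the left and right sides of an assignment can no longer interfere.

```java
public class FreshArrayValFix {
    /** An immutable (array, index) pair; it caches no value. */
    public static class ArrayVal {
        private final double[] data;
        private final int index;
        ArrayVal(double[] data, int index) {
            this.data = data;
            this.index = index;
        }
        public double get() { return data[index]; }
        public void set(double v) { data[index] = v; }
    }

    public static class ArrayVar {
        public final double[] data;
        public ArrayVar(int n) { data = new double[n]; }
        /** A new ArrayVal for every indexing operation. */
        public ArrayVal at(int i) {
            return new ArrayVal(data, i);
        }
    }

    /** Evaluates a[i] = a[j] + x with separate ArrayVals for each side. */
    public static void assign(ArrayVar a, int i, int j, double x) {
        ArrayVal target = a.at(i);
        target.set(a.at(j).get() + x);
    }
}
```

The cost is one short-lived object per array access, which is the allocation overhead mentioned above.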

I also reworked ArrayDef so it is an abstract parent of two subclasses, NumArrayDef and StrArrayDef, with a static factory method to create one or the other as needed. Probably not really necessary, and it does cause a two-line change in Run.BuildBasicArray().
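The shape of that arrangement, as a sketch only: the subclass names follow the text, but the outer class name, the create() signature, and the members are assumptions for illustration.

```java
import java.util.Arrays;

public abstract class ArrayDefSketch {
    private final int length;

    protected ArrayDefSketch(int length) {
        this.length = length;
    }

    /** Static factory: callers ask for an array and never name the subclass. */
    public static ArrayDefSketch create(boolean isNumeric, int length) {
        return isNumeric ? new NumArrayDef(length) : new StrArrayDef(length);
    }

    public int length() { return length; }

    public abstract boolean isNumeric();

    /** Returns the element at a zero-based index as text. */
    public abstract String valueAsString(int index);

    static final class NumArrayDef extends ArrayDefSketch {
        private final double[] values;
        NumArrayDef(int length) {
            super(length);
            values = new double[length];
        }
        @Override public boolean isNumeric() { return true; }
        @Override public String valueAsString(int index) {
            return Double.toString(values[index]);
        }
    }

    static final class StrArrayDef extends ArrayDefSketch {
        private final String[] values;
        StrArrayDef(int length) {
            super(length);
            values = new String[length];
            Arrays.fill(values, ""); // empty strings rather than nulls
        }
        @Override public boolean isNumeric() { return false; }
        @Override public String valueAsString(int index) {
            return values[index];
        }
    }
}
```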

Most of these Var and Val objects have to be either numeric or string. Sometimes I have one class that checks the type explicitly to decide what to do. Other times I have an abstract class with a subclass for each type -- there's still a type-based decision, but it's made polymorphically by the runtime. I suppose I ought to pick one and use it consistently.

jMarcS commented 8 years ago

Okay, Nicolas. I'll look again before cutting v01.89.02. It may be that the problem is not in the Align command but in the way text attributes are propagated with the Current Paint. That would probably affect other text attributes, too.

If you want to test with the changes I've made already, you can get them from GitHub. I pushed fixes for the GR.Text.Skew bug and the array assignment bug. That was the 496th commit -- coming up on 500! (I left out the issue number again but this time it was because the commit comment was already long.)

mougino commented 8 years ago

Ok I'll compile up-to-date GitHub source tomorrow morning and report. I was thinking too that this may be a Paint problem. If my GR.ALIGN issue is not fixed I'll provide a code snippet you can use for test.

mougino commented 8 years ago

After studying my code a little more... I'm thinking maybe the problem is that GR.BITMAP.DRAWINTO.START newly resets the current Paint??

mougino commented 8 years ago

I'm here to report: with GitHub repo up to now I still have the graphic text alignment problem.

I wasn't able to replicate it with smaller code. In particular, I wrote a snippet around GR.BITMAP.DRAWINTO.START, but it doesn't create (reset) a new Paint as I initially thought...

I'll continue my investigations.

jMarcS commented 8 years ago

I found another bug in the new Var management. This is not related to the binarySearch() speed-up. It's a problem in handling variable names in user-defined function parameters.

The f19_towers_of_hanoi.bas sample program fails. It's a recursive algorithm. The fn.def looks like this:

FN.DEF hanoi(disk,dest,source,other, peg[], parms[])

The first call is:

n = hanoi(disk_count, 3, 1, 2, peg[], parms[])

and it recurses like this:

n = hanoi(disk-1,other,source,dest, peg[], parms[])

The first time the hanoi function is called, its four scalar parameters are (4,3,1,2). The first recursive call should have parameters (3,2,1,3), but in fact it has (3,2,1,2).

The fn.def creates a list of Var objects, one for each parameter.

doUserFunction can use these variables to hold values until it can move VarSearchStart. Then it adds references to those variables to the variable list, starting at the new VarSearchStart.

For the recursive call, doUserFunction does the same thing again: it uses the list of Var objects from fn.def to hold temporary values. The problem is that they're the same Var objects. It's a little mind-bending, but you can work through it CPU-style, one step at a time:

The fn.def list has four Var objects. Let's say they have Android object IDs 1,2,3, and 4. Var 1 has the name string disk, 2 is dest, 3 is source, and 4 is other.

The first doUserFunction call puts the value 4 in Var 1, 3 in Var 2, 1 in Var 3, and 2 in Var 4. Then it puts pointers to those four objects in the variable list.

The second doUserFunction call, one parameter at a time:

Var 1: evaluate the expression disk - 1, that is, get the caller's variable disk (that's Var 1) and subtract 1. Assign the result (3) to the first fn.def variable -- still Var 1. Already a problem, but we get away with this one.
Var 2: evaluate the expression other, that is, get the caller's variable other (that's Var 4). Assign its value (2) to the second fn.def variable -- that's Var 2, which is also the caller's variable dest. Yes, we just changed the value of a caller's variable.
Var 3: evaluate caller's source (Var 3) and assign its value (1) to the third fn.def variable (Var 3). We get away with this one, too.
Var 4: evaluate caller's dest. That's Var 2, which should be 3, but it got changed to 2 when evaluating the second parameter. Assign the value (2) to the fourth fn.def variable (Var 4).
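The same walk-through as a small simulation. This is a model of the aliasing, not the interpreter code: the Var class is reduced to a name and a value, and the caller's scope is a plain map from names to the shared Var objects.

```java
import java.util.HashMap;
import java.util.Map;

public class SharedParamVarBug {
    static class Var {
        final String name;
        double value;
        Var(String name) { this.name = name; }
    }

    /** Returns the four scalar parameters the recursive call actually receives. */
    public static double[] recursiveCallBuggy() {
        // The fn.def list: one Var per parameter, reused for every call.
        Var[] parms = {
            new Var("disk"), new Var("dest"), new Var("source"), new Var("other")
        };

        // First call, hanoi(4, 3, 1, 2): values go straight into the shared Vars.
        double[] first = {4, 3, 1, 2};
        for (int k = 0; k < parms.length; k++) {
            parms[k].value = first[k];
        }

        // The running function's scope points at those same Var objects.
        Map<String, Var> scope = new HashMap<>();
        for (Var v : parms) {
            scope.put(v.name, v);
        }

        // Recursive call, hanoi(disk-1, other, source, dest), one argument at a time.
        parms[0].value = scope.get("disk").value - 1; // 3
        parms[1].value = scope.get("other").value;    // 2, clobbers caller's dest
        parms[2].value = scope.get("source").value;   // 1
        parms[3].value = scope.get("dest").value;     // reads the clobbered dest

        return new double[] {
            parms[0].value, parms[1].value, parms[2].value, parms[3].value
        };
    }
}
```

The model yields (3,2,1,2) rather than the intended (3,2,1,3), matching the failure described above.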

It's hard to believe this doesn't destroy programs all over the place. Maybe there are not many people using v01.89.01 or v01.89.02? But v01.89.02 runs Nicolas' very large "test" program! Programming never ceases to amaze me.

This bug was probably harder to describe than it will be to fix.

jMarcS commented 8 years ago

On the off-chance you're looking at the changes, you may see that I changed the customized clone() methods in Var.java to copy(). I think it was bad form to mess with the Java definition of "clone".

In doUserFunction(), the bugfix is to make a copy of non-global ScalarVar parameters, create a new Val (because the copy is shallow, it uses the same Val as the original), put the new Val in the copy, and put the copy back in the FunctionParameter object.

The Good: removes one isNumeric() test. And it fixes the bug.
The Bad: making a copy adds code and uses memory. Call it a necessary evil.
The Ugly: we make the copy and stuff it into parm for every non-global ScalarVar parameter of every function call, whether we need to or not. I don't know how to tell if we need to. The original Var in each such parm is already a copy, and that copy is copied without ever being used.
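A sketch of that copy step on a reduced model of the hanoi call. The Var and Val classes and the rebind() helper are stand-ins written for this example and are not the code in Var.java or Run.java.

```java
public class CopiedParamVarFix {
    static class Val {
        double value;
        Val(double value) { this.value = value; }
    }

    static class Var {
        final String name;
        Val val;
        Var(String name, Val val) {
            this.name = name;
            this.val = val;
        }
        /** Shallow copy: the new Var shares the original's Val. */
        Var copy() { return new Var(name, val); }
    }

    // Copies the parameter's Var, gives the copy its own Val, and stores it back.
    private static void rebind(Var[] parms, int k, double value) {
        Var fresh = parms[k].copy();
        fresh.val = new Val(value); // without this, the copy would share the caller's Val
        parms[k] = fresh;
    }

    /** Returns the four scalar parameters the recursive call receives after the fix. */
    public static double[] recursiveCallFixed() {
        String[] names = {"disk", "dest", "source", "other"};
        double[] first = {4, 3, 1, 2};
        Var[] parms = new Var[4];
        for (int k = 0; k < 4; k++) {
            parms[k] = new Var(names[k], new Val(first[k]));
        }

        // The running function's variables: the Vars bound by the first call.
        Var disk = parms[0], dest = parms[1], source = parms[2], other = parms[3];

        // Recursive call, hanoi(disk-1, other, source, dest), one argument at a time.
        rebind(parms, 0, disk.val.value - 1);
        rebind(parms, 1, other.val.value);
        rebind(parms, 2, source.val.value);
        rebind(parms, 3, dest.val.value); // caller's dest is still 3

        return new double[] {
            parms[0].val.value, parms[1].val.value,
            parms[2].val.value, parms[3].val.value
        };
    }
}
```

Because each call gets fresh Var and Val objects, assigning a parameter can no longer change a variable that the caller is still reading.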

I'm sure there's a better way, but I'm just not seeing it, so most function calls just got a little slower. But the bug is dead.

jMarcS commented 8 years ago

Released in v01.90, 2016/03/31.