RFO-BASIC / Basic

The Repository for the files the Basic project that creates the BASIC! APK for Google Play
62 stars 42 forks source link

Accelerate variables lookup #198

Closed mougino closed 8 years ago

mougino commented 8 years ago

While discussing globals on the forum user Gikam asked for binary search on variables instead of current linear search.

Marc, you already investigated keywords lookup, but not variable lookups, and according to you back in 2012, variable lookup is the 1st most consuming task of the parser.

I investigated here and just built a BASIC! test version perfecting this trail. I wrote a benchmark reading randomly 500 out of 702 different variables: with BASIC! v01.88 it takes an average of 250ms. On my modified BASIC! (based on V01.88) it takes an average of 52ms (gain of 80%).

Changes only take place in 2 functions: Run.searchVar(String) and Run.createNewVar(String, int) as indicated here

mougino commented 8 years ago

Out of curiosity I ran the sample program f22_benchmark.bas: on regular BASIC! the 10 passes average to 2.12s ; on modified BASIC! it's 0.2s

mougino commented 8 years ago

Whole modifications to Run.java:

    private boolean searchVar(String name) {        // search for a variable by name
        int j = Collections.binarySearch(
            VarNames.subList(VarSearchStart, VarNames.size()),  // VarSearchStart is usually zero but will change when executing User Function
            name);
        if (j < 0) {                                // not in list of variable names
          VarIsNew = true;
          return false;
        } else {                                      // found it
          if (VarIsArray) {
            ArrayDescriptor array = ArrayTable.get(VarIndex.get(j));
            if (!array.valid()) {           // array invalidated through a different variable
              VarNames.set(j, " ");     // clear this variable so a new one with the same name can be created
              VarIsNew = true;
              return false;
            }
          }
          VarIsNew = false;
          VarNumber = j;
          theValueIndex = VarIndex.get(j);  // get the value index from the var table
          return true;
        }
    }

And

    private int createNewVar(String name, int val) {// make a new var table entry; val is an index into one of the lists
        int index = Collections.binarySearch(
                VarNames.subList(VarSearchStart, VarNames.size()),  // VarSearchStart is usually zero but will change when executing User Function
                name);                        // index returned by binarySearch() is the place where to insert var
        index = VarSearchStart - index - 1;   // so that VarNames.subList() is sorted alphabetically
        VarNames.add(index, name);                      // create entry in list of variable names
        VarIndex.add(index, val);                           // create corresponding index list entry
        VarNumber++;
        return index;
    }

Improve BASIC! programs by a speed factor x5 up to x10.

Although I have a crash of BASIC! when running my simple SetG() / GetG() program. So I guess my modifications to CreateNewVar() present a flaw... Unfortunately I don't have any simulator/debugger to find where. If we fix it we could really improve BASIC! speed spectacularly :racehorse:

jMarcS commented 8 years ago

Marc, you already investigated keywords lookup, but not variable lookups,

I'm reasonably sure that I investigated variable lookups, too, but that exercise was lost in the crunch at the end of last year when I took a couple of months mostly off of BASIC!. It would take some time to reconstruct what I did then. Instead, let's pursue your binarySearch() change.

and according to you back in 2012, variable lookup is the 1st most consuming task of the parser.

That's right. I did not expect to see the kind of performance you are seeing, though!

You have translated searchVar() correctly, as far as I can tell. The problem is that you are only sorting the variable name list, VarNames. In the forum I described how variables are stored. It's pretty complicated. Specifically, you are not sorting VarIndex. Your variable look-ups are almost all referencing the wrong data storage. Try a simple test that checks the values of the variables.

I made changes a while ago that make this simpler: there used to be a list of string values and a list of numeric values, and the index in VarIndex pointed into one of them (or something else). Now those lists are combined into Vars, a list of all scalar values.

Vars is a list of references to Var objects. Currently those are either NumVar or StrVar, which hold scalar values. I'd like to extend that idea so they can also be arrays or functions or whatever. Once that happens, the index in VarIndex can be pulled into the Var class. Not literally, but conceptually, as part of the subclass hierarchy. I described some of the consequences in the forum. The mind boggles.

So that's the beginning of my "Grand Unified Variable" architecture. It's nothing new, I'm probably copying Perl, at least in spirit.

However, you don't need all that to make variable look-ups faster. I'm so dazzled by the potential glories of the GUV that I forgot to look again at the much more immediate problem of variable search speed. "Le meglio è l'inimico del bene."

This post is too long. I'll start another.

mougino commented 8 years ago

I do sort VarIndex at the same time as VarNames:

        index = VarSearchStart - index - 1;   // so that VarNames.subList() is sorted alphabetically
        VarNames.add(index, name);                      // create entry in list of variable names
        VarIndex.add(index, val);                           // create corresponding index list entry
        VarNumber++;
jMarcS commented 8 years ago

Normally I would wait until other infrastructure changes were complete. But the performance numbers you report are amazing. That makes it worthwhile to look at in intermediate change, just for variable storage and look-up. I've already made some relevant changes in earlier releases, so this won't be as hard as if we had tried to do it a couple of years ago.

I believe that your scheme will work if you sort VarIndex into the same order as VarNames. There are two ways to do that:

  1. Modify the sort so it sorts both arrays based on the order of one of them.
  2. Modify the data structure so there is only one list.

I'll describe both methods, but I have a very strong bias toward the second one, the data-oriented approach. That's obvious, from my previous post!

  1. The algorithmic approach:

If you were writing your own search, this would be easy. Just implement an indirect sort: it's a classic classroom exercise to sort an index list and apply it to a content list. Since you're using a library method, it's not so easy.

This is such a common problem that I would not be surprised if Java already has such a method. I have not looked. That would make the solution very simple indeed.

  1. The data-oriented approach:

Make the two lists into one list. This is not the "Grand Unified Variable" class. You just need a simple class that holds a String (the variable name) and an int (the value index). The value index would point at some other list, just as a VarIndex value does now. The code that uses the index knows by its context which list to use (Vars, ArrayTable, or FunctionTable).

Provide simple setters and getters so the rest of the code does not rely on the internal structure of the new class. Some day we'll be merging this intermediate change with the GUV!

Then go through ALL of the code and change every place that VarNames or VarIndex is used. This is not trivial, but like I said, I've made a lot of changes that were specifically intended to make this kind of change easier. To get an idea of the size of the problem, I deleted the declarations of both lists. That makes 45 compile errors. I should say "only" 45 compile errors -- not as bad as I expected.

jMarcS commented 8 years ago

Ha! I must laugh at myself. I just read what you wrote in the forum, and your new post here, and now I see I did not understand all of your earlier post here. Yes, you do keep VarIndex in sync with VarNames. That's a nice variation on the "algorithmic approach".

I don't have a quick answer. I sense a long and tedious debugger session in my future!

jMarcS commented 8 years ago

VarNumber is not the number of variables. It is the index of the most-recently-searched variable. (One objective of my GUV plan is to eliminate more of those globals!) In createNewVar(String, int):

        index = VarSearchStart - index - 1;   // so that VarNames.subList() is sorted alphabetically
        VarNames.add(index, name);                      // create entry in list of variable names
        VarIndex.add(index, val);                           // create corresponding index list entry
        VarNumber = index;       <-- not VarNumber++
        return index;

That fixes one crash, caused by an attempt to assign a numeric value to a string, or vice versa.

However, there is another crash, caused by an index out of bounds exception. In your test program, it happens in Bundle.Put inside a function. In my simpler test program, it's in an implicit LET (simple assignment) inside a function. If I don't do a function call, it does not crash.

Maybe just the first variable creation inside a function call?

jMarcS commented 8 years ago

Before your change, createNewVar(String, int) did not reference VarSearchStart. Now it does. But VarSearchStart is not set until the end of doUserFunction() -- too late to use it to create the local variables for the function arguments.

There's something tricky in there, indicated by the comment, "Now that all new variables have been created in main name space, start the function name space with the function parameter names."

This is from a bugfix. Suppose a user calls a function with an argument, like fn(a), but a does not exist. A new a is created in the caller's name space.

So we can't create any variables in the called function's name space, or set the value of VarSearchStart, until all of the arguments have been processed. The bugfix is right where that comment is: there's a loop there that creates new variables. It does not call createNewVar(). It knows there are no variables in the function space yet, so it just does a fast, simple loop to stick the new variables at the end of the name space. They are not sorted.

This is an example of how optimizing for speed can hurt maintainabilty!

The fix is to set VarSearchStart before creating the local variables for the parameter arguments, and to call createNewVar() to create them. (I have not tried this!)

jMarcS commented 8 years ago

If VarSearchStart is set early, it is possible for either a RunTimeError("Too few calling parameters") or a syntax error (no closing parenthesis) to leave VarSearchStart with the wrong value. I think this is okay because those are not recoverable errors. A user could trap them with OnError:, maybe just to report the error string. There have been enough changes to globals that any attempt at recovery would (should!!) end badly. (It is unfortunate that we don't have a way of marking some errors unrecoverable.)

mougino commented 8 years ago

Wow :fearful: it's really like we pulled a bit of a ball of wool and a lot more is coming...

jMarcS commented 8 years ago

Too true. Here's another strand.

Maybe you caught that already: CreateNewVar does not really create a new Var. It creates a new entry in VarNames and VarIndex, but it doesn't know anything about the new VarIndex value. Something else has to create the Var and makes sure that the new VarIndex item indexes it.

That's why doUserFunction() doesn't call CreateNewVar(). It's not just saving CPU cycles (for example, it does not update VarNumber). It's either creating the Var, or, for arrays and pass-by-reference scalars, copying in the index of the caller's Var.

mougino commented 8 years ago

If I read you correctly, doUserFunction() does not call createNewVar() (just checked it: yes, it seems true). So the fact that VarSearchStart is newly referenced in createNewVar() and that VarSearchStart is not updated until the end of doUserFunction() should have no impact at all...

I think I just need to replicate the "sorting" scheme (of VarNames and VarIndex) I implemented in createNewVar() into doUserFunction() and we should be good. How do you think?

mougino commented 8 years ago

Here is the fix for doUserFunction() :

            // Now that all new variables have been created in main name space,
            // start the function name space with the function parameter names.
            sVarNames = VarNames.size();
            for (FunctionParameter parm : parms) {
                int index = Collections.binarySearch(
                        VarNames.subList(sVarNames, VarNames.size()),
                        parm.name());                                   // index returned by binarySearch() is the place where to insert var
                index = sVarNames - index - 1;                          // so that VarNames.subList() is sorted alphabetically
                if (!parm.isArray() && !parm.isGlobal()) {
                    VarIndex.add(index, Vars.size());                   // new scalar
                    Vars.add(parm.var().clone());
                } else {
                    VarIndex.add(index, parm.varIndex());               // array or global scalar
                }
                VarNames.add(index, parm.name());
            }

This is not tested but I think it should fix both our problems : my call of Bundle.Put inside the function, and your implicit LET inside the function... I'm away from my dev laptop. If you have time to compile the change tell me the result.

mougino commented 8 years ago

I just did a check : CreateNewVar() and doUserFunction() are the only 2 places in the whole Run.java where there are occurences of either VarNames.add() and VarIndex.add(). With my changes (use the variant add(index, value) instead of just add(value)) the two lists should be sorted at all time thus making binarySearch() possible.

mougino commented 8 years ago

I just tried it... I still have the same crash due to an index out of bound in Bundle.Put :sob: I don't know where that comes from, I was pretty sure to have nailed it down... back to investigations

jMarcS commented 8 years ago

Hello, Nicolas. I have not looked at this since last weekend, I will have to start over. I do remember that there were changes needed in doUserFunction(), above the changes you show here. I did not work out those changes. The code for arrays calls getArrayVarForRead(), which indirectly calls the search function, and the code for isGlobal calls getVarAndType().

I suppose that for the last few days you have been thinking about more important issues. I am sorry for you and your people, and for our world. Be well, mon ami.

olimaticer commented 8 years ago

I agree with Marc.

Gregor

mougino commented 8 years ago

Biggest impact of my changes is that VarSearchStart is newly instantiated in CreateNewVar() so let's focus on where CreateNewVar() can be called.

In doUserFunction(), both the code for arrays and the code for isGlobal call getVarAndType(). getVarAndType() calls parseVar() and searchVar(). Neither parseVar() nor searchVar() call CreateNewVar() so the problem is not in these functions.

CreateNewVar() is called at 3 places: in createNewScalar(), BuildBasicArray(), and executeFN_DEF(). The 2 last places do not concern us (not called from doUserFunction()). But: createNewScalar() is called from getVarValue() (and only there), and getVarValue() is called from doUserFunction() (in its code for isGlobal) (and also from getVar(), getNVar(), getSVar(), and ESE(), but that's not interesting).

So the cause for the index out of bound must be in getVarValue()/createNewScalar(). I have the feeling it has to do with an inconsistent VarSearchStart so I'll try to find exactly why...

mougino commented 8 years ago

There is something I really don't understand... There are 2 separate parts in doUserFunction(): the first one consists in building the parms ArrayList, the second one scans the list and creates the variables in the function name space.

Only in the first part can we occasionally create a new var (if it is passed as parameter but never initialized before) but in this case we are still in the main name space! we do not need to have an updated VarSearchStart, on the contrary we need it to stay the one of the previous name space :worried:

I'm really losing my latin here... Or I think we're on a wrong trail with this VarSearchStart thing...

jMarcS commented 8 years ago

Yes, that was a bugfix.

The bug happened if VarSearchStart was moved before the first step: the previously unused variables were created in the called function space, then created again, with a different name, in the third step.

fn.def fn(p, q) : ... : fn.end
a = 1
x = fn(a, b)

Variable a is not a problem. It is created in the caller's space. A new variable p is created in the called function space which is a copy of the caller's variable a.

Variable b is created new. After the bugfix, it is created in the caller's space, just like a, but before the bugfix it was created in the called function space. The caller has no variable b and the called function has a variable b that it should not have. Then a new variable q is created in the called function space that is a copy of variable b, where ever that is.

VarSearchStart tells the interpreter which namespace it is in. The name space is always from VarSearchStart to the end of the variable lists.

[Note: if you got this in email before I edited it, there was an error. I said that b and q changed the same memory space. They do not. That only happens in pass by reference. Sorry for the error.]

jMarcS commented 8 years ago

I did not explain well at all. Let me try again.

Processing function arguments always creates new variables in the called function name space. Sometimes it also creates variables in the caller's name space -- but we (Paul and I) did not know this.

The bug: we were trying to create variables in both name spaces at the same time! This is impossible, of course.

The fix: I added the FunctionParameter class and split doUserFunction() into two parts.

First part: As each function arguments is processed, we put everything needed to create a new local variable into a FunctionParameter object and put it on a list (called parms). This processing may create variables in the caller's name space. It does NOT create variables in the called function's name space.

After all arguments are processed, we move VarSearchStart. Now we can't create any more variables in caller space, only in called function space.

Second part: We read the list of FunctionParameter objects that we just built. For each object, we create a new local variable in the called function space. For pass-by-value arguments, the new local variable gets the value of the argument. For pass-by-reference arguments, the new local variable gets a pointer to a variable in the caller's name space.

The loop in the second part is much like createNewVar(), optimized to work with FunctionParameter objects and expanded to handle pass-by-reference arguments.

Anything you do to createNewScalar you have to do that loop in doUserFunction(). We've talked about that.

What we have not talked about is that expansion to handle pass-by-reference arguments.

I have taken too long; I am out of time. I will come back to this later, unless you tell me you don't need me to do that. Here is the short version:

When a variable is passed by value, the value of the outer variable is copied to a new inner variable. When a variable is passed by reference, the symbol table reference of the outer variable is copied to the new inner variable -- you don't need to look up the value, you only need the symbol table index. In other words, for pass-by-reference you need searchVar() but you don't need getVarValue().

getVarValue can call createNewScalar. In v01.88 and before, createNewScalar did not need to know how to find a variable in the symbol table. It ALWAYS put a new variable at the end. With your change, that is no longer true.

The other thing getVarValue can call is GetArrayValue(), which has to evaluate array index expresssions! I have no time now to figure out how that is affected by your change. GetArrayValue always needs to do variable lookups, which may create new variables!

mougino commented 8 years ago

In your last 2 comments you talk about createNewScalar I think you rather mean createNewVar (the latter is called by the former). Anyway thanks for your explanations I had the same conclusions and my understanding was correct. The fixes I proposed should be compatible with the scheme and work, but there is an index problem anyway and I don't know where...

mougino commented 8 years ago

This is my test program, it does not crash but shows the wrong index problem:

fn.def test()
  let w=1
  let x=1
  let y=1
  fn.rtn x
fn.end

let y=2
print test()
print "(should be 1.0)"

It should print 1.0 but it prints 2.0, proving there's an issue in the variable index.

humpity commented 8 years ago

mougino,

Just looking at your code I see that you have done a sublist search and corrected the returned 'index' from sublist back to mainlist offset;

createNewVar(

index = VarSearchStart - index - 1;

doUserFunction(

index = sVarNames - index - 1;

But, you have not done so in searchVar..

searchVar(

else 
{                                      // found it
  j = VarSearchStart + j;   // ** don't you need this too ?? **

      if (VarIsArray)

mougino commented 8 years ago

That's it Humpty :open_mouth: you're a genius! So simple yet so hard to find, this solves all index issues. Binary search now works flawlessly :smile: Thanks!!

mougino commented 8 years ago

Conclusion: 3 modifications have to be passed in Run.java:

    private boolean searchVar(String name) {        // search for a variable by name
        int j = Collections.binarySearch(
            VarNames.subList(VarSearchStart, VarNames.size()),  // VarSearchStart is usually zero but will change when executing User Function
            name);
        if (j < 0) {                                // not in list of variable names
          VarIsNew = true;
          return false;
        } else {                                      // found it
          j += VarSearchStart;              // make the index absolute (thanks Humpty!)
          if (VarIsArray) {
            ArrayDescriptor array = ArrayTable.get(VarIndex.get(j));
            if (!array.valid()) {           // array invalidated through a different variable
              VarNames.set(j, " ");     // clear this variable so a new one with the same name can be created
              VarIsNew = true;
              return false;
            }
          }
          VarIsNew = false;
          VarNumber = j;
          theValueIndex = VarIndex.get(j);  // get the value index from the var table
          return true;
        }
    }
    private int createNewVar(String name, int val) {// make a new var table entry; val is an index into one of the lists
        int index = Collections.binarySearch(
                VarNames.subList(VarSearchStart, VarNames.size()),  // VarSearchStart is usually zero but will change when executing User Function
                name);                        // index returned by binarySearch() is the place where to insert var
        index = VarSearchStart - index - 1;   // so that VarNames.subList() is sorted alphabetically
        VarNames.add(index, name);                      // create entry in list of variable names
        VarIndex.add(index, val);                           // create corresponding index list entry
        VarNumber = index;
        return index;
    }
private boolean doUserFunction() {
// [...]
            // Now that all new variables have been created in main name space,
            // start the function name space with the function parameter names.
            sVarNames = VarNames.size();
            for (FunctionParameter parm : parms) {
                int index = Collections.binarySearch(
                        VarNames.subList(sVarNames, VarNames.size()),
                        parm.name());                                   // index returned by binarySearch() is the place where to insert var
                index = sVarNames - index - 1;                          // so that VarNames.subList() is sorted alphabetically
                if (!parm.isArray() && !parm.isGlobal()) {
                    VarIndex.add(index, Vars.size());                   // new scalar
                    Vars.add(parm.var().clone());
                } else {
                    VarIndex.add(index, parm.varIndex());               // array or global scalar
                }
                VarNames.add(index, parm.name());
            }
// [...]
mougino commented 8 years ago

And here is my BASIC! test program:

Text.Open w, fid, "../source/_VarLookupBenchmark.bas"

Text.WriteLn fid, "c0=Clock()"

For i=97 to 122
  n++ : Text.WriteLn fid, "LET ";Chr$(i);"=";Floor(65535*Rnd()+1);"  % var #";n
Next

For i=97 to 122
  Print n;"/676" % show progress
  For j=97 to 122
    n++ : Text.WriteLn fid, "LET ";Chr$(i);Chr$(j);"=";Floor(65535*Rnd()+1);"  % var #";n
  Next
Next

Text.WriteLn fid, "c=Clock()-c0"
Text.WriteLn fid, "PRINT ";Chr$(34);n;" variables registered in ";Chr$(34);";c;";Chr$(34);" ms.";Chr$(34)
Text.WriteLn fid, "PRINT ";Chr$(34);"Now accessing 500 variables randomly...";Chr$(34)

Text.WriteLn fid, "Dim timing[10]"
Text.WriteLn fid, "For i=1 To 10"
Text.WriteLn fid, "c0=Clock()"

For i=1 to 500
  If Mod(i,100)=0 Then Print i;"/500"
  Text.WriteLn fid, "dummy=";Chr$(Floor(25*Rnd()+97));Chr$(Floor(25*Rnd()+97))
Next

Text.WriteLn fid, "timing[i]=Clock()-c0"
Text.WriteLn fid, "PRINT ";Chr$(34);"Round #";Chr$(34);";i;";Chr$(34);" done in ";Chr$(34);";timing[i];";Chr$(34);" ms.";Chr$(34)

Text.WriteLn fid, "Next"
Text.WriteLn fid, "Array.Average t, timing[]"
Text.WriteLn fid, "PRINT ";Chr$(34);"Average: ";Chr$(34);";t;";Chr$(34);" ms.";Chr$(34)

Text.Close fid
Print "Done."

It will create a "_VarLookupBenchmark.bas" that you can run both with official BASIC! and with binary search BASIC! with my 3 modifications above.

The more entry-range the phone, the more spectacular the speed gain. My previous results were on a very low-cost phone, here is a table with also results on my high-end phone:

Phone Prog Time BASIC! V01.88 Time BinSearch BASIC! Speed gain
High-range phone _VarLookupBenchmark.bas 60 17 x3,5
High-range phone f22_benchmark.bas 0,79 0,79 x1
Low-cost phone _VarLookupBenchmark.bas 250 52 x4,8
Low-cost phone f22_benchmark.bas 2,12 0,2 x10,6
humpity commented 8 years ago

Just a suggestion, but I think there should be some sort sanity check in douserfunction

if ( index<0 )
  {  index = sVarNames - index - 1; }
else
  {  <error> }

I know it doesn't crash (01.88.02) if the user types in something silly like fn.def myfunc1 (a,a) because it just adds the second 'a' to the end of the list. But a b.search index might crash.

mougino commented 8 years ago

Just tested it with binary search: fn.def myf(a,a), fn.def myf(b,a,a): it doesn't crash neither. I am looking why. When we have an explanation and we are sure this part of doUserFunction() is exception-free (=crash-free), we can commit like I proposed.

mougino commented 8 years ago

My previous comment was slightly incorrect: doUserFunction() treats the call to a function, not the definition of a function (which is done by executeFN_DEF()).

Even so, with a fn.def myf(a,b,c) and a call myf(b,a,a) binsearch BASIC! doesn't crash. Still investigating why...

mougino commented 8 years ago

I put some traces in createNewVar() and doUserFunction() and here is the result for the following test program:

fn.def test(a,b,a)
  let z=5
  let x=1
  let y=17
  fn.rtn x
fn.end

let y=2
print test(b,a,a)
print "(should be 1.0)"

First, variable test( is created by createNewVar() at index 0/0 (search-start=0) Then, variable y is created by createNewVar() at index 1/1 (search-start=0) (this is the let y=2) Then, variable b is created by createNewVar() at index 0/2 (search-start=0) (this is the print test(b,a,a)) Finally, variable a is created (once) by createNewVar() at index 0/3 (search-start=0) (normal, the second call is a reference to the already existing var)

Then we enter the function: Function variable a is created by doUserFunction() at index 4/4 (search-start=4) Then, function variable b is created by doUserFunction() at index 5/5 (search-start=4) Then, another function variable a is created by doUserFunction() at index 3/6 (search-start=4) :warning: Then, function variable z is created by createNewVar() at index 7/7 (search-start=4) Then, function variable x is created by createNewVar() at index 7/8 (search-start=4) Then, function variable y is created by createNewVar() at index 8/9 (search-start=4)

So you are right Humpty, the line in bold shows there is a problem of creating a variable outside of the function name space.

@humpity @jMarcS what kind of error should we throw if user does a fn.def myf(a,b,a)? Something like Error: duplicate parameter name? Or should we handle it smartly (skip the second parameter a which already exists) and not throw any error??

humpity commented 8 years ago

yes, the 2nd 'a' causes a 'hit' which does not return a negative index. Such hits will cause the index to go the other way back up to corrupt the caller's namespace; index = sVarNames - index - 1; = 4-(0)-1 = 3 (when search-start=4) .( don't know what was at index3, possibly a function name?, anyway, it got stepped upon ! ) I think yes, "duplicate parameter name" and it should throw an error. It's a silly user error which he could have avoided.


Also, you don't need the 'else' in searchVar; e.g

private boolean searchVar(String name)         // search for a variable by name
{
    VarIsNew = true;            // assume new variable

    int j = Collections.binarySearch(
        VarNames.subList(VarSearchStart, VarNames.size()),  // VarSearchStart is usually zero but will change when executing User Function
        name);

    if (j < 0) return false;            // not in list of variable names, must be new

                                        // else found it
    j += VarSearchStart;                // make the index absolute
    if (VarIsArray) 
    {
      ArrayDescriptor array = ArrayTable.get(VarIndex.get(j));
      if (!array.valid()) 
      {           // array invalidated through a different variable
        VarNames.set(j, " ");     // clear this variable so a new one with the same name can be created
        return false;       // must be new
      }
    }
    VarIsNew = false;
    VarNumber = j;
    theValueIndex = VarIndex.get(j);  // get the value index from the var table
    return true;
}
mougino commented 8 years ago

Great! Then I propose the following trick to detect a duplicate parameter... and we should be done :wink:

mougino commented 8 years ago

@jMarcS I am ready to commit, how do you stand on these modifications? Do you want I send you the modified Run.java by mail and you review it during the week-end maybe? (and I leave you control over possible modifications and final commit?)

mougino commented 8 years ago

I opened issue #200 for the duplicate parameter problem, where I propose a fix (tested, works fine).

humpity commented 8 years ago

Isn't it far simpler to catch the error (+ve index) in douserfunction ? (also serves as a sanity check). Fn.Def is the proper place, yes, but it's such a rare error..

mougino commented 8 years ago

I tried it before :wink: but doUserFunction() treats the call to the function, so if you throw a run time error in doUserFunction() it will report an error in print myFunc(1,2,3) (where there is none)

humpity commented 8 years ago

That's true! Maybe a b.search in Fn.Def too? Would that be better than an extra hashset ?

There's also the issue that if Fn.IMPORT is implemented, then some sort of test has to be done also in douserfunction.

humpity commented 8 years ago

Maybe a b.search in Fn.Def too? Forget that, the ordering has to be preserved.

jMarcS commented 8 years ago

humpty, you caught a potentially really bad bug. Thank you.

With the old variable creation scheme, a duplicate parameter name was a subtle bug, but not very serious. I haven't tried to prove it, but I think the effect is that the duplicate parameter was just ignored. Two variables are created with the same name, but the linear search could find only the first one.

With the new scheme, you wind up with an index before VarSearchStart, potentially less than 0. The caller's space is corrupted, or the interpreter crashes. Not good.

This happened because doUserFunction() creates variables without first checking to see if they exist. I sincerely hope that this is the ONLY place in the code that creates variables without checking!

Nicolas has a nice fix for this; both cheap and easy. But I feel humpty's unease. If we're going to program defensively, then:

jMarcS commented 8 years ago

I was concerned about the pass-by-reference cases. I have convinced myself that Nicolas's code handles them correctly. I think I was confused about VarNames, VarIndex, and Vars. The new scheme inserts items into the first two, but we still add items only at the end of Vars.

I also ran some parameter-passing tests and they all, um... passed (pun not intended, but enjoyed nonetheless!).

humpity commented 8 years ago

My concern was if the error merited an extra overhead in Fn.Def. createNewVar is a great choice. It will have the new b.search so it will know right away if the name already exists without creating a hashset. So (for this issue) it just remains what kind of error msg will be given?

jMarcS commented 8 years ago

if the error merited an extra overhead in Fn.Def.

Absolutely. Fn.Def runs only once per function, so a little (more) overhead is okay.

I suppose we could make it a little better. We don't have to create a HashSet unless there are at least two parameters. But that's just the uber-geek in me. (At least two parameters of the same type! But that way lies madness...)

createNewVar is a great choice. It will have the new b.search so it will know right away if the name already exists without creating a hashset.

I think you're right. Overhead in createNewVar is more expensive: it runs once for every variable name, including array names and function names. Not nearly as bad as overhead in varSearch(), of course. It's probably small, and I think it's worth it. Still, I think we'll need a performance measurement to be sure.

Somehow we have to combine createNewVar and the changes in doUserFunction() so that the binary search call appears only once. That's where the test for a positive index value should go.

what kind of error msg will be given?

If the rest of the code is done right (and I think it is, with Nicolas's fix), you will never get this error. It could be of the "Notify developer!" type that we already have for some other errors that indicate the interpreter code is broken. If not that, then something like "Attempt to create variable " " that already exists". Opinions?

One other consideration: I'm not certain we can call RunTimeError() from createNewVar(). If not, reporting the error gets ugly, because its caller has to check.

We really really need a calling tree for the components of GetVar()! I used to use egypt to draw call graphs from GCC symbol (RTL) files. It's a Perl script that walks the symbol table looking for connections and sends the result to graphviz. I wonder if there's anything like that for Java?

Perhaps I'm over-complicating this issue -- I've always felt one of the most important things an engineer can do is worry about things that could go wrong. But I do think it will take a little longer to work through this stuff and I don't want to hold up the binary search change any more.

I think it's time to commit Nicolas's code. I have some pretty-printing suggestions and a comment change or two, but nothing algorithmic.

jMarcS commented 8 years ago

My wife is on the phone with a friend, so I had a little time to kill.

I added an index check to createNewVar(). I had to add a duplicate check in doUserFunction().

if (index >= 0) { throw new InvalidParameterException("createNewVar: variable " + name + " already exists"); }

That won't give a Runtime error to the BASIC! user. It throws a Java exception. I think this is a good way to handle internal errors.

Well, apparently I have thought that before, because there are a bunch of other places I throw InvalidParameterException for internal errors. (Also some places where I catch that Exception, maybe I should stop catching them?)

Since that has somehow become the de facto way to catch internal errors, I added a clause to the UncaughtExceptionHandler() for InvalidParameterException. Should have done that a long time ago.

To test it, I took out Nicolas's HashSet test and ran a BASIC! program with a function that has duplicate parameters. The program (not BASIC!) crashes, and I get a stack trace of the duplicate variable creation attempt. Down at the bottom, it does the standard Runtime Error thing and prints the offending line of the BASIC! program. Beauty!

What does the index check cost? I don' t know. I tried to run Nicolas's performance test on my old gingerbread phone. I can't get results consistent enough to say if there is a measurable difference in performance. There's something else going on, maybe some Android activity, that skews the results of any run by tens of milliseconds. The fastest batch I saw says that adding the index test makes the code run faster. That's impossible, so either I have to find some way to clean up the interference, or I have to declare the results a draw: adding the index check is essentially free.

jMarcS commented 8 years ago

I am not able to achieve Nicolas's performance results.

I get a nice improvement in the variable-heavy test program _VarLookupBenchmark.bas. On my Moto G phone it's 4x or 5x improvement.

But I see only a tiny bump in speed of the f22_benchmark. It's about 2% (not 2x) on an old XOOM tablet running Jellybean (2.30 -> 2.25). On my Moto G it may be about the same, but 2% of a 0.80 second-run is noise.

Any ideas?

humpity commented 8 years ago

Perhaps because f22 uses just a few variables?

Wikipedia: https://en.wikipedia.org/wiki/Binary_search_algorithm#Average_performance "In theory binary search is usually faster than linear search, but in practice that may not hold true. For small arrays (say about 64 items or less), linear search may have better performance. "

Also: https://schani.wordpress.com/2010/04/30/linear-vs-binary-search/

jMarcS commented 8 years ago

Yes, I'm sure you're right, but up in that nifty little table Nicolas posted last week (and how do you post a table to GitHub anyway??), it says 10x for a low-end phone, and no improvement for a high-end phone. I'm seeing no improvement at either end. I don't have a high-end phone, but my Moto G is fairly recent, and reports f22_benchmark numbers in the 0.8 second range).

Looking more closely, I wonder if that's an error. I believe 0.79 for the high-end, but I don't think 0.2 for a low-end phone is right. My numbers for _VarLookup match the table well enough (2-3x for the slow XOOM, 4-5x for the fast phone).

This is off-topic, but I thought I'd mention something else I see. _VarLookup does 10 "rounds" and the variation between them is huge. The tablet just goes away for a while, anywhere from 200 to 500 ms. I think it's about the same for either version of BASIC!, so it tends to weaken the performance improvement measurement.

I'm sure an analysis of what's it's doing would be interesting. I'm pretty sure BASIC! does way too much garbage collection, even when it's just sitting in an idle loop. I've noticed it before, and I don't know what's causing it. I know there are changes in Lollipop to smooth out the GC, but it's still doing work it shouldn't need to do. But that will be another project for another time.

humpity commented 8 years ago
HTC Sensation
_VarLookup 235.1 61.4
F22 2.5166 2.4579

( will try old phone tonight ) ( https://github.com/adam-p/markdown-here/wiki/Markdown-Cheatsheet#tables )

humpity commented 8 years ago

Summary

HTC Sensation

ICS 4.0.3 01.88.02 b.search diff
_VarLookup 235.1 61.4 3.82x
f22 2.5166 2.4579 1.02x

LG P500

Froyo 2.2.1 01.88.02 b.search diff
_VarLookup 1201.2 186.0 6.45x
f22 8.943 7.575 1.18x
humpity commented 8 years ago

Can't make head or tail of this, but Collections binarySearch shouldn't really work for Froyo. After a bit of googling, the Collections binarySearch only came in since Gingerbread (API 9 - 2.3). The Array binarySearch was always there (API 1).