elnewfie / lslforge

Automatically exported from code.google.com/p/lslforge

Lslforge corrupts data during optimization #18

Open GoogleCodeExporter opened 9 years ago

GoogleCodeExporter commented 9 years ago
What steps will reproduce the problem?
1. Turn on the optimizer
2. "Compile" the script below

What is the expected output? What do you see instead?

The expected output is a script that will say "Test".  The actual result is a 
script that will say some random garbage, or else won't work in SL.

What version of the product are you using? On what operating system?

I'm using lslforge 0.1.1 on MacOS

Please provide any additional information below.

The attached script "compresses" an ASCII string (7 bits per character) by packing it into 
a string of 16-bit characters.  Because the compress function is pure, the compiler 
evaluates it in the optimizer, but some bug causes the evaluation to be done 
incorrectly.  My goal was to use LSLForge to compress my strings at compile time, 
saving memory in the completed script.  In this explanation I'm assuming the 
user uses Mono, but the script also works in LSL, just not in LSLForge.

I also suggest that this script, since it's in the Public Domain, would make a 
good unit test for the optimizer too.  If someone can give me some direction, I 
might be able to help more.  Does anyone know if this will require making a 
custom string class, or just fixing bugs in the actual code?

I would also be willing to try to make a smaller sample, if someone can give me 
a hint where the problem might be.
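For reference, the packing scheme described above can be sketched in Python (this sketch is not part of the original report; the function names `pack_pair`, `compress_ascii`, etc. are illustrative, not taken from the script):

```python
# Illustrative sketch of the scheme: two 7-bit ASCII codes are combined
# into one 15-bit value, which is then stored as a single Unicode
# character (biased by 0x1000 to avoid illegal code points).

def pack_pair(a, b):
    """Pack two 7-bit ASCII codes into one 15-bit integer."""
    return (a << 7) | b

def unpack_pair(n):
    """Recover the two 7-bit ASCII codes from a 15-bit integer."""
    return (n >> 7) & 0x7F, n & 0x7F

def compress_ascii(s):
    if len(s) % 2:              # pad to an even length, as the LSL code does
        s += " "
    return "".join(
        chr(0x1000 + pack_pair(ord(s[i]), ord(s[i + 1])))
        for i in range(0, len(s), 2)
    )

def uncompress_ascii(s):
    out = []
    for ch in s:
        a, b = unpack_pair(ord(ch) - 0x1000)
        out.append(chr(a) + chr(b))
    return "".join(out)

print(uncompress_ascii(compress_ascii("test")))  # test
```

The roundtrip halves the character count; each stored character sits in the range U+1000 through U+8FFF, matching the bias used by the LSL script below.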

//------------------------------------------------------------------------------
//Demo of bug...  Should print "test" instead of garbage

 // Demo of ASCII compression in Mono scripts
 // By Becky Pippen, 2009, contributed to Public Domain

 // Converts n = [0..0xff] to two hex characters
 //
 string hexChar2(integer n)
 {
     string hexChars = "0123456789abcdef";
     return llGetSubString(hexChars, n >> 4, n >> 4) +
            llGetSubString(hexChars, n & 0xf, n & 0xf);
 }

 // Given a single character c, this returns its Unicode ID number
 // This works only for character codes 0 through 0xffff.
 // For a more compact alternative, see UTF8ToUnicodeInteger()
 // found in http://wiki.secondlife.com/wiki/Combined_Library .
 //
 integer charToUnicodeIdNumber(string c)
 {
     integer cInt = llBase64ToInteger(llStringToBase64(c));

     if (!(cInt & 0x80000000)) {
         // UTF-8 single-byte form
         cInt = cInt >> 24;
     } else {
         if ((cInt & 0xe0000000) == 0xc0000000) {
             // two-byte UTF-8 form:  110v vvvv  10vv vvvv
             cInt = ((cInt & 0x1f000000) >> 18) |
                    ((cInt & 0x003f0000) >> 16);
         } else {
             // assume three-byte UTF-8 form:  1110 vvvv  10vv vvvv  10vv vvvv
             cInt = ((cInt & 0x0f000000) >> 12) |
                    ((cInt & 0x003f0000) >> 10) |
                    ((cInt & 0x00003f00) >> 8);
         } // else ignore the 4-byte UTF-8 form
     }

     return cInt;
 }

 // This is a memory-savings technique for use with Mono-compiled LSL scripts.
 // (It probably works in classic LSO too, but not as efficiently.) This technique
 // stores 15 bits of information in each 16-bit Unicode character. Use the
 // encode function below to convert any 15-bit data to a Unicode character, and
 // use the decode function to convert it back to the original 15-bit data.
 //
 // This example maps the data values 0 through 0x7fff to the Unicode
 // characters U-001000 through U-008fff. Use the matching function
 // decodeCharTo15Bits() to decode the Unicode character back into the original
 // 15-bit number.
 //
 // The technique used here is very similar to the technique used in the "Base 32768
 // Script" in http://wiki.secondlife.com/wiki/Key_Compression .

 // Convert any 15-bit integer into a single Unicode character
 //
 string encode15BitsToChar(integer num)
 {
     // Check the incoming range

     if (num < 0 || num >= 0x8000) {
         // illegal input -- do whatever is appropriate
         return "�";
     }

     // Bias the incoming numeric value by 0x1000 to avoid illegal Unicode codes:

     num += 0x1000;

     // Construct an escaped hex UTF-8 representation and return
     // it as a Unicode character

     return llUnescapeURL(
                   "%" + hexChar2(0xe0 + (num >> 12)) +
                   "%" + hexChar2(0x80 + ((num >> 6) & 0x3f)) +
                   "%" + hexChar2(0x80 + (num & 0x3f)));
 }

 // This is the inverse of encode15BitsToChar(), supra, q.v.
 // This expects a single 16-bit Unicode character that was created by
 // encode15BitsToChar() and returns the 15-bit numeric value used to create it.
 // The 15-bit return value will always be in the range 0x0000 - 0x7fff.
 //
 integer decodeCharTo15Bits(string ch)
 {
     string utf8 = llEscapeURL(ch); // convert to escaped hex UTF-8

     return
         (((integer)("0x" + llGetSubString(utf8, 1, 2)) & 0x1f) << 12) +
         (((integer)("0x" + llGetSubString(utf8, 4, 5)) & 0x3f) << 6) +
          ((integer)("0x" + llGetSubString(utf8, 7, 8)) & 0x3f) - 0x1000;
 }

 // Returns a Unicode string that encodes twice as many ASCII characters.
 // Use the matching function uncompressAscii() to expand it back into
 // the original ASCII.
 //
 string compressAscii(string s)
 {
     integer len = llStringLength(s);

     // Append a space if needed to make s an even number of chars
     if (len % 2) {
        s += " ";
        ++len;
     }

     string encodedChars;
     integer i;
     for (i = 0; i < len; i += 2) {
         encodedChars += encode15BitsToChar(
                 charToUnicodeIdNumber(llGetSubString(s, i, i)) << 7 |
                 charToUnicodeIdNumber(llGetSubString(s, i+1, i+1)));
     }

     return encodedChars;
 }

 // This is the inverse of compressAscii()
 //
 string uncompressAscii(string s)
 {
     string result;

     integer len = llStringLength(s);
     integer i;
     for (i = 0; i < len; ++i) {
         integer cInt15 = decodeCharTo15Bits(llGetSubString(s, i, i));
         result += llUnescapeURL("%" + hexChar2(cInt15 >> 7) +
                                 "%" + hexChar2(cInt15 & 0x7f));
     }

     return result;
 }

//---------------------------------------
//Test demo program
//  Contributed into the Public Domain.
//
//This program prints "test" when run in LSL or Mono, but LSLForge's optimizer
//will corrupt the program.

default {
    state_entry() {
        string testStr = uncompressAscii(compressAscii("test"));
        llOwnerSay(testStr);
    }
}

Original issue reported on code.google.com by guri.li...@gmail.com on 7 May 2012 at 5:33

GoogleCodeExporter commented 9 years ago
Actually, I finally managed to boil it down into a better test case, this one 
not involving the simulator, with optimizations turned off:

//This line of code should set length to 1
//In LSLForge, length ends up becoming 3
integer length = llStringLength(llUnescapeURL("%e4%a9%a5"));
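The expected behavior can be checked outside LSL (this check is not from the original thread). In Python, where percent-escapes decode as UTF-8 by default, the escaped sequence is one character of three bytes:

```python
from urllib.parse import unquote

# "%e4%a9%a5" is a single three-byte UTF-8 sequence (U+4A65)
s = unquote("%e4%a9%a5")
print(len(s))                   # 1 -- the character count SL reports
print(len(s.encode("utf-8")))   # 3 -- the byte count LSLForge was returning
```

This matches the diagnosis later in the thread: LSLForge was counting UTF-8 bytes where SL counts characters.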

Original comment by guri.li...@gmail.com on 7 May 2012 at 8:47

GoogleCodeExporter commented 9 years ago
Internally the plugin forces all scripts to UTF-8 format, which is almost 
certainly the cause of this issue.  We want to ensure LSLForge behaves the same 
as the SL viewer's editor, so we will attempt to fix this in a near-future release.

In the meantime, you might be able to get around the issue by representing the 
naughty parts with escape codes - for example: \u1234, whatever code is correct.

Original comment by elnew...@gmail.com on 11 May 2012 at 12:38

GoogleCodeExporter commented 9 years ago
> In the meantime, you might be able to get around the issue by representing
> the naughty parts with escaped code - for example: \u1234, whatever code is
> correct.

Correct me if I'm wrong, but this will only help for string literals, and only 
in LSLForge; it would still fail in SL, right?

Original comment by guri.li...@gmail.com on 11 May 2012 at 1:29

GoogleCodeExporter commented 9 years ago
Same/similar issue?

The string literal "\;" normally becomes just ; when compiled and run in SL
but LSLForge changes it in the .lsl to "\\;" which becomes \; in SL, altering 
script behavior.

Easy to avoid, but took me a while to figure out what's going on

Original comment by gmau...@beyondtechsl.com on 31 Jul 2012 at 11:08

GoogleCodeExporter commented 9 years ago
I could be wrong, but this sounds like a completely different bug to me.  Mine 
is related to the character set of the variables.  Yours sounds more like 
something wrong in the quoting/escaping code.

Original comment by guri.li...@gmail.com on 1 Aug 2012 at 12:24

GoogleCodeExporter commented 9 years ago
LSLForge has bugs in its implementation of llEscapeURL and llUnescapeURL, caused 
by a difference in internal string format between SL and LSLForge.  SL keeps 
strings internally in UTF-8 format, but LSLForge keeps them in UTF-32 (I guess), 
and llEscapeURL/llUnescapeURL didn't perform the format conversion.

It took a bit of time because the sample script was fairly long, but I finally 
managed to fix it.  Here's the patch file for this problem; please check it out 
and let me know about any bugs.  See 'Building the Native Executable' at 
http://lslplus.sourceforge.net/installation.html for how to build LSLForge.  It 
says to use GHC 6.10.1 or later, but you can't compile LSLForge or LSLPlus on 
6.12.* or later; I'm using 6.10.4.  Use it at your own risk, but I hope it works 
well for you.

By the way, I was surprised that LSLForge has such powerful optimization.  The line 'string testStr = uncompressAscii(compressAscii("test"));' was optimized to 'string testStr = "test";'.  It's so cool.
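The format mismatch described above can be illustrated in Python (a sketch, not LSLForge's actual code): a correct escape function must operate on the UTF-8 bytes of a string, not on its code points, and the unescape function must decode those bytes back into characters:

```python
from urllib.parse import quote, unquote

ch = "\u4a65"                 # a single code point
escaped = quote(ch)           # escaping works on the UTF-8 bytes
print(escaped)                # %E4%A9%A5 -- three escaped bytes
print(len(unquote(escaped)))  # 1 -- round-trips back to one character
```

If the escape/unescape step skips the UTF-8 conversion, as LSLForge apparently did, one character turns into three on the way through, which is exactly the corruption observed in the original report.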

Original comment by pells...@gmail.com on 30 Mar 2014 at 10:44

Attachments:

GoogleCodeExporter commented 9 years ago
I'm sorry, but I don't actually have SL, or Eclipse installed anymore.  I'll 
see if I can get it running to test sometime next week.  For now, here was the 
general idea of what I was eventually planning to do.

My goal was to make a "compress" function, carefully written to be pure so that 
the optimizer would evaluate it away, and then write an "uncompress" function 
that the optimizer is told not to touch.  By writing strings this way, I could 
save space in a program that holds a lot of big strings:

decompress(compress("Really long string here"))

The compress function would have worked mostly by mashing each pair of UTF-8 
characters into a single UTF-16 character, rather than doing anything longer 
and more complicated.

Original comment by guri.li...@gmail.com on 30 Mar 2014 at 2:14