RamUsageEstimator.NUM_BYTES_ARRAY_HEADER and other constants are incorrect [LUCENE-3867]

asfimport commented 12 years ago

RamUsageEstimator.NUM_BYTES_ARRAY_HEADER is computed like this: NUM_BYTES_OBJECT_HEADER + NUM_BYTES_INT + NUM_BYTES_OBJECT_REF. The NUM_BYTES_OBJECT_REF part should not be included, at least not according to this page: http://www.javamex.com/tutorials/memory/array_memory_usage.shtml

A single-dimension array is a single object. As expected, the array has the usual object header. However, this object header is 12 bytes to accommodate a four-byte array length. Then comes the actual array data which, as you might expect, consists of the number of elements multiplied by the number of bytes required for one element, depending on its type. The memory usage for one element is 4 bytes for an object reference ...
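To make the overcount concrete, here is a minimal sketch of both formulas. The 8-byte object header and the reference sizes passed in are assumptions chosen to match the quoted page, and the class and method names are made up for illustration; none of this is RUE code.

```java
// Illustrative only: class and method names are hypothetical, not part of RamUsageEstimator.
final class ArrayHeaderSketch {
    static final int NUM_BYTES_INT = 4;

    // Formula as described in this issue: object header + length field + one reference.
    static int oldArrayHeader(int objectHeader, int objectRef) {
        return objectHeader + NUM_BYTES_INT + objectRef;
    }

    // Proposed fix: the array is a single object, so only the length field is added.
    static int fixedArrayHeader(int objectHeader) {
        return objectHeader + NUM_BYTES_INT;
    }

    public static void main(String[] args) {
        System.out.println(oldArrayHeader(8, 4)); // prints 16
        System.out.println(fixedArrayHeader(8));  // prints 12, the figure quoted above
    }
}
```

With an assumed 8-byte reference the old formula gives 20, so every array would be overcounted by the size of one reference.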

While on it, I wrote a sizeOf(String) impl, and I wonder how people feel about including such helper methods in RUE, as static, stateless methods. It's not perfect, and I'm sure there's some room for improvement. Here it is:

    /**
     * Computes the approximate size of a String object. Note that if this object
     * is also referenced by another object, you should add
     * {@link RamUsageEstimator#NUM_BYTES_OBJECT_REF} to the result of this
     * method.
     */
    public static int sizeOf(String str) {
        return 2 * str.length() + 6 // chars + additional safeness for arrays alignment
                + 3 * RamUsageEstimator.NUM_BYTES_INT // String maintains 3 integers
                + RamUsageEstimator.NUM_BYTES_ARRAY_HEADER // char[] array
                + RamUsageEstimator.NUM_BYTES_OBJECT_HEADER; // String object
    }

If people are not against it, I'd like to also add sizeOf(int[] / byte[] / long[] / double[] ... and String[]).

Migrated from LUCENE-3867 by Shai Erera (@shaie), resolved Mar 23 2012
Attachments: LUCENE-3867.patch (versions: 19), LUCENE-3867-3.x.patch, LUCENE-3867-compressedOops.patch

asfimport commented 12 years ago

Dawid Weiss (@dweiss) (migrated from JIRA)

One can provide exact object allocation size (including alignments) by running with an agent (acquired from Instrumentation). This is shown here, for example:

http://www.javaspecialists.eu/archive/Issue142.html

I don't think it makes sense to be "perfect" here because there is a tradeoff between being accurate and being fast. One thing to possibly improve would be to handle reference size (4 vs. 8 bytes; in particular with compact references while running under 64 bit jvms).
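For readers who have not used the agent approach, a minimal sketch follows. It relies on the standard java.lang.instrument API; the class name ObjectSizeAgent is hypothetical, and the JVM has to be started with -javaagent pointing at a jar whose manifest names this class in Premain-Class. Note that the Javadoc for getObjectSize describes the result as an implementation-specific approximation of a single object's storage, so it is a shallow size.

```java
import java.lang.instrument.Instrumentation;

// Hypothetical agent class, not part of Lucene.
public final class ObjectSizeAgent {
    private static volatile Instrumentation instrumentation;

    // Called by the JVM before main() when started with -javaagent:<jar>.
    public static void premain(String agentArgs, Instrumentation inst) {
        instrumentation = inst;
    }

    // Shallow size of one object as reported by the running JVM.
    public static long sizeOf(Object o) {
        Instrumentation inst = instrumentation;
        if (inst == null) {
            throw new IllegalStateException("agent not loaded; start the JVM with -javaagent");
        }
        return inst.getObjectSize(o);
    }
}
```

Because the size comes from the VM itself, it already reflects alignment and the reference width in use, which is why it needs no guessing about compact references.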

asfimport commented 12 years ago

Dawid Weiss (@dweiss) (migrated from JIRA)

Oh, one thing that I had in the back of my mind was to run a side-by-side comparison of Lucene's memory estimator and "exact" memory occupation via agent and see what the real difference is (on various vms and with compact vs. non-compact refs).

This would be a 2 hour effort I guess, fun, but I don't have the time for it.

asfimport commented 12 years ago

Uwe Schindler (@uschindler) (migrated from JIRA)

I was talking with Shai already about the OBJECT_REF size of 8, in RamUsageEstimator it is:

public final static int NUM_BYTES_OBJECT_REF = Constants.JRE_IS_64BIT ? 8 : 4;

...which does not take the CompressedOops into account. Can we detect those oops, so we can change the above ternary to return 4 on newer JVMs with compressed oops enabled?

asfimport commented 12 years ago

Dawid Weiss (@dweiss) (migrated from JIRA)

If you're running with an agent then it will tell you how many bytes a reference is, so this would fix the issue. I don't think you can test this from within Java VM itself, but this is an interesting question. What you could do is spawn a child VM process with identical arguments (and an agent) and check it there, but this is quite awful...

I'll ask on hotspot mailing list, maybe they know how to do this.

asfimport commented 12 years ago

Shai Erera (@shaie) (migrated from JIRA)

I don't think it makes sense to be "perfect" here because there is a tradeoff between being accurate and being fast.

I agree. We should be fast, and "as accurate as we can get while preserving speed".

I will fix the constant's value as it's wrong. The helper methods are just that - helper. Someone can use other techniques to compute the size of objects.

Will post a patch shortly.

asfimport commented 12 years ago

Michael McCandless (@mikemccand) (migrated from JIRA)

Nice catch on the overcounting of array's RAM usage!

And +1 for additional sizeOf(...) methods.

asfimport commented 12 years ago

Uwe Schindler (@uschindler) (migrated from JIRA)

Hi Mike,

Dawid and I have already contacted the Hotspot list. There is an easy way to get the compressedOoooooops setting from inside the JVM using MXBeans from the ManagementFactory. I think we will provide a patch later! I think with that we could also optimize the check for 64 bit, because that one should also be reported by the MXBean without looking into strange sysprops (see the TODO in the code for JRE_IS_64BIT).

Uwe

asfimport commented 12 years ago

Dawid Weiss (@dweiss) (migrated from JIRA)

Sysprops should be a fallback though because (to be verified) they're supported by other vendors whereas the mx bean may not be.

It needs to be verified by running under j9, jrockit, etc.
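A sysprop-based fallback could be sketched roughly as below. Which properties the thread has in mind is not spelled out, so the use of "sun.arch.data.model" and "os.arch" here is an assumption, and the class name is hypothetical.

```java
// Hypothetical helper, not the Lucene Constants class.
final class BitnessSketch {
    // dataModel: value of "sun.arch.data.model" (may be null on some vendors' VMs)
    // osArch:    value of "os.arch"
    static boolean is64Bit(String dataModel, String osArch) {
        if (dataModel != null) {
            return dataModel.contains("64");
        }
        // Last resort: guess from the architecture name.
        return osArch != null && osArch.contains("64");
    }

    // Convenience overload that reads the properties of the running VM.
    static boolean is64Bit() {
        return is64Bit(System.getProperty("sun.arch.data.model"),
                       System.getProperty("os.arch"));
    }
}
```

Both properties can be overwritten by user code, which is the reason to treat them as a fallback rather than the primary source.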

asfimport commented 12 years ago

Michael McCandless (@mikemccand) (migrated from JIRA)

Consulting MXBean sounds great?

Sysprops should be a fallback though

+1

asfimport commented 12 years ago

Uwe Schindler (@uschindler) (migrated from JIRA)

Here is the patch for detecting compressed oops in Sun JVMs. For other JVMs it will simply use false, so object refs will be assumed to be 64 bits wide, which is fine as an upper memory limit.

The code uses only public Java APIs and falls back to false if anything fails.
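The attached patch is not reproduced in this thread, so the following is only a sketch of how such a probe might look. It assumes HotSpot's HotSpotDiagnosticMXBean and its getVMOption method, reached through reflection so that nothing from com.sun.* is needed at compile time; the class and method names are hypothetical.

```java
import java.lang.management.ManagementFactory;
import java.lang.reflect.Method;

// Hypothetical probe, not the code from the patch.
final class CompressedOopsProbe {
    // Returns the value of a boolean VM option, or false if it cannot be read.
    static boolean booleanVmOption(String name) {
        try {
            Class<?> beanClass = Class.forName("com.sun.management.HotSpotDiagnosticMXBean");
            Object bean = ManagementFactory.newPlatformMXBeanProxy(
                ManagementFactory.getPlatformMBeanServer(),
                "com.sun.management:type=HotSpotDiagnostic",
                beanClass);
            Method getVMOption = beanClass.getMethod("getVMOption", String.class);
            Object vmOption = getVMOption.invoke(bean, name);
            Method getValue = vmOption.getClass().getMethod("getValue");
            return Boolean.parseBoolean(String.valueOf(getValue.invoke(vmOption)));
        } catch (Exception | LinkageError e) {
            // Bean missing, option unknown, or access denied: assume not enabled.
            return false;
        }
    }

    static boolean usesCompressedOops() {
        return booleanVmOption("UseCompressedOops");
    }
}
```

Returning false on any failure matches the behaviour described above: the reference size is then taken as 8 bytes, which overestimates rather than underestimates.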

asfimport commented 12 years ago

Shai Erera (@shaie) (migrated from JIRA)

Patch adds RUE.sizeOf(String) and various sizeOf(arr[]) methods. Also fixes the ARRAY_HEADER.

Uwe, I merged with your patch, with one difference – the System.out prints in the test are printed only if VERBOSE.

asfimport commented 12 years ago

Uwe Schindler (@uschindler) (migrated from JIRA)

Shai: Thanks! I am in a train at the moment, so internet is slow/not working. I will later find out what MXBeans we can use to detect 64bit without looking at strange sysprops (which may have been modified by user code, so not really secure to use...).

I left the non-verbose printlns in it, so people reviewing the patch can quickly see by running that test what happens on their JVM. It would be interesting to see what your jRockit does... :-)

asfimport commented 12 years ago

Shai Erera (@shaie) (migrated from JIRA)

I tried IBM and Oracle 1.6 JVMs, and both printed the same:

    [junit] ------------- Standard Output ---------------
    [junit] NOTE: This JVM is 64bit: true
    [junit] NOTE: This JVM uses CompressedOops: false
    [junit] ------------- ---------------- ---------------

So no CompressedOops for me :).

I will later find out what MXBeans we can use to detect 64bit without looking at strange sysprops

Ok. If you'll make it, we can add these changes to that patch, otherwise we can also do them in a separate issue.

asfimport commented 12 years ago

Uwe Schindler (@uschindler) (migrated from JIRA)

Hm, for me (1.6.0_31, 7u3) it prints true. What JVMs are you using and what settings?

asfimport commented 12 years ago

Uwe Schindler (@uschindler) (migrated from JIRA)

Here my results:

*****************************************************
JAVA_HOME = C:\Program Files\Java\jdk1.7.0_03
java version "1.7.0_03"
Java(TM) SE Runtime Environment (build 1.7.0_03-b05)
Java HotSpot(TM) 64-Bit Server VM (build 22.1-b02, mixed mode)
*****************************************************

C:\Users\Uwe Schindler\Projects\lucene\trunk-lusolr1\lucene\core>ant test -Dtestcase=TestRam*
[junit] Testsuite: org.apache.lucene.util.TestRamUsageEstimator
[junit] Tests run: 2, Failures: 0, Errors: 0, Time elapsed: 0,561 sec
[junit]
[junit] ------------- Standard Output ---------------
[junit] NOTE: This JVM is 64bit: true
[junit] NOTE: This JVM uses CompressedOops: true
[junit] ------------- ---------------- ---------------

C:\Users\Uwe Schindler\Projects\lucene\trunk-lusolr1\lucene\core>ant test -Dtestcase=TestRam* -Dargs=-XX:-UseCompressedOops
[junit] Testsuite: org.apache.lucene.util.TestRamUsageEstimator
[junit] Tests run: 2, Failures: 0, Errors: 0, Time elapsed: 0,5 sec
[junit]
[junit] ------------- Standard Output ---------------
[junit] NOTE: This JVM is 64bit: true
[junit] NOTE: This JVM uses CompressedOops: false
[junit] ------------- ---------------- ---------------

*****************************************************
JAVA_HOME = C:\Program Files\Java\jdk1.6.0_31
java version "1.6.0_31"
Java(TM) SE Runtime Environment (build 1.6.0_31-b05)
Java HotSpot(TM) 64-Bit Server VM (build 20.6-b01, mixed mode)
*****************************************************

C:\Users\Uwe Schindler\Projects\lucene\trunk-lusolr1\lucene\core>ant test -Dtestcase=TestRam*
[junit] Testsuite: org.apache.lucene.util.TestRamUsageEstimator
[junit] Tests run: 2, Failures: 0, Errors: 0, Time elapsed: 0,453 sec
[junit]
[junit] ------------- Standard Output ---------------
[junit] NOTE: This JVM is 64bit: true
[junit] NOTE: This JVM uses CompressedOops: true
[junit] ------------- ---------------- ---------------

C:\Users\Uwe Schindler\Projects\lucene\trunk-lusolr1\lucene\core>ant test -Dtestcase=TestRam* -Dargs=-XX:-UseCompressedOops
[junit] Testsuite: org.apache.lucene.util.TestRamUsageEstimator
[junit] Tests run: 2, Failures: 0, Errors: 0, Time elapsed: 0,421 sec
[junit]
[junit] ------------- Standard Output ---------------
[junit] NOTE: This JVM is 64bit: true
[junit] NOTE: This JVM uses CompressedOops: false
[junit] ------------- ---------------- ---------------

C:\Users\Uwe Schindler\Projects\lucene\trunk-lusolr1\lucene\core>ant test -Dtestcase=TestRam* -Dargs=-XX:+UseCompressedOops
[junit] Testsuite: org.apache.lucene.util.TestRamUsageEstimator
[junit] Tests run: 2, Failures: 0, Errors: 0, Time elapsed: 0,422 sec
[junit]
[junit] ------------- Standard Output ---------------
[junit] NOTE: This JVM is 64bit: true
[junit] NOTE: This JVM uses CompressedOops: true
[junit] ------------- ---------------- ---------------

asfimport commented 12 years ago

Shai Erera (@shaie) (migrated from JIRA)

Oracle:

java version "1.6.0_21"
Java(TM) SE Runtime Environment (build 1.6.0_21-b07)
Java HotSpot(TM) 64-Bit Server VM (build 17.0-b17, mixed mode)

IBM:

java version "1.6.0"
Java(TM) SE Runtime Environment (build pwa6460sr9fp3-20111122_05(SR9 FP3))
IBM J9 VM (build 2.4, JRE 1.6.0 IBM J9 2.4 Windows 7 amd64-64 jvmwa6460sr9-20111111_94827 (JIT enabled, AOT enabled)
J9VM - 20111111_094827
JIT  - r9_20101028_17488ifx45
GC   - 20101027_AA)
JCL  - 20110727_07

asfimport commented 12 years ago

Shai Erera (@shaie) (migrated from JIRA)

I ran "ant test-core -Dtestcase=TestRam* -Dtests.verbose=true -Dargs=-XX:+UseCompressedOops" and with the Oracle JVM I get "Compressed Oops: true" but with IBM JVM I still get 'false'.

asfimport commented 12 years ago

Uwe Schindler (@uschindler) (migrated from JIRA)

OK, that is expected. 1.6.0_21 does not enable compressedOops by default, so false is correct. If you manually enable, it gets true.

jRockit is jRockit and not Sun/Oracle, so the result is somewhat expected. It seems not to have that MXBean. But the code does not produce strange exceptions, so at least in the Sun VM we can detect compressed Oops and guess the reference size better. 8 is still not bad as it gives an upper limit.

asfimport commented 12 years ago

Uwe Schindler (@uschindler) (migrated from JIRA)

By the way, here is the code from the hotspot mailing list member (my code is based on it), it also shows the outputs for different JVMs:

https://gist.github.com/1333043

(I just removed the com.sun.* imports and replaced by reflection)

asfimport commented 12 years ago

Shai Erera (@shaie) (migrated from JIRA)

8 is still not bad as it gives an upper limit.

I agree. Better to over-estimate here, than under-estimate.

Would appreciate if someone can take a look at the sizeOf() impls before I commit.

asfimport commented 12 years ago

Uwe Schindler (@uschindler) (migrated from JIRA)

On Hotspot Mailing list some people also seem to have an idea about jRockit and IBM J9:

From: Krystal Mok
Sent: Wednesday, March 14, 2012 3:46 PM
To: Uwe Schindler
Cc: Dawid Weiss; hotspot compiler
Subject: Re: How to detect if the VM is running with compact refs from within the VM (no agent)?

Hi,

Just in case you'd care, the same MXBean could be used to detect compressed references on JRockit, too. It's probably available starting from JRockit R28.

Instead of "UseCompressedOops", use "CompressedRefs" as the VM option name on JRockit.

Don't know how to extract this information for J9 without another whole bunch of hackeries...well, you could try this, on a "best-effort" basis for platform detection: IBM J9's VM version string contains the compressed reference information. Example:

    $ export JAVA_OPTS='-Xcompressedrefs'
    $ groovysh
    Groovy Shell (1.7.7, JVM: 1.7.0)
    Type 'help' or '\h' for help.
    groovy:000> System.getProperty 'java.vm.info'
    ===> JRE 1.7.0 Linux amd64-64 Compressed References 20110810_88604 (JIT enabled, AOT enabled)
    J9VM - R26_Java726_GA_20110810_1208_B88592
    JIT  - r11_20110810_20466
    GC   - R26_Java726_GA_20110810_1208_B88592_CMPRSS
    J9CL - 20110810_88604
    groovy:000> quit

So grepping for "Compressed References" in the "java.vm.info" system property gives you the clue.
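In Java the best-effort check Kris describes could be sketched as follows. The property name and the marker text come from the mail above; the class and method names are hypothetical.

```java
// Hypothetical helper for the best-effort J9 check described above.
final class J9CompressedRefsSketch {
    // True if the given "java.vm.info" value mentions compressed references.
    static boolean mentionsCompressedRefs(String vmInfo) {
        return vmInfo != null && vmInfo.contains("Compressed References");
    }

    // Applies the check to the running VM.
    static boolean currentVmUsesCompressedRefs() {
        return mentionsCompressedRefs(System.getProperty("java.vm.info"));
    }
}
```

On VMs whose java.vm.info has a different layout this simply returns false, so it is only useful as one more fallback.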

Kris

asfimport commented 12 years ago

Michael McCandless (@mikemccand) (migrated from JIRA)

Patch looks good!

Maybe just explain in sizeOf(String) javadoc that this method assumes the String is "standalone" (ie, does not reference a larger char[] than itself)?

Because... if you call String.substring, the returned string references a slice of the char[] of the original one... and so technically the RAM it's tying up could be (much) larger than expected. (At least, this used to be the case... not sure if it's changed...).

asfimport commented 12 years ago

Shai Erera (@shaie) (migrated from JIRA)

Good point. I clarified the jdocs with this:

  /**
   * Returns the approximate size of a String object. This computation relies on
   * {@link String#length()} to compute the number of bytes held by the char[].
   * However, if the String object passed to this method is the result of e.g.
   * {@link String#substring}, the computation may be entirely inaccurate
   * (depending on the difference between length() and the actual char[]
   * length).
   */

If there are no objections, I'd like to commit this.

asfimport commented 12 years ago

Dawid Weiss (@dweiss) (migrated from JIRA)

I would opt for sizeOf to return the actual size of the object, including underlying string buffers... We can take into account interning buffers but other than that I wouldn't skew the result because it can be misleading.

asfimport commented 12 years ago

Dawid Weiss (@dweiss) (migrated from JIRA)

I don't like this special handling of Strings, to be honest. Why do we need/do it?

asfimport commented 12 years ago

Shai Erera (@shaie) (migrated from JIRA)

I don't like this special handling of Strings, to be honest. Why do we need/do it?

Because I wrote it, and it seemed useful to me, so why not? We know what Strings look like, at least in their worst case. If there is a better implementation, we can fix it in RUE, rather than having many impls try to do it on their own?

asfimport commented 12 years ago

Michael McCandless (@mikemccand) (migrated from JIRA)

I don't like this special handling of Strings, to be honest.

I'm confused: what special handling of Strings are we talking about...?

You mean that sizeOf(String) doesn't return the correct answer if the string came from a previous .substring (.split too) call...?

If so, how can we actually fix that? Is there some way to ask a string for the true length of its char[]?

asfimport commented 12 years ago

Dawid Weiss (@dweiss) (migrated from JIRA)

+  /** Returns the size in bytes of the String[] object. */
+  public static int sizeOf(String[] arr) {
+    int size = alignObjectSize(NUM_BYTES_ARRAY_HEADER + NUM_BYTES_OBJECT_REF * arr.length);
+    for (String s : arr) {
+      size += sizeOf(s);
+    }
+    return size;
+  }
+
+  /** Returns the approximate size of a String object. */
+  public static int sizeOf(String str) {
+    // String's char[] size
+    int arraySize = alignObjectSize(NUM_BYTES_ARRAY_HEADER + NUM_BYTES_CHAR * str.length());
+
+    // String's raw object size
+    int objectSize = alignObjectSize(NUM_BYTES_OBJECT_REF /* array reference */
+        + 3 * NUM_BYTES_INT /* String holds 3 integers */
+        + NUM_BYTES_OBJECT_HEADER /* String object header */);
+    
+    return objectSize + arraySize;
+  }

What I mean is that without looking at the code I would expect sizeOf(String[] N) to return the actual memory taken by an array of strings. If they point to a single char[], this should simply count the object overhead, not count every character N times as it would do now. This isn't sizeOf(), this is sum(string lengths * 2) + epsilon to me.

I'd keep RamUsageEstimator exactly what the name says – an estimation of the actual memory taken by a given object. A string can point to a char[] and if so this should be traversed as an object and counted once.

asfimport commented 12 years ago

Dawid Weiss (@dweiss) (migrated from JIRA)

If so, how can we actually fix that? Is there some way to ask a string for the true length of its char[]?

Same as with other objects – traverse its fields and count them (once, building an identity set for all objects reachable from the root)?
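The "count once" part of that traversal can be sketched with an identity-based set. This is not RUE's implementation; the class name is hypothetical, and to keep it short the sketch only counts distinct reachable objects instead of summing their sizes. Fields that cannot be made accessible (for example JDK internals on newer JVMs) are skipped.

```java
import java.lang.reflect.Array;
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.ArrayDeque;
import java.util.Collections;
import java.util.Deque;
import java.util.IdentityHashMap;
import java.util.Set;

// Hypothetical sketch of an identity-set traversal, not RamUsageEstimator itself.
final class ReachableObjects {
    // Counts distinct objects reachable from root; each instance is visited once.
    static int countReachable(Object root) {
        Set<Object> seen = Collections.newSetFromMap(new IdentityHashMap<Object, Boolean>());
        Deque<Object> stack = new ArrayDeque<Object>();
        if (root != null && seen.add(root)) stack.push(root);
        while (!stack.isEmpty()) {
            Object o = stack.pop();
            Class<?> c = o.getClass();
            if (c.isArray()) {
                if (!c.getComponentType().isPrimitive()) {
                    for (int i = 0; i < Array.getLength(o); i++) {
                        Object e = Array.get(o, i);
                        if (e != null && seen.add(e)) stack.push(e);
                    }
                }
                continue;
            }
            // Walk instance fields declared on the class and all superclasses.
            for (; c != null; c = c.getSuperclass()) {
                for (Field f : c.getDeclaredFields()) {
                    if (Modifier.isStatic(f.getModifiers()) || f.getType().isPrimitive()) continue;
                    try {
                        f.setAccessible(true);
                        Object v = f.get(o);
                        if (v != null && seen.add(v)) stack.push(v);
                    } catch (Exception e) {
                        // Inaccessible field: skip it in this sketch.
                    }
                }
            }
        }
        return seen.size();
    }
}
```

An array holding the same instance three times yields two reachable objects (the array and the instance), which is the behaviour Dawid asks for with strings sharing one char[].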

asfimport commented 12 years ago

Shai Erera (@shaie) (migrated from JIRA)

What I mean is that without looking at the code I would expect sizeOf(String[] N) to return the actual memory taken by an array of strings.

So you mean you'd want sizeOf(String[]) be just that?

return alignObjectSize(NUM_BYTES_ARRAY_HEADER + NUM_BYTES_OBJECT_REF * arr.length);

I don't mind. I just thought that since we know how to compute sizeOf(String), we can use that. It's an extreme case, I think, that someone will want to compute the size of String[] which share same char[] instance ... but I don't mind if it bothers you that much, to simplify it and document that it computes the raw size of the String[].

But I don't think that we should change sizeOf(String) to not count the char[] size. It's part of the object, and really it's String, not like we're trying to compute the size of a general object.

Same as with other objects – traverse its fields and count them

RUE already has .estimateRamUsage(Object) which does that through reflection. I think that sizeOf(String) can remain fast as it is now, with the comment that it may over-estimate if the String is actually a sub-string of one original larger string. In the worst case, we'll just be over-estimating.

asfimport commented 12 years ago

Uwe Schindler (@uschindler) (migrated from JIRA)

Hi Shai,

can you try this patch with J9 or maybe JRockit (Robert)? If you use one of those JVMs you may have to explicitly enable compressed Oops/refs!

asfimport commented 12 years ago

Dawid Weiss (@dweiss) (migrated from JIRA)

RUE already has .estimateRamUsage(Object) which does that through reflection. I think that sizeOf(String) can remain fast as it is now, with the comment that it may over-estimate if the String is actually a sub-string of one original larger string. In the worst case, we'll just be over-estimating.

Yeah, that's exactly what I didn't like. All the primitive/ primitive array methods are fine, but why make things inconsistent with sizeOf(String)? I'd rather have the reflection-based method estimate the size of a String/String[]. Like we mentioned it's always a matter of speed/accuracy but here I'd opt for accuracy because the output can be off by a lot if you make substrings along the way (not to mention it assumes details about String internal implementation which may or may not be true, depending on the vendor).

Do you have a need for this method, Shai? If you don't then why not wait (with this part) until such a need arises?

asfimport commented 12 years ago

Shai Erera (@shaie) (migrated from JIRA)

Do you have a need for this method, Shai?

I actually started this issue because of this method :). I wrote the method for my own code, then spotted the bug in the ARRAY_HEADER, and on the go thought that it will be good if RUE would offer it for me / other people can benefit from it. Because from my experience, after I put code in Lucene, very smart people improve and optimize it, and I benefit from it in new releases.

So while I could keep sizeOf(String) in my own code, I know that Uwe/Robert/Mike/You will make it more efficient when Java 7/8/9 will be out, while I'll totally forget about it ! :).

asfimport commented 12 years ago

Dawid Weiss (@dweiss) (migrated from JIRA)

Yeah... well... I'm flattered :) I'm still -1 for adding this particular method because I don't like being surprised at how a method works and this is surprising behavior to me, especially in this class (even if it's documented in the javadoc, but who reads it anyway, right?).

If others don't share my opinion then can we at least rename this method to sizeOfBlah(..) where Blah is something that would indicate it's not actually taking into account char buffer sharing or sub-slicing (suggestions for Blah welcome)?

asfimport commented 12 years ago

Mark Miller (@markrmiller) (migrated from JIRA)

estimateSizeOf(..)
guessSizeOf(..)
wildGuessSizeOf(..)
incorrectSizeOf(..)
sizeOfWeiss(..)
weissSize(..)
sizeOfButWithoutTakingIntoAccountCharBufferSharingOrSubSlicingSeeJavaDoc(..)

asfimport commented 12 years ago

Michael McCandless (@mikemccand) (migrated from JIRA)

If so, how can we actually fix that? Is there some way to ask a string for the true length of its char[]?

Same as with other objects – traverse its fields and count them (once, building an identity set for all objects reachable from the root)?

Aha, cool! I hadn't realized RUE can crawl into the private char[] inside string and count up the RAM usage correctly. That's nice.

Maybe lowerBoundSizeOf(...)?

Or maybe we don't add the new string methods (sizeOf(String), sizeOf(String[])) and somewhere document that you should do new RUE().size(String/String[]) instead...? Hmm or maybe we do add the methods, but implement them under-the-hood w/ that?

asfimport commented 12 years ago

Dawid Weiss (@dweiss) (migrated from JIRA)

sizeOfWeiss(..)

We're talking some serious dimensions here, beware of buffer overflows!

Or maybe we don't add the new string methods (sizeOf(String), sizeOf(String[])) and somewhere document that you should do new RUE().size(String/String[]) instead..

This is something I would go for – it's consistent with what I would consider this class's logic. I would even change it to sizeOf(Object) – this would be a static shortcut to just measure an object's size, no strings attached?

Kabutz's code also distinguishes interned strings/ cached boxed integers and enums. This could be a switch much like it is now with interned Strings. Then this would really be either an upper (why lower, Mike?) bound or something that would try to be close to the exact memory consumption.

A fun way to determine if we're right would be to run a benchmark with -Xmx20m and test how close we can get to the main memory pool's maximum value before OOM is thrown. :)

asfimport commented 12 years ago

Michael McCandless (@mikemccand) (migrated from JIRA)

(why lower, Mike?)

Oh I just meant the sizeOf(String) impl in the current patch is a lower bound (since it "guesses" the private char[] length by calling String.length(), which is a lower bound on the actual char[] length).

asfimport commented 12 years ago

Dawid Weiss (@dweiss) (migrated from JIRA)

John Rose just replied to my question – there are fields in Unsafe that allow array scaling (1.7). Check these out:

        ARRAY_BOOLEAN_INDEX_SCALE = theUnsafe.arrayIndexScale(boolean[].class);
        ARRAY_BYTE_INDEX_SCALE = theUnsafe.arrayIndexScale(byte[].class);
        ARRAY_SHORT_INDEX_SCALE = theUnsafe.arrayIndexScale(short[].class);
        ARRAY_CHAR_INDEX_SCALE = theUnsafe.arrayIndexScale(char[].class);
        ARRAY_INT_INDEX_SCALE = theUnsafe.arrayIndexScale(int[].class);
        ARRAY_LONG_INDEX_SCALE = theUnsafe.arrayIndexScale(long[].class);
        ARRAY_FLOAT_INDEX_SCALE = theUnsafe.arrayIndexScale(float[].class);
        ARRAY_DOUBLE_INDEX_SCALE = theUnsafe.arrayIndexScale(double[].class);
        ARRAY_OBJECT_INDEX_SCALE = theUnsafe.arrayIndexScale(Object[].class);
        ADDRESS_SIZE = theUnsafe.addressSize();

So... there is a (theoretical?) possibility that, say, byte[] is machine word-aligned :) I bet any RAM estimator written so far will be screwed if this happens :)

asfimport commented 12 years ago

Uwe Schindler (@uschindler) (migrated from JIRA)

So the whole Oops MBean magic is obsolete... ADDRESS_SIZE = theUnsafe.addressSize(); woooah, so simple - works on more platforms for guessing!

I will check this out with the usual reflection magic :-)

asfimport commented 12 years ago

Uwe Schindler (@uschindler) (migrated from JIRA)

Hi, here is a new patch using Unsafe to get the bitness (with the well-known fallback) and for compressedOops detection. Looks much cleaner. I also like it more that the addressSize is now detected natively and not from sysprops.

The constants mentioned by Dawid are only available in Java 7, so I reflected the underlying methods from theUnsafe. I also changed the boolean JRE_USES_COMPRESSED_OOPS to an integer JRE_REFERENCE_SIZE that is used by RamUsageEstimator. We might do the same for all other native types... (this is just a start).
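For illustration, reading the reference size through reflection on sun.misc.Unsafe might look like the sketch below. It is not the code from the patch: the class name is hypothetical, the "os.arch" fallback is an assumption, and newer JDKs may warn about or reject these Unsafe methods, in which case the fallback is used.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Method;

// Hypothetical probe, not the code from the attached patch.
final class ReferenceSizeProbe {
    // Size in bytes of one slot in an Object[], i.e. the reference size.
    static int referenceSize() {
        try {
            Class<?> unsafeClass = Class.forName("sun.misc.Unsafe");
            Field theUnsafe = unsafeClass.getDeclaredField("theUnsafe");
            theUnsafe.setAccessible(true);
            Object unsafe = theUnsafe.get(null);
            Method arrayIndexScale = unsafeClass.getMethod("arrayIndexScale", Class.class);
            return ((Number) arrayIndexScale.invoke(unsafe, Object[].class)).intValue();
        } catch (Throwable t) {
            // Assumed fallback: guess from the architecture name, erring on the high side.
            String arch = System.getProperty("os.arch", "");
            return arch.contains("64") ? 8 : 4;
        }
    }
}
```

Asking for the index scale of Object[] covers compressed references directly, since it reports the slot width the VM actually uses rather than the pointer width of the platform.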

Shai: Can you test with your JVMs and also enable/disable compressed oops/refs?

asfimport commented 12 years ago

Shai Erera (@shaie) (migrated from JIRA)

Thanks Uwe !

I ran the test, and now with both J9 (IBM) and Oracle, I get this print (without enabling any flag):

    [junit] NOTE: running test testReferenceSize
    [junit] NOTE: This JVM is 64bit: true
    [junit] NOTE: Reference size in this JVM: 8

I modified the test name to testReferenceSize (was testCompressedOops).

I wrote this small test to print the differences between sizeOf(String) and estimateRamUsage(String):

  public void testSizeOfString() throws Exception {
    String s = "abcdefgkjdfkdsjdskljfdskfjdsf";
    String sub = s.substring(0, 4);
    System.out.println("original=" + RamUsageEstimator.sizeOf(s));
    System.out.println("sub=" + RamUsageEstimator.sizeOf(sub));
    System.out.println("checkInterned=true(orig): " + new RamUsageEstimator().estimateRamUsage(s));
    System.out.println("checkInterned=false(orig): " + new RamUsageEstimator(false).estimateRamUsage(s));
    System.out.println("checkInterned=false(sub): " + new RamUsageEstimator(false).estimateRamUsage(sub));
  }

It prints:

original=104
sub=56
checkInterned=true(orig): 0
checkInterned=false(orig): 98
checkInterned=false(sub): 98

So clearly estimateRamUsage factors in the sub-string's larger char[]. The difference in sizes of 'orig' stems from AverageGuessMemoryModel which computes the reference size to be 4 (hardcoded), and array size to be 16 (hardcoded). I modified AverageGuess to use constants from RUE (they are best guesses themselves). Still the test prints a difference, but now I think it's because sizeOf(String) aligns the size to mod 8, while estimateRamUsage doesn't. I fixed that in size(Object), and now the prints are the same.
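The two sizeOf(String) numbers can be reproduced by hand. The sketch below follows the formula from the patch excerpt quoted earlier in the thread and assumes an 8-byte object header and 8-byte references, which is consistent with the reference size of 8 reported for this run; the class and method names are hypothetical.

```java
// Hypothetical re-computation of the printed sizeOf(String) values.
final class StringSizeSketch {
    static final int NUM_BYTES_INT = 4;
    static final int NUM_BYTES_CHAR = 2;

    // Rounds up to the next multiple of 8.
    static long align(long size) {
        return (size + 7) & ~7L;
    }

    static long sizeOf(String s, int objectHeader, int objectRef) {
        int arrayHeader = objectHeader + NUM_BYTES_INT;           // corrected array header
        long arraySize = align(arrayHeader + (long) NUM_BYTES_CHAR * s.length());
        long objectSize = align(objectRef                         // reference to the char[]
            + 3 * NUM_BYTES_INT                                   // three int fields
            + objectHeader);                                      // String object header
        return objectSize + arraySize;
    }
}
```

For the 29-character string this gives align(12 + 58) + align(28) = 72 + 32 = 104, and for the 4-character substring align(12 + 8) + 32 = 24 + 32 = 56, matching the output above.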

I also fixed sizeOfArray – if the array.length == 0, it returned 0, but it should return its header, and aligned to mod 8 as well.
I modified sizeOf(String[]) to sizeOf(Object[]) and compute its raw size only. I started to add sizeOf(String), fastSizeOf(String) and deepSizeOf(String[]), but reverted to avoid the hassle – the documentation confuses even me :).
Changed all sizeOf() to return long, and align() to take and return long.
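A minimal sketch of the array helpers after these changes, assuming a 12-byte array header (8-byte object header plus the 4-byte length); the class name and constant are illustrative, not the committed code.

```java
// Hypothetical sketch of the primitive-array helpers described above.
final class ArraySizeSketch {
    static final int ARRAY_HEADER = 12; // assumed: 8-byte object header + 4-byte length

    // Rounds up to the next multiple of 8; takes and returns long.
    static long align(long size) {
        return (size + 7) & ~7L;
    }

    static long sizeOf(byte[] arr) {
        return align(ARRAY_HEADER + (long) arr.length);
    }

    static long sizeOf(long[] arr) {
        return align(ARRAY_HEADER + 8L * arr.length);
    }
}
```

Under these assumptions an empty byte[] is reported as 16 bytes rather than 0, because the header is still allocated and then aligned.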

I think this is ready to commit, though I'd appreciate a second look on the MemoryModel and size(Obj) changes.

Also, how about renaming MemoryModel methods to: arrayHeaderSize(), classHeaderSize(), objReferenceSize() to make them more clear and accurate? For instance, getArraySize does not return the size of an array, but its object header ...

asfimport commented 12 years ago

Dawid Weiss (@dweiss) (migrated from JIRA)

-1 to mixing shallow and deep sizeofs: sizeOf(Object[] arr) is shallow and just feels wrong to me. All the other methods yield the deep total, why make an exception? If anything, make it explicit and then do it for any type of object –

shallowSizeOf(Object t);
sizeOf(Object t);

I'm not complaining just because my sense of taste is feeling bad. I am actually using this class in my own projects and I would hate to look into the JavaDoc every time to make sure what a given method does (especially with multiple overloads). In other words, I would hate to see this:

Object [] o1 = new Object [] {1, 2, 3};
Object o2 = o1;
if (sizeOf(o1) != sizeOf(o2)) throw new WtfException();
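The surprise comes from Java picking an overload by the static type of the argument. A small self-contained sketch, with two stand-in overloads that only report which one was called (the names and return values are made up):

```java
// Hypothetical stand-ins showing which overload the compiler selects.
final class OverloadSurprise {
    static String sizeOf(Object o) {
        return "deep";
    }

    static String sizeOf(Object[] arr) {
        return "shallow";
    }

    static String[] demo() {
        Object[] o1 = new Object[] { 1, 2, 3 };
        Object o2 = o1; // same instance, different static type
        return new String[] { sizeOf(o1), sizeOf(o2) };
    }
}
```

Here demo() returns "shallow" for o1 and "deep" for o2 although both variables refer to the same array, which is exactly the mismatch the snippet above would trip over.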

asfimport commented 12 years ago

Uwe Schindler (@uschindler) (migrated from JIRA)

I ran the test, and now with both J9 (IBM) and Oracle, I get this print (without enabling any flag):
[junit] NOTE: running test testReferenceSize
[junit] NOTE: This JVM is 64bit: true
[junit] NOTE: Reference size in this JVM: 8

I hope with compressedOops explicitly enabled (or however they call them), you get a reference size of 4 in J9 and pre-1.6.0_23 Oracle?

asfimport commented 12 years ago

Shai Erera (@shaie) (migrated from JIRA)

Ok removed sizeOf(Object[]). One can compute it by using RUE.estimateRamSize to do a deep calculation.

Geez Dawid, you took away all the reasons I originally opened the issue for ;).

But at least AvgGuessMemoryModel and RUE.size() are more accurate now. And we have some useful utility methods.

asfimport commented 12 years ago

Shai Erera (@shaie) (migrated from JIRA)

I ran "ant test-core -Dtestcase=TestRam* -Dtests.verbose=true -Dargs=-XX:+UseCompressedOops" and "ant test-core -Dtestcase=TestRam* -Dtests.verbose=true -Dargs=-XX:-UseCompressedOops" and get 8 and 4 (with CompressedOops).

asfimport commented 12 years ago

Mark Miller (@markrmiller) (migrated from JIRA)

Oh, bummer - looks like we lost the whole history of this class...such a bummer. I really wanted to take a look at how this class had evolved since I last looked at it. I've missed the conversations around the history loss - is that gone, gone, gone, or is there still some way to find it?

asfimport commented 12 years ago

Mark Miller (@markrmiller) (migrated from JIRA)

Scratch that - I was trying to look back from the apache git clone using git - assumed its history matched svn - but I get a clean full history using svn.

asfimport commented 12 years ago

Uwe Schindler (@uschindler) (migrated from JIRA)

Die, GIT, die! :-) (as usual)

asfimport commented 12 years ago

Uwe Schindler (@uschindler) (migrated from JIRA)

I ran "ant test-core -Dtestcase=TestRam* -Dtests.verbose=true -Dargs=-XX:+UseCompressedOops" and "ant test-core -Dtestcase=TestRam* -Dtests.verbose=true -Dargs=-XX:-UseCompressedOops" and get 8 and 4 (with CompressedOops).

OK, thanks. So it seems to work at least with Oracle/Sun and IBM J9. I have no other updates to this detection code.

apache / lucene

RamUsageEstimator.NUM_BYTES_ARRAY_HEADER and other constants are incorrect [LUCENE-3867] #4940