Performance of Blob getBytes

GoogleCodeExporter commented 9 years ago

I posted this to the groups but after a re-read of the support FAQ I realise 
that I should have created an issue for this.

---

What we have recently noticed is some performance issues with JUnit testing 
against H2 versus Oracle so over the last few days I have spent some time 
investigating this and would like to report the problem. 
We use the same JDBC access across all the databases that our product supports 
and the recent performance issues come down to some testing using large Blobs. 
The code that we use to retrieve Blobs from the database does not use what may 
be the typical getBinaryStream but instead uses Blob.getBytes(1, 
blob.length()). 
The reason we do this is that the Blob getBytes and Clob.getSubString is the 
most performing on DB2 and Oracle and we share the code across all databases. 
H2's performance here was nearly twice as slow as Oracle's with either an 
embedded or server scenario. 

I looked into the code of H2 and in particular the JDBCBlob class getBytes 
method and found that if a byteBuffer is used to read from the Inputstream 
here, performance improves by 400%. 
The code I used was below to read the full blob back from the stream but you 
may wish to reduce this to some other BufferSize like the length() method in 
the same class. 

  public byte[] getBytes(final long pos, final int length) throws SQLException { 

    try { 
      debugCode("getBytes(" + pos + ", " + length + ");"); 
      checkClosed(); 
      final ByteArrayOutputStream out = new ByteArrayOutputStream(); 
      final InputStream in = value.getInputStream(); 
      try { 
        IOUtils.skipFully(in, pos - 1); 
        final byte[] buf = new byte[length]; 
        in.read(buf); 
        out.write(buf, 0, buf.length); 
      } finally { 
        in.close(); 
      } 
      return out.toByteArray(); 
    } catch (Exception e) { 
      throw logAndConvert(e); 
    } 
  } 

I'd believe the JDBCClob method getSubString could benefit from the same 
consideration for a buffered Reader read. 

I appreciate any consideration given to inclusion of this in the future 
release. 
Thanks, 

Ronan

Original issue reported on code.google.com by Ronan.Mu...@gmail.com on 9 Feb 2011 at 7:37

GoogleCodeExporter commented 9 years ago

Hi,

Thanks for the patch! I will implement a similar solution, however not exactly 
this one. Reason: the value of the parameter length could be larger than the 
blob itself. Therefore, I will still use a ByteArrayOutputStream, but copy the 
data in blocks of 4 KB. According to my test, this is about 3 times faster than 
the current single byte loop.

Actually my solution is already committed to the trunk.

Regards,
Thomas

Original comment by thomas.t...@gmail.com on 12 Feb 2011 at 9:39

Changed state: Accepted

GoogleCodeExporter commented 9 years ago

Fixed in version 1.3.151.

Original comment by thomas.t...@gmail.com on 12 Feb 2011 at 2:46

Changed state: Fixed

google-code-export / h2database

Performance of Blob getBytes #285