Something strange is going on. Looking at this:
Reading 28.7 MiB ... done in 142.3 s
Reading 28.5 MiB ... done in 411.0 ms
That's in the same revision.
Another egregious example:
Reading 59.2 MiB ... done in 2472 s
This seems to be the major cause of slowdowns in reading streams from STDIN.
It looks like we are spending a ton of time in ExpandBuff, according to the profiler.
Perhaps I'm using readChar() in a way that it's not meant to be used. It doesn't look like it's made to read many characters at a time, because ExpandBuff grows the buffer by only 2048 characters on each call. When reading a large file, megabytes long, the buffer gets expanded many times, and each expansion copies the entire old buffer, which is costly.
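To illustrate the cost (a back-of-the-envelope sketch, not the actual generated code), compare how many characters get copied when growing a buffer to ~30 MiB by a fixed 2048 chars versus by doubling:

```java
public class BufferGrowth {
    public static void main(String[] args) {
        int target = 30 * 1024 * 1024; // ~30 MiB worth of chars, like the dumps above

        // Fixed +2048 growth: each expansion copies the entire old buffer.
        long fixedCopies = 0;
        for (long size = 4096; size < target; size += 2048) {
            fixedCopies += size;
        }

        // Geometric (doubling) growth, as ArrayList and StringBuilder use.
        long doublingCopies = 0;
        for (long size = 4096; size < target; size *= 2) {
            doublingCopies += size;
        }

        System.out.println("chars copied with +2048 growth: " + fixedCopies);
        System.out.println("chars copied with doubling:     " + doublingCopies);
    }
}
```

With the fixed increment, total copying is quadratic in the input size (hundreds of billions of chars for a 30 MiB file); with doubling it stays linear (at most about twice the final size).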
It seems that someone else has seen this problem with JavaCC's SimpleCharStream:
http://markmail.org/message/zko7diftsjdxvoqd
Subject: Re: [JavaCC] Performance issue when consuming large token in JavaCC
From: Sreenivas Viswanadha (sre...@viswanadha.net)
Date: Feb 17, 2006 8:10:55 am
List: net.java.dev.javacc.users
One option is to increase the memory setting for the VM using -Xmx256m or
something.
If this is a delimited token - like comments in java and if you don't
need the actual image, then you can use lexical states and skip the
token text completely and simply return the token kind when you see the
end marker.
Yet another option would be to rewrite the generated SimpleCharStream
class to maybe use RandomAccessFile instead of the circular buffer that
it uses.
> Hi, I hope this is the correct place to post this message.
>
> I am writing a parser to parse large files using Javacc. Some of the
> tokens can be as big as 3M. I found that once the token size becomes
> close to 1M, the parser becomes extremely slow to consume that token.
> Could anybody tell me how I should tune the parser for large tokens? Thanks!
>
> David
Seems these guys also have this problem: https://jira.blazegraph.com/browse/BLZG-478
They also mention that Lucene implemented a FastCharStream.
Changed svndump.jj to use Lucene's FastCharStream in 656cfbf4f425390b3e87bc15a975b5b59d6a95db. The throughput difference is tremendous: orders of magnitude faster than before.
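For the record, the core idea of the fix is just bulk reads plus geometric buffer growth. Here's a minimal sketch of a FastCharStream-style refill, assuming a Reader source (names are illustrative, not Lucene's exact code):

```java
import java.io.IOException;
import java.io.Reader;

// Minimal sketch of the FastCharStream idea: refill from the Reader in
// bulk and grow the buffer geometrically, instead of one fixed 2048-char
// expansion at a time. Names are illustrative, not Lucene's exact code.
class FastRefillStream {
    private final Reader input;
    private char[] buffer = new char[8192];
    private int bufferLength = 0;   // number of valid chars in buffer
    private int bufferPosition = 0; // next char to hand out

    FastRefillStream(Reader input) {
        this.input = input;
    }

    public char readChar() throws IOException {
        if (bufferPosition >= bufferLength) {
            refill();
        }
        return buffer[bufferPosition++];
    }

    private void refill() throws IOException {
        if (bufferLength == buffer.length) {
            // Double the capacity so total copying stays linear
            // in the input size.
            char[] newBuffer = new char[buffer.length * 2];
            System.arraycopy(buffer, 0, newBuffer, 0, bufferLength);
            buffer = newBuffer;
        }
        // One bulk read instead of filling char by char.
        int read = input.read(buffer, bufferLength, buffer.length - bufferLength);
        if (read == -1) {
            throw new IOException("read past eof");
        }
        bufferLength += read;
    }
}
```

(The real FastCharStream also recycles already-consumed chars rather than growing the buffer forever, but the bulk read and geometric growth are what make the difference here.)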
Reading large files that are part of an SvnNode takes a disproportionately long time. We currently read the file content of an SvnNode with the readByteArray() method. This is slow for some reason, and I don't know why.
We need to fix this so that reading streams doesn't take such a long time.
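For context on where the time goes: I'm assuming readByteArray() pulls the content one char at a time through the JavaCC-generated CharStream, roughly like this (a hypothetical sketch, not the actual method):

```java
import java.io.IOException;

// Hypothetical sketch of readByteArray(), where CharStream is JavaCC's
// generated interface. If it loops over readChar(), every call lands in
// the stream's internal buffer, so the buffer's expansion strategy
// dominates the cost for multi-megabyte file contents.
static byte[] readByteArray(CharStream stream, int length) throws IOException {
    byte[] bytes = new byte[length];
    for (int i = 0; i < length; i++) {
        bytes[i] = (byte) stream.readChar(); // one buffered char per call
    }
    return bytes;
}
```

If that assumption holds, swapping in FastCharStream should speed up readByteArray() without changing the method itself.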