DominicBoettger / hyphenator

Automatically exported from code.google.com/p/hyphenator

Performance tweaks #4

Open GoogleCodeExporter opened 9 years ago

GoogleCodeExporter commented 9 years ago
Optimize for Performance

  # Redraw after all elements have been hyphenated
  # while instead of for
  # faster hyphenateWord()-Function!
  # …

Original issue reported on code.google.com by mathiasn...@gmail.com on 3 Jan 2008 at 2:50

GoogleCodeExporter commented 9 years ago
Revision 28/29

Made hyphenateWord() faster by applying some tweaks. (It now takes around 80% of the time.)

Original comment by mathiasn...@gmail.com on 4 Jan 2008 at 9:08

GoogleCodeExporter commented 9 years ago
I tried (again) to implement a packed trie in JavaScript, as proposed by Liang.

A trie could save a lot of lookups, BUT:
* It blows up the code: I had to implement a whole new Trie class, with nodes and a find function -> the script takes longer to load.
* It blows up the data structure: each node needs a field for the higher branch, the lower branch and the equal branch. This costs more than is saved by reusing nodes (e.g. the first two nodes of "_aal" and "_abas" are the same, so 2 chars could be saved, but it takes a lot more to implement this!) -> the patterns take longer to load.
Conclusion: it sucks! A lot of work done for nothing!

Original comment by mathiasn...@gmail.com on 7 Feb 2008 at 8:59
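
The lower/equal/higher node layout described in the comment above matches a ternary search trie. The following is only a minimal sketch of that data structure, not the code that was tried in Hyphenator; the function names and the stored values are made up for illustration. It shows why the structure grows: the two keys share only two nodes, while every node carries three links.

```javascript
// Ternary search trie: each node holds one character and three links.
function createNode(ch) {
    return { ch: ch, lo: null, eq: null, hi: null, value: undefined };
}

// Inserts a non-empty key and returns the (possibly new) subtree root.
function insert(node, key, value, i) {
    i = i || 0;
    var ch = key.charAt(i);
    if (node === null) {
        node = createNode(ch);
    }
    if (ch < node.ch) {
        node.lo = insert(node.lo, key, value, i);
    } else if (ch > node.ch) {
        node.hi = insert(node.hi, key, value, i);
    } else if (i < key.length - 1) {
        node.eq = insert(node.eq, key, value, i + 1);
    } else {
        node.value = value;
    }
    return node;
}

// Returns the stored value, or undefined if the key is not in the trie.
function find(node, key) {
    var i = 0;
    while (node !== null) {
        var ch = key.charAt(i);
        if (ch < node.ch) {
            node = node.lo;
        } else if (ch > node.ch) {
            node = node.hi;
        } else if (i < key.length - 1) {
            node = node.eq;
            i += 1;
        } else {
            return node.value;
        }
    }
    return undefined;
}

function countNodes(node) {
    if (node === null) {
        return 0;
    }
    return 1 + countNodes(node.lo) + countNodes(node.eq) + countNodes(node.hi);
}

var root = null;
root = insert(root, '_aal', 'A');   // 'A' and 'B' are placeholder values
root = insert(root, '_abas', 'B');
console.log(countNodes(root));      // 7 nodes for 9 characters
```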

GoogleCodeExporter commented 9 years ago
I did a little speed comparison: /needle/.test('haystack') vs. 'haystack'.indexOf('needle')!=-1

The regex way is up to three times slower than the indexOf way.
If the additional regex functionality isn't needed, use indexOf; it's uglier but faster.
(I'm speaking of nanoseconds here! I try everything... ;-)

Original comment by mathiasn...@gmail.com on 7 Feb 2008 at 9:07
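
A comparison like the one above can be reproduced with a small timing loop. This is a rough sketch, not the original benchmark: the helper names, the test string and the iteration count are assumptions, and the ratio will differ between engines (the figure quoted above dates from 2008).

```javascript
function containsWithRegex(haystack) {
    return /needle/.test(haystack);
}

function containsWithIndexOf(haystack) {
    return haystack.indexOf('needle') !== -1;
}

// Calls fn(arg) repeatedly and returns the elapsed wall-clock time in ms.
function timeIt(fn, arg, iterations) {
    var start = Date.now();
    for (var i = 0; i < iterations; i += 1) {
        fn(arg);
    }
    return Date.now() - start;
}

var sample = 'a haystack with a needle in it';
console.log('regex:   ' + timeIt(containsWithRegex, sample, 1000000) + ' ms');
console.log('indexOf: ' + timeIt(containsWithIndexOf, sample, 1000000) + ' ms');
```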

GoogleCodeExporter commented 9 years ago
For earlier script execution and preventing redraws, see: http://code.google.com/p/hyphenator/issues/detail?id=11

Original comment by mathiasn...@gmail.com on 7 Feb 2008 at 9:13

GoogleCodeExporter commented 9 years ago
Did some performance tuning in Revision 60. Time saved:
FF 3b2: 27%
Safari 3: 16%
Webkit NB: 8%
Opera: 13%
IE7: 27%
IE6: 20%

Original comment by mathiasn...@gmail.com on 11 Feb 2008 at 11:39

GoogleCodeExporter commented 9 years ago
I changed the format of the pattern files from JSON to string.
Pattern files are now much smaller to download. It takes about 200ms in IE on my Intel MacBook Pro 2.2GHz to convert them to an object. Other browsers are faster.

I had to change a lot.
(Revision 188)

Original comment by mathiasn...@gmail.com on 27 Oct 2008 at 4:42

GoogleCodeExporter commented 9 years ago
I sped up the hyphenateWord function a little bit in rev 242 by removing an 'if' statement:
if (!!pat) {

Original comment by mathiasn...@gmail.com on 5 Dec 2008 at 7:09

GoogleCodeExporter commented 9 years ago
With rev 275 I added a cache for hyphenated words and a switch to turn caching off (it is on by default).
This enhances overall performance, especially for long texts, where the probability of repeated words is high.

Original comment by mathiasn...@gmail.com on 3 Jan 2009 at 5:43
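
The word cache described above amounts to memoizing the per-word function. The sketch below only illustrates that idea under stated assumptions: `withCache`, the `cachingEnabled` switch and the stand-in `slowHyphenateWord` are hypothetical names, and the stand-in does not perform real hyphenation.

```javascript
// Wraps a per-word function with a cache unless caching is switched off.
function withCache(hyphenateWord, cachingEnabled) {
    if (cachingEnabled === false) {
        return hyphenateWord;
    }
    var cache = new Map();
    return function (word) {
        if (!cache.has(word)) {
            cache.set(word, hyphenateWord(word));
        }
        return cache.get(word);
    };
}

// Stand-in for the expensive function: NOT real hyphenation, it just
// puts one soft hyphen in the middle of the word and counts its calls.
var calls = 0;
function slowHyphenateWord(word) {
    calls += 1;
    var middle = Math.floor(word.length / 2);
    return word.slice(0, middle) + '\u00AD' + word.slice(middle);
}

var hyphenate = withCache(slowHyphenateWord, true);
hyphenate('hyphenation');
hyphenate('hyphenation'); // served from the cache
console.log(calls);       // the stand-in ran only once
```

The gain depends on how often words repeat, which is why the effect is larger for long texts.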

GoogleCodeExporter commented 9 years ago
I had an idea:
Pattern files could probably be minified:
Sort the patterns by their length.
Patterns with the same length are concatenated (no spaces necessary) and saved in an object ({'1':'patterns of length 1','2':'patterns of length 2',...}).
Later they are split at the indicated length and then processed as usual.

en.js has 4446 spaces in the pattern string; that is 4KB (~14%).
de.js has 12585 spaces (12.5KB <=> ~15% of the file size).

Original comment by mathiasn...@gmail.com on 21 Feb 2009 at 5:12
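
The length-grouped format proposed above can be sketched as a pack/unpack pair. This is an illustration of the idea only, not the format that ended up in the trunk; `packPatterns`, `unpackPatterns` and the sample patterns are made up.

```javascript
// Groups space-separated patterns by length and concatenates each group.
function packPatterns(patternString) {
    var byLength = {};
    patternString.split(' ').forEach(function (pattern) {
        if (pattern === '') {
            return; // ignore doubled or trailing spaces
        }
        var key = String(pattern.length);
        byLength[key] = (byLength[key] || '') + pattern;
    });
    return byLength;
}

// Splits each concatenated group back into patterns of the indicated length.
function unpackPatterns(byLength) {
    var patterns = [];
    Object.keys(byLength).forEach(function (key) {
        var length = parseInt(key, 10);
        var chunk = byLength[key];
        for (var i = 0; i < chunk.length; i += length) {
            patterns.push(chunk.slice(i, i + length));
        }
    });
    return patterns;
}

var packed = packPatterns('_a1 a1b 1ba _ab3 4te'); // made-up patterns
console.log(packed);
console.log(unpackPatterns(packed));
```

Each group saves one character per pattern (the separating space), at the cost of the split loop when the file is loaded.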

GoogleCodeExporter commented 9 years ago
Forgot to mention: the new pattern format works great since r377 in the trunk

Original comment by mathiasn...@gmail.com on 9 Mar 2009 at 1:43

GoogleCodeExporter commented 9 years ago
Paul Arzul wrote (per e-mail):

--begin cite--
[…]
i find the savings of http compression to be sufficient optimization
for most use cases, while not (albeit unintentionally) obfuscating the
source. debugging compressed javascript is also difficult and
javascript decompression and evaluation plays a significant part in
delaying render:

http://batiste.dosimple.ch/blog/2007-07-02-1/

maybe encourage installation to merge javascript files if the
languages are well predefined as that should help:

$ cat Hyphenator_debug.js patterns/en.js > merge.js

--- end cite ---

Some investigations about the best practice in minifying/deflating and merging could be interesting, though.

How about a script/service that builds a Hyphenator.js with the patterns included?

Original comment by mathiasn...@gmail.com on 8 Apr 2009 at 1:56

GoogleCodeExporter commented 9 years ago
Merging code is now on issue57

Original comment by mathiasn...@gmail.com on 4 May 2009 at 8:00

GoogleCodeExporter commented 9 years ago
Added some //todo comments in r534: improvement possible

Original comment by mathiasn...@gmail.com on 4 May 2009 at 8:01

GoogleCodeExporter commented 9 years ago
Unified the different regexes into one: it's simpler now

Original comment by mathiasn...@gmail.com on 6 May 2009 at 6:30

GoogleCodeExporter commented 9 years ago
Web Workers: for now, Safari 4 and Firefox 3.5 support them.
http://ejohn.org/blog/web-workers/
http://www.whatwg.org/specs/web-workers/current-work/
I'll have to give it a try and check if there's a gain in performance...

Original comment by mathiasn...@gmail.com on 27 Jul 2009 at 7:30

GoogleCodeExporter commented 9 years ago
Web workers sound like a great idea to explore for this application.

Original comment by aphahn on 27 Jul 2009 at 9:23

GoogleCodeExporter commented 9 years ago
Actually I just began my first trials with WW (have a look at http://hyphenator.googlecode.com/svn/trunk/testsuite/test57.html).
It works (some features are still missing) but it's still slower.

Notes:
1. The script currently creates a WW for each element that has been marked for hyphenation; then each text node of that element is sent to it. -> Elements are hyphenated asynchronously at the same time.
First I tried to create one WW and send words to it. But it's async, therefore I don't know where the hyphenated words belong...
It would be nice to have one single WW -> load the script only once (SharedWorkers are not yet supported by FF nor Wk).
2. It would be nice to have a routine that waits for all text nodes to be returned and then replaces them at once.
3. I have to find a way to get the settings into the Worker (hyphenchar etc.).

Original comment by mathiasn...@gmail.com on 4 Aug 2009 at 10:35
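
One common way to use a single worker despite asynchronous replies is to tag every message with an id and keep the callbacks in a lookup table. The sketch below illustrates that general technique; it is not what test57 or Hyphenator does, and `createHyphenationClient` as well as the `{ id, text }` message shape are assumptions. The worker script would have to echo the id back with its result.

```javascript
// Returns a function that sends text to `worker` and routes each reply
// to the callback that belongs to it, whatever order the replies arrive in.
// `worker` is any object with postMessage() and an assignable onmessage,
// for example a Worker instance in a browser.
function createHyphenationClient(worker) {
    var nextId = 0;
    var pending = {};

    worker.onmessage = function (event) {
        var id = event.data.id;
        var callback = pending[id];
        delete pending[id];
        if (callback) {
            callback(event.data.text);
        }
    };

    return function hyphenateAsync(text, callback) {
        var id = nextId;
        nextId += 1;
        pending[id] = callback;
        worker.postMessage({ id: id, text: text });
    };
}

// Assumed worker side (separate file), shown here only as a comment:
// onmessage = function (event) {
//     postMessage({ id: event.data.id, text: hyphenate(event.data.text) });
// };
```

In the same spirit, the settings from note 3 could travel as extra fields of the message object, since anything the worker needs has to be passed by message.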

GoogleCodeExporter commented 9 years ago
Performance highly improved by using DOM-Storage to cache pattern-files.
(since version 3.0.0)

Original comment by mathiasn...@gmail.com on 9 Jul 2010 at 9:40

GoogleCodeExporter commented 9 years ago
See issue104 about ongoing work on Web Workers

Original comment by mathiasn...@gmail.com on 2 Aug 2010 at 9:25

GoogleCodeExporter commented 9 years ago
Performance improved by using an object to store information about elements instead of Expando
(test15: before: ~300ms, after: ~200ms)

Original comment by mathiasn...@gmail.com on 8 Jun 2011 at 10:58

GoogleCodeExporter commented 9 years ago
Performance improved by using Bram Stein's trie-based algorithm for hyphenation: https://github.com/bramstein/Hypher

Original comment by mathiasn...@gmail.com on 13 Jun 2011 at 9:43

GoogleCodeExporter commented 9 years ago
Perf improved by making pattern-availability check async

Original comment by mathiasn...@gmail.com on 30 Aug 2011 at 6:24

GoogleCodeExporter commented 9 years ago
see http://code.google.com/p/hyphenator/issues/detail?id=170

Original comment by mathiasn...@gmail.com on 15 Nov 2012 at 4:12

GoogleCodeExporter commented 9 years ago
Fixed some deoptimizations of the functions hyphenateWord() and prepareLanguage() in Chrome (r1235, r1236, r1237).

Original comment by mathiasn...@gmail.com on 7 Oct 2014 at 8:21

GoogleCodeExporter commented 9 years ago
r1319
Using setTimeout instead of setInterval leads to faster execution:
- pause time can be lowered to 10ms (instead of 100ms)
- convert-function is not interrupted by interval-calls

Original comment by mathiasn...@gmail.com on 21 Jan 2015 at 3:00
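
The setTimeout approach mentioned above can be sketched as a polling helper that schedules the next check only after the current one has finished, so no further timer callbacks queue up while long-running work is in progress. This is not the code from r1319; `waitUntil` and the `patternsLoaded` flag are hypothetical names.

```javascript
// Calls onReady() as soon as isReady() returns true, re-checking every
// pauseMs milliseconds. A new timer is created only after each check.
function waitUntil(isReady, onReady, pauseMs) {
    if (isReady()) {
        onReady();
        return;
    }
    setTimeout(function () {
        waitUntil(isReady, onReady, pauseMs);
    }, pauseMs);
}

// Stand-in for a pattern file that arrives a little later.
var patternsLoaded = false;
setTimeout(function () {
    patternsLoaded = true;
}, 25);

waitUntil(
    function () { return patternsLoaded; },
    function () { console.log('patterns ready'); },
    10 // pause between checks in ms, as in the comment above
);
```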