I tried this with 0.22-SNAPSHOT
Original comment by mrh...@gmail.com
on 11 Jul 2013 at 6:42
The issue is not an endless loop. In fact this document takes 17 min to
process because all the data in the image gets processed one character at a
time.
My original document had multiple documents. I left my original running all
night and it still hadn't finished extracting.
Perhaps some work needs to be put into handling o:gfxdata tags, so that the
data portion is skipped rather than processed one character at a time. This
would speed up extraction.
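To illustrate the suggestion above, here is a minimal sketch of copying a quoted value in one step instead of one character at a time. It assumes the markup is available as a String and that the data sits in a quoted attribute value. The class and method names are hypothetical and are not taken from the Okapi source.

```java
// Hypothetical helper, not part of OpenXMLContentFilter.
final class BulkAttributeCopy {

    /**
     * Copies a quoted value, including both quote characters, from text
     * into out with a single bulk append, and returns the index of the
     * first character after the closing quote.
     *
     * openQuote must be the index of the opening quote character.
     */
    static int copyQuotedValue(String text, int openQuote, StringBuilder out) {
        char quote = text.charAt(openQuote);
        // Jump straight to the closing quote instead of visiting each char.
        int close = text.indexOf(quote, openQuote + 1);
        if (close < 0) {
            // Unterminated value: copy everything that is left.
            close = text.length() - 1;
        }
        out.append(text, openQuote, close + 1);
        return close + 1;
    }
}
```

A scanner that recognises the start of a large data value could call this and resume at the returned index, so the cost of the data portion no longer depends on per-character bookkeeping.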
Original comment by mrh...@gmail.com
on 11 Jul 2013 at 6:45
Correction: "My original document had multiple documents" should read "My
original document had multiple images (30+)".
Original comment by mrh...@gmail.com
on 11 Jul 2013 at 6:55
Original comment by yves.sav...@gmail.com
on 17 Aug 2013 at 1:17
Running in tikal (tikal.sh -fc okf_openxml -x neverending.docx) only takes me
36s, but 90+% of the time is spent in the method mrhcon identified. At least
half of that is OpenXMLContentFilter line 383:
> curtag = curtag + c;
That's a simple "use a StringBuilder instead" problem. I will work up a patch.
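For readers unfamiliar with the pattern, here is a minimal sketch of the difference. It is not the actual patch: only the variable name curtag comes from the line quoted above, and the class and method names are made up for illustration.

```java
// Illustrative only; not the OpenXMLContentFilter code.
final class TagAccumulation {

    // Repeated String concatenation: every iteration allocates a new String
    // and copies everything accumulated so far, so the total work grows
    // quadratically with the length of the input.
    static String withConcatenation(String input) {
        String curtag = "";
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            curtag = curtag + c;
        }
        return curtag;
    }

    // StringBuilder: appends go into a growable buffer, so the total work
    // is roughly linear in the length of the input.
    static String withStringBuilder(String input) {
        StringBuilder curtag = new StringBuilder(input.length());
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            curtag.append(c);
        }
        return curtag.toString();
    }

    public static void main(String[] args) {
        String sample = "<v:shape o:gfxdata=\"QUJD\">";
        // Both variants produce the same text; only the cost differs.
        System.out.println(withConcatenation(sample).equals(withStringBuilder(sample)));
    }
}
```

Both methods return the same string. The gap only becomes noticeable when a single tag carries a very large value, such as embedded image data.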
Original comment by tingley
on 14 Nov 2013 at 12:39
Fixed on dev, commit 11cb1ffdaf4bc2eb2fb383feb24af4c467658c16.
A roundtrip of this file (filter + merge) went from about 50 seconds to < 2 on
my machine.
Original comment by tingley
on 14 Nov 2013 at 4:37
Great! Thanks.
Original comment by yves.sav...@gmail.com
on 14 Nov 2013 at 5:01
Original issue reported on code.google.com by
mrh...@gmail.com
on 11 Jul 2013 at 6:42