bungeni-org / bungeni-editor

The Bungeni Editor is a drafting and markup framework for XML production built on the OpenOffice.org platform. It supports different legislative document types (e.g. hansard, bill), allows the definition of custom types, and supports markup and storage of metadata within the ODF document.

Extract track changes #60

Closed GoogleCodeExporter closed 9 years ago

GoogleCodeExporter commented 9 years ago
The current method of extracting track changes using the UNO API is slow and expensive. Scalability needs to be tested for a large number of documents.

To Do --

investigate alternatives -- 

1) XSLT
2) Using ODFDOM

Original issue reported on code.google.com by ashok.ha...@gmail.com on 8 Feb 2010 at 12:37

GoogleCodeExporter commented 9 years ago
ODFDOM 0.7.5 - issue with loading dom 
<http://odftoolkit.org/forums/ODFDOM/topics/91-Error-loading-document>

Original comment by ashok.ha...@gmail.com on 10 Feb 2010 at 6:41

GoogleCodeExporter commented 9 years ago
Re: comment #1 - attempting with trunk build of odfdom

Original comment by ashok.ha...@gmail.com on 10 Feb 2010 at 6:46

GoogleCodeExporter commented 9 years ago
trunk ODFDOM uses ODF 1.2 style prefix+namespace NS checking. Fixed setting of
prefix+namespace in Bungeni Editor (see Issue 60)

Original comment by ashok.ha...@gmail.com on 10 Feb 2010 at 2:42

GoogleCodeExporter commented 9 years ago
Switch to odfdom trunk from odfdom 0.7.5.

Trunk version (rev 34) supports accessing the metadata as a DOM via getMetaDom() instead of more complex external parsing.

TO DO:

Extract track changes via ODFDOM

Original comment by ashok.ha...@gmail.com on 11 Feb 2010 at 1:28

GoogleCodeExporter commented 9 years ago
How to get inserted text from change markings?

Insert change markings use separate start and end markers linked by an id --

<text:change-start text:change-id="xyz" />
<text:change-end text:change-id="xyz" />

These are placed arbitrarily in the ODF hierarchy, based on where the change occurred.
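
To make the problem concrete, here is a minimal Python sketch (not the Bungeni Editor code) that collects the text between a matching pair of markers by walking the tree in document order. It assumes the markers carry the id in a `text:change-id` attribute in the ODF text namespace; the `SAMPLE` document and the `inserted_text` helper are hypothetical and only for illustration.

```python
import xml.etree.ElementTree as ET

# ODF text namespace; ElementTree addresses names as "{uri}local-name".
TEXT_NS = "urn:oasis:names:tc:opendocument:xmlns:text:1.0"
CHANGE_ID = f"{{{TEXT_NS}}}change-id"
START = f"{{{TEXT_NS}}}change-start"
END = f"{{{TEXT_NS}}}change-end"

# Hypothetical sample: the insertion starts in one paragraph and ends in the next.
SAMPLE = (
    f'<doc xmlns:text="{TEXT_NS}">'
    '<text:p>Hello <text:change-start text:change-id="ct1"/>brave </text:p>'
    '<text:p>new <text:change-end text:change-id="ct1"/>world</text:p>'
    "</doc>"
)


def inserted_text(root, change_id):
    """Return the text found between the start and end markers of change_id."""
    parts = []
    inside = False

    def walk(elem):
        nonlocal inside
        is_marker = elem.get(CHANGE_ID) == change_id
        if is_marker and elem.tag == START:
            inside = True
        elif is_marker and elem.tag == END:
            inside = False
        # elem.text precedes the children; elem.tail follows the element itself.
        if inside and elem.text:
            parts.append(elem.text)
        for child in elem:
            walk(child)
        if inside and elem.tail:
            parts.append(elem.tail)

    walk(root)
    return "".join(parts)


print(repr(inserted_text(ET.fromstring(SAMPLE), "ct1")))
```

Because the markers are empty elements rather than a wrapper, the walk has to track an "inside the change" flag across element boundaries; a simple parent/child query cannot capture text that spans two paragraphs.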

Original comment by ashok.ha...@gmail.com on 17 Feb 2010 at 2:26

GoogleCodeExporter commented 9 years ago
The following appears to match inserted text parts, at least for a couple of test case documents:

//text:change-start[@text:change-id='ct472232592']/following::*[@text:change-id='ct472232592'][1]/following::text()

to do :

test with more document examples

Original comment by ashok.ha...@gmail.com on 18 Feb 2010 at 7:28

GoogleCodeExporter commented 9 years ago
After testing, this is the correct expression:

//text:change-start[@text:change-id='ct-1413048760']/following::text() except //text:change-start[@text:change-id='ct-1413048760']/following::*[@text:change-id='ct-1413048760'][1]/following::text()

Original comment by ashok.ha...@gmail.com on 18 Feb 2010 at 8:34

GoogleCodeExporter commented 9 years ago
The solution in comment #7 uses XPath 2 syntax. In XPath 1, the following works:

//text:change-start[@text:change-id='ct716683728']/following::text()[not(preceding::text:change-end[@text:change-id='ct716683728'])]
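
The change id is hard-coded in the expression above. As a small sketch, the same XPath 1 expression could be generated for any change id; the helper name `inserted_text_xpath` is hypothetical and not part of the Bungeni Editor or ODFDOM.

```python
def inserted_text_xpath(change_id):
    """Build the XPath 1 expression above for a given change id."""
    # The id is embedded in a single-quoted XPath literal, so reject quotes.
    if "'" in change_id:
        raise ValueError("change id must not contain a single quote")
    start = f"//text:change-start[@text:change-id='{change_id}']"
    end = f"preceding::text:change-end[@text:change-id='{change_id}']"
    # Text nodes after the start marker that do not yet have the end marker before them.
    return f"{start}/following::text()[not({end})]"


print(inserted_text_xpath("ct716683728"))
```

The resulting string can be handed to any XPath 1 engine that has the `text` prefix bound to the ODF text namespace. Python's built-in `xml.etree.ElementTree` does not support the `following::` axis, so evaluating it from Python would need a fuller engine such as lxml.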

Original comment by ashok.ha...@gmail.com on 18 Feb 2010 at 1:32

GoogleCodeExporter commented 9 years ago

Original comment by ashok.ha...@gmail.com on 25 Feb 2010 at 9:27