FieldExtractor incorrectly identifies fields in free text

GoogleCodeExporter commented 9 years ago

What steps will reproduce the problem?

1. Run the two attached docx through the following code:

        IXDocReport report = XDocReportRegistry.getRegistry().loadReport(...);
        FieldsExtractor<FieldExtractor> extractor = FieldsExtractor.create();
        report.extractFields(extractor);
        List<FieldExtractor> fields = extractor.getFields();

What is the expected output? What do you see instead?

. In Error.docx, a FieldExtractor is returned for the text "FakeMergeField", 
even though it was entered as plain text and not as a merge field.

  i.e.:

        <w:p w:rsidR = "00F174B4" w:rsidRDefault = "00F174B4">
            <w:r>
                <w:t>${FakeMergeField}</w:t>
            </w:r>
        </w:p>

. In Ok.docx, the text "FakeMergeField" is not interpreted as a field, because 
Word happened to split the text across several runs when it was entered.

  i.e.:

        <w:p w:rsidR = "00F174B4" w:rsidRDefault = "00A97032">
            <w:r>
                <w:t>$</w:t>
            </w:r>
            <w:r w:rsidR = "00F174B4">
                <w:t>{</w:t>
            </w:r>
            <w:proofErr w:type = "spellStart"/>
            <w:r w:rsidR = "00F174B4">
                <w:t>FakeMergeField</w:t>
            </w:r>
            <w:proofErr w:type = "spellEnd"/>
            <w:r w:rsidR = "00F174B4">
                <w:t>}</w:t>
            </w:r>
        </w:p>

What version of the product are you using? On what operating system?

. 0.9.8 on Windows XP.

Original issue reported on code.google.com by Mr.M.McM...@googlemail.com on 6 Nov 2012 at 3:53

GoogleCodeExporter commented 9 years ago

Hi,

You must use a mergefield (for docx) or an input field (for odt) to hold 
Freemarker/Velocity directives, interpolations, etc.

As you have seen, Word splits your text across several runs, which makes your 
case very hard to handle.

This issue is the same as 
http://code.google.com/p/xdocreport/issues/detail?id=80

Regards Angelo

Original comment by angelo.z...@gmail.com on 6 Nov 2012 at 4:52

Changed state: Accepted

GoogleCodeExporter commented 9 years ago

Hi Angelo.

I don't think I stated the problem very clearly. The problem is not that I want 
to be able to enter the merge field in plain text, it is the reverse: I don't 
want plain text to be interpreted as a merge field.

Any plain text which starts ${ will cause the FieldExtractor code to break. 

I think it should only be processing the text if it is within a merge field. 
This appears to be identified by the text MERGEFIELD.

            <w:r>
                <w:instrText xml:space = "preserve">MERGEFIELD  ${RealMergeField}  \* MERGEFORMAT</w:instrText>
            </w:r>
            <w:r>
                <w:fldChar w:fldCharType = "separate"/>
            </w:r>
            <w:r w:rsidR = "00F174B4">
                <w:rPr>
                    <w:noProof/>
                </w:rPr>
                <w:t>«${RealMergeField}»</w:t>
            </w:r>
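
For illustration only, here is a minimal sketch of the check I have in mind: collect ${...} names only from w:instrText elements whose instruction starts with MERGEFIELD, and ignore ${...} in ordinary w:t runs. MergeFieldScanner and mergeFieldNames are hypothetical names, not XDocReport API, and a regex stands in for a real XML parser, so a field whose instruction is split across several runs would be missed.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration; not part of XDocReport.
class MergeFieldScanner {
    // Captures the text content of each <w:instrText ...> element.
    private static final Pattern INSTR_TEXT =
            Pattern.compile("<w:instrText[^>]*>(.*?)</w:instrText>", Pattern.DOTALL);
    // Captures the name inside a ${name} interpolation.
    private static final Pattern INTERPOLATION = Pattern.compile("\\$\\{([^}]+)\\}");

    // Returns the ${...} names found inside MERGEFIELD instructions only.
    static List<String> mergeFieldNames(String documentXml) {
        List<String> names = new ArrayList<>();
        Matcher instr = INSTR_TEXT.matcher(documentXml);
        while (instr.find()) {
            String instruction = instr.group(1).trim();
            if (!instruction.startsWith("MERGEFIELD")) {
                continue; // some other kind of Word field, e.g. PAGE or DATE
            }
            Matcher interpolation = INTERPOLATION.matcher(instruction);
            while (interpolation.find()) {
                names.add(interpolation.group(1));
            }
        }
        return names;
    }
}
```

Run against the XML of Error.docx, a scan like this would be expected to report RealMergeField but not FakeMergeField, since the latter only ever appears inside a w:t element.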

Original comment by Mr.M.McM...@googlemail.com on 7 Nov 2012 at 1:00

GoogleCodeExporter commented 9 years ago

Hi,

OK, I understand your problem: you want to use the $ token as plain text in 
your docx. I agree with you, XDocReport should escape template engine 
interpolations and directives "by default" when they are not inside a 
mergefield. For now, you can get this behaviour with FieldsMetadata.

You should configure FieldsMetadata like this: 

---------------------------------------------------------------
FieldsMetadata metadata = report.createFieldsMetadata();
metadata.setEvaluateEngineOnlyForFields(true);
---------------------------------------------------------------

This configuration escapes (with [escape] for Freemarker) all of the XML 
entries (word/document.xml) except the content of the mergefields.

This should be the default behaviour, but I have not made it the default 
because I'm not sure that it works with every docx. Try it and tell me if it 
works in your case.
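
To make the idea concrete, here is a rough sketch of what evaluating the engine only for fields amounts to. It is not XDocReport's actual preprocessor, which works on the XML entries. It assumes FreeMarker's square-bracket [#noparse] directive and, purely for this example, that merge field content is delimited by guillemets in a flat string. NoParseSketch and protectPlainText are hypothetical names.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration; not XDocReport code.
class NoParseSketch {
    // FreeMarker directive (square-bracket syntax) that outputs its body verbatim.
    private static final String OPEN = "[#noparse]";
    private static final String CLOSE = "[/#noparse]";
    // Assumption for this sketch: field content sits between guillemets (U+00AB, U+00BB).
    private static final Pattern FIELD = Pattern.compile("\u00AB(.*?)\u00BB");

    // Wraps everything outside a field in noparse and leaves field content evaluable.
    static String protectPlainText(String text) {
        StringBuilder out = new StringBuilder();
        Matcher field = FIELD.matcher(text);
        int last = 0;
        while (field.find()) {
            appendLiteral(out, text.substring(last, field.start()));
            out.append(field.group(1)); // field content stays visible to the engine
            last = field.end();
        }
        appendLiteral(out, text.substring(last));
        return out.toString();
    }

    private static void appendLiteral(StringBuilder out, String literal) {
        if (!literal.isEmpty()) {
            out.append(OPEN).append(literal).append(CLOSE);
        }
    }
}
```

With this kind of wrapping in place, a plain-text ${FakeMergeField} reaches the template engine as literal text, while ${RealMergeField} is still evaluated.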

Regards Angelo

Original comment by angelo.z...@gmail.com on 7 Nov 2012 at 1:26

GoogleCodeExporter commented 9 years ago

Thanks for the response. 

Unfortunately, when I tried the suggested fix I got the same result. The 
following test results in both the real merge field and the fake one (ie plain 
text freemarker) being discovered:

    @Test
    public void Test01() throws IOException, XDocReportException {
        File file = new File("C:\\Error.docx");
        FileInputStream fin = new FileInputStream(file);
        IXDocReport report = XDocReportRegistry.getRegistry().loadReport(fin, TemplateEngineKind.Freemarker);

        FieldsMetadata metadata = report.createFieldsMetadata();
        metadata.setEvaluateEngineOnlyForFields(true);

        FieldsExtractor<FieldExtractor> extractor = FieldsExtractor.create();
        report.extractFields(extractor);
        List<FieldExtractor> fields = extractor.getFields();
        for (FieldExtractor fieldExtractor : fields) {
            System.out.println("fieldExtractor=" + fieldExtractor.getName());
        }
        assertEquals(1, fields.size());
    }

The output is:

    fieldExtractor=RealMergeField
    fieldExtractor=FakeMergeField

Original comment by Mr.M.McM...@googlemail.com on 7 Nov 2012 at 2:40

GoogleCodeExporter commented 9 years ago

Please attach your docx and Java code. I will try to reproduce your problem in 
debug mode.

Many thanks

Regards Angelo

Original comment by angelo.z...@gmail.com on 7 Nov 2012 at 3:16

GoogleCodeExporter commented 9 years ago

The Error.docx is attached already and the Test01 code added in my previous 
post is what's demonstrating the problem - it's just a Junit test case.

Cheers

Mike

Original comment by Mr.M.McM...@googlemail.com on 7 Nov 2012 at 4:16

GoogleCodeExporter commented 9 years ago

Hi Mike,

Sorry I had not seen that you had attached your Error.docx. 

I have fixed the problem for 1.0.0 (see 
http://code.google.com/p/xdocreport/source/detail?r=f3a20f8a2619c8a046b6fb48d0ec03edc3780de2)

The problem was that the preprocessing step was not run when 

---------------------------------------
report.extractFields(extractor);
---------------------------------------

is called. My fix is to call preprocessing step when extractFields is called 
(the preprocesing step add for instance [noparse] between mergefield if 
metadata.setEvaluateEngineOnlyForFields(true) is called).

If you want use 0.9.8, you must generate a report (one time) before calling 
report.extractFields(extractor);

Regards Angelo

Original comment by angelo.z...@gmail.com on 8 Nov 2012 at 8:21

Changed state: Fixed
