datacleaner / DataCleaner

The premier open source Data Quality solution
GNU Lesser General Public License v3.0

XML XPATH improvement #1342

Open markansink opened 8 years ago

markansink commented 8 years ago

For a customer, I created an XML datastore in conf.xml with multiple table-defs, and I'm using 'index(path)' to refer to the parent node's row id, which is generated by DataCleaner.

I would like to be able to refer to the id field of a parent node, so that I'm not dependent on the technical key and can use the key/id from the file itself to make the reference.

kaspersorensen commented 8 years ago

I'm not sure I understand what you're saying here TBH. Can you maybe give a simple example?

I think what you're maybe looking for is a way to join the two tables? In that case, take a look at the Table Lookup component.

markansink commented 8 years ago

In XPath you can use ../../something to refer to a higher node. For example:

<Person id="10183" action="add" date="19-Sep-2013">
            <Gender>Male</Gender>
            <ActiveStatus>Active</ActiveStatus>
            <Deceased>Yes</Deceased>
            <NameDetails>
                <Name NameType="Primary Name">
                    <NameValue>
                        <FirstName>Ange-Félix</FirstName>
                        <Surname>Patassé</Surname>
                    </NameValue>
                </Name>
                <Name NameType="Spelling Variation">
                    <NameValue>
                        <FirstName>Ange Félix</FirstName>
                        <Surname>Patassé</Surname>
                    </NameValue>
                    <NameValue>
                        <FirstName>Ange Felix</FirstName>
                        <Surname>Patasse</Surname>
                    </NameValue>
                    <NameValue>
                        <FirstName>Ange-Felix</FirstName>
                        <Surname>Patasse</Surname>
                    </NameValue>
                </Name>
            </NameDetails>
</Person>

So in the table def I now have this:

<table-def>
    <rowXpath>/PFA/Records/Person/NameDetails/Name/NameValue</rowXpath>
    <valueXpath>index(/PFA/Records/Person/NameDetails/Name)</valueXpath>
    <valueXpath>/PFA/Records/Person/NameDetails/Name/NameValue/TitleHonorific</valueXpath>
    <valueXpath>/PFA/Records/Person/NameDetails/Name/NameValue/FirstName</valueXpath>
    <valueXpath>/PFA/Records/Person/NameDetails/Name/NameValue/MiddleName</valueXpath>
    <valueXpath>/PFA/Records/Person/NameDetails/Name/NameValue/Surname</valueXpath>
    <valueXpath>/PFA/Records/Person/NameDetails/Name/NameValue/MaidenName</valueXpath>
    <valueXpath>/PFA/Records/Person/NameDetails/Name/NameValue/Suffix</valueXpath>
    <valueXpath>/PFA/Records/Person/NameDetails/Name/NameValue/EntityName</valueXpath>
    <valueXpath>/PFA/Records/Person/NameDetails/Name/NameValue/SingleStringName</valueXpath>
    <valueXpath>/PFA/Records/Person/NameDetails/Name/NameValue/OriginalScriptName</valueXpath>
</table-def>

I would like to refer to the id attribute of the enclosing Person element:

Person id="10183"

I tried to use <valueXpath>../../../.../Person@id</valueXpath>

But this returns 'null'.

I will use the surrogate key to join the tables together, but I would like to use the 'functional' id as a sanity check. Also, a business user is only aware of the functional id and doesn't know the surrogate key.
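
To make the requested result concrete, here is a minimal Python sketch of the rows being asked for. It uses only the standard library's xml.etree.ElementTree, not DataCleaner or MetaModel, so it says nothing about how a table-def is actually evaluated. The person_index column is a hypothetical stand-in for the generated surrogate key, the second Person record is invented for illustration, and name_rows and ids_per_index are made-up helper names.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the sample from this thread, wrapped in the /PFA/Records
# path that the table-def assumes.
SAMPLE = """
<PFA>
  <Records>
    <Person id="10183">
      <NameDetails>
        <Name NameType="Primary Name">
          <NameValue><FirstName>Ange-Félix</FirstName><Surname>Patassé</Surname></NameValue>
        </Name>
        <Name NameType="Spelling Variation">
          <NameValue><FirstName>Ange Félix</FirstName><Surname>Patassé</Surname></NameValue>
          <NameValue><FirstName>Ange Felix</FirstName><Surname>Patasse</Surname></NameValue>
        </Name>
      </NameDetails>
    </Person>
    <!-- Hypothetical second record, invented for this sketch only. -->
    <Person id="99999">
      <NameDetails>
        <Name NameType="Primary Name">
          <NameValue><FirstName>Example</FirstName><Surname>Person</Surname></NameValue>
        </Name>
      </NameDetails>
    </Person>
  </Records>
</PFA>
"""


def name_rows(xml_text):
    """Return one row per NameValue, carrying both keys of its Person."""
    root = ET.fromstring(xml_text)
    rows = []
    # Walk top-down so the enclosing Person is always known.
    for person_index, person in enumerate(root.findall("./Records/Person")):
        for value in person.findall("./NameDetails/Name/NameValue"):
            rows.append({
                "person_index": person_index,   # assumed surrogate key
                "person_id": person.get("id"),  # functional id from the file
                "first_name": value.findtext("FirstName"),
                "surname": value.findtext("Surname"),
            })
    return rows


def ids_per_index(rows):
    """Group the functional ids seen for each surrogate index."""
    seen = {}
    for row in rows:
        seen.setdefault(row["person_index"], set()).add(row["person_id"])
    return seen


rows = name_rows(SAMPLE)
mapping = ids_per_index(rows)
for row in rows:
    print(row)
```

The sanity check is then that every surrogate index maps to exactly one functional id, for example `all(len(ids) == 1 for ids in mapping.values())`.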

LosD commented 8 years ago

Doesn't <valueXpath>/PFA/Records/Person@id</valueXpath> work? The row was already selected by the rowXpath, so I would think it should choose the correct one?

LosD commented 8 years ago

(I'm totally green regarding XPath/XML use inside MM/DC though, so it wouldn't surprise me if I was wrong)

markansink commented 8 years ago

When using <valueXpath>/PFA/Records/Person@id</valueXpath>, only the first node gets the id, not all nodes (see image).

LosD commented 8 years ago

Hmmm, that's pretty weird. I think I'll have to read up on MetaModel's tabledef to understand how that can happen.

I imagined that it would work perfectly or not at all; from XPath's point of view I can't see why there should be any difference if it's the first or second child node.
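
For comparison, the plain-XPath expectation can be sketched in Python as follows. This is an illustration only: it uses the standard library's xml.etree.ElementTree rather than MetaModel, and because ElementTree has no ancestor axis, the hypothetical helper ancestor_attr walks a child-to-parent map as a stand-in for an expression such as ancestor::Person/@id. It does not explain the behaviour reported above; it only shows that the position of the child node does not matter when the lookup is done from each row upwards.

```python
import xml.etree.ElementTree as ET

# Reduced version of the sample from this thread.
SAMPLE = """<PFA><Records>
<Person id="10183">
  <NameDetails>
    <Name NameType="Primary Name">
      <NameValue><Surname>Patassé</Surname></NameValue>
    </Name>
    <Name NameType="Spelling Variation">
      <NameValue><Surname>Patassé</Surname></NameValue>
      <NameValue><Surname>Patasse</Surname></NameValue>
    </Name>
  </NameDetails>
</Person>
</Records></PFA>"""

root = ET.fromstring(SAMPLE)

# ElementTree elements have no parent pointer, so build a child -> parent map.
parent_of = {child: parent for parent in root.iter() for child in parent}


def ancestor_attr(elem, tag, attr):
    """Walk up from elem to the nearest ancestor named tag; return its attribute."""
    node = parent_of.get(elem)
    while node is not None and node.tag != tag:
        node = parent_of.get(node)
    return None if node is None else node.get(attr)


# One lookup per NameValue row, regardless of its position under Name.
ids = [ancestor_attr(nv, "Person", "id") for nv in root.iter("NameValue")]
print(ids)
```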