OP-TED / ted-rdf-mapping

Transformation rules and other artefacts for the TED Semantic Web Services
European Union Public License 1.2

F06 Mapping: Revise (and replace) all references to AGREE_TO_PUBLICATION_MAN/@PUBLICATION XPaths #207

Open csnyulas opened 2 years ago

csnyulas commented 2 years ago

There are 5 references to the AGREE_TO_PUBLICATION_MAN/@PUBLICATION XPaths in the (Master) conceptual mapping (CM) and one in the technical mapping (TM) (confidentiality.ttl). Four of those in the CM (fields V.2.2.1.0 and V.2.3.0.0) and the one in the TM are sub-elements of AWARD_CONTRACT, while one of the CM ones is a sub-element of OBJECT_CONTRACT (field II.1.7.0). However, in the XSD neither AWARD_CONTRACT nor OBJECT_CONTRACT has an AGREE_TO_PUBLICATION_MAN sub-element (see below). Nor have we found such an element in any of our sample data.

[image: XSD screenshot] [image: XSD screenshot]

Conclusion: This seems to be a wrong mapping. What should we do?

Important Note: There was previously a typo in the XML element name ("AGREE_TO_PUBGLICATION_MAN" instead of "AGREE_TO_PUBLICATION_MAN"), so be aware of this when you search the mapping files.
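To catch both spellings in one pass when searching the mapping files, something like the following could be used. This is only a minimal sketch: the function name `find_references` is hypothetical, and it assumes the mapping files have already been exported to plain text lines (for example a CSV dump of the CM sheet, or the `.ttl` file read line by line).

```python
import re

# Matches the correct element name and the earlier misspelling
# (with the stray "G" after "PUB").
PATTERN = re.compile(r"AGREE_TO_PUBG?LICATION_MAN")


def find_references(lines):
    """Return (line number, matched spelling) for every line that
    mentions the element under either spelling."""
    hits = []
    for number, line in enumerate(lines, start=1):
        match = PATTERN.search(line)
        if match:
            hits.append((number, match.group(0)))
    return hits
```

Reporting the matched spelling alongside the line number makes it easy to see which references still carry the old typo.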

csnyulas commented 2 years ago

The typo has been fixed in commit 2769d983. The rest needs to be addressed together with #209.

csnyulas commented 2 years ago

Based on our analysis, these are the XPaths that represent the confidentiality information:

AWARD_CONTRACT/AWARDED_CONTRACT/CONTRACTORS/@PUBLICATION
AWARD_CONTRACT/AWARDED_CONTRACT/TENDERS/@PUBLICATION
AWARD_CONTRACT/AWARDED_CONTRACT/VALUES/@PUBLICATION
OBJECT_CONTRACT/OBJECT_DESCR/AC/@PUBLICATION
OBJECT_CONTRACT/VAL_RANGE_TOTAL/@PUBLICATION
OBJECT_CONTRACT/VAL_TOTAL/@PUBLICATION

We have examples for each of these in our sample data, and they seem to cover all the "Agree to publish?" questions in the PDF form.

It is worth noting, though, that in our XML data all these attributes have the value "YES" whenever they occur. The question is how we should interpret the case where they are missing. What about when they have the value "NO" (if that ever occurs)? What is the default value? If the default were "NO", would that mean that by default none of this information is published?
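To check whether a value other than "YES" ever occurs, the sample notices could be scanned with something like the sketch below. It is not part of the mapping tooling: the function names are hypothetical, it assumes each notice is available as an XML string, and it ignores XML namespaces by comparing local element names only.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def local_name(tag):
    """Drop a '{namespace}' prefix from an ElementTree tag, if present."""
    return tag.rsplit("}", 1)[-1]


def tally_publication(xml_text):
    """Count (element path, PUBLICATION value) pairs in one notice.

    Only elements that actually carry the attribute are counted;
    elements without it are skipped, so absence is not recorded.
    """
    counts = Counter()

    def walk(elem, parents):
        path = parents + [local_name(elem.tag)]
        value = elem.get("PUBLICATION")
        if value is not None:
            counts[("/".join(path), value)] += 1
        for child in elem:
            walk(child, path)

    walk(ET.fromstring(xml_text), [])
    return counts
```

Summing the counters over all sample notices would show, per path, which values appear. It would not answer what a missing attribute means; that still depends on the XSD default and on the decision in ePO.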

We decided to postpone the mapping of these XPaths until the correct representation in ePO has been decided. We should probably discuss this in the ePO WGM.

Question/Idea: Should we perhaps look at the previous versions of the XSD to see where the AGREE_TO_PUBLICATION_MAN/@PUBLICATION XPaths might have been used?

muricna commented 2 years ago

I think you will find that these paths refer to the reception XML and not the publication XML. As the mappings we are currently carrying out refer to the publication XML, there should be no need to map these paths at this point in time.

muricna commented 2 years ago

I have verified that these paths are used in F06, F13, F15 and F22, so a mapping is required when these forms are being mapped.

csnyulas commented 1 year ago

This type of information is also present in the F20 data, although the PDF form does not explicitly permit the specification of this information (which is similar to F22, where the "Agree to publish" checkboxes don't exist in the PDF either).

Here is an overview of ALL the XPaths to the elements with the @PUBLICATION attribute according to the XSD, for the forms we have mapped so far (the screenshot was created by filtering the content of the "Mapping Remarks" sheet in the Master conceptual mapping):

image

Note: the mapping of the XPath AWARD_CONTRACT/NO_AWARDED_CONTRACT/PROCUREMENT_DISCONTINUED/NO_DOC_EXT/@PUBLICATION, which is currently made to epo:LotAwardOutcome epo:hasAdditionalNonAwardJustification rdf:langString, is likely wrong. It is not clear, though, as none of our samples contains the NO_DOC_EXT XML element (or its PUBLICATION attribute).