OP-TED / ted-rdf-mapping-eforms

TED-RDF Mapping Suites for eForms Notices
European Union Public License 1.2
2 stars 0 forks source link

missing time:TemporalUnit for epo:SpecificDuration #40

Closed cristianvasquez closed 2 months ago

cristianvasquez commented 2 months ago

Hello,

I've found a epo:SpecificDuration without a time:TemporalUnit defined

image

I had a peek into the XML, and the time unit appears to be there.

<cac:PlannedPeriod>
    <cbc:StartDate>2024-12-02+01:00</cbc:StartDate>
    <cbc:StartTime>00:00:00+01:00</cbc:StartTime>
    <cbc:DurationMeasure unitCode="CALENDAR_DAY">1826</cbc:DurationMeasure>
</cac:PlannedPeriod>

Source XML: https://ted.europa.eu/en/notice/163529-2024/xml

schivmeister commented 2 months ago

This perplexed me for a while initially, until I realized that the unitCode value is CALENDAR_DAY. Unless there's something else I'm missing, the parent codelist for duration-unit has the following codes:

Code Description
DAY Day
MONTH Month
UNKNOWN Unknown
UNLIMITED Unlimited
WEEK Week
YEAR Year

While the upstream Authority Table (AT) does appear to have the code CALENDAR_DAY, the eForms SDK does not.

Under normal circumstances, this would not have caused any issue, as we would've used the upstream vocabulary. However, because in this case we map to an ontology term (from time), we used the SDK as the canonical reference, and created a custom mapping table according to the SDK codelist.

Please confirm whether or not we should then add the rest of the codes that have a time: equivalent, or if instead the SDK has to be updated, and this is then not a bug at this time.

cristianvasquez commented 2 months ago

Hi,

I've gathered some information and the code list of the SDK will always be a subset of the authority tables values, honouring the 'Code' value.

I believe we can be liberal in what we accept, meaning we can have an augmented mapping in the custom mapping that handles things such as CALENDAR_DAY, WORKING_DAY, YEAR_HALF, at least the elements that exist in time

I'm looking for the time-mapping, perhaps it already exists?

schivmeister commented 2 months ago

We would gladly add more codes in the custom mapping, but unfortunately there is no equivalent to those you cite in the Time ontology. There are no instances of the time:TemporalUnit class to represent CALENDAR_DAY, WORKING_DAY, YEAR_HALF, etc. Neither the SDK nor ePO foresees such codes.

We can of course extend our mapping table to map CALENDAR_DAY to time:unitDay, if requested.

However, for the other values from the AT, which are not permitted values in the SDK, we would need to handle them in special ways that would not be straightforward (e.g. YEAR_HALF could be represented as a time:numericDuration 6 and time:unitType time:unitMonth).

I'm looking for the time-mapping, perhaps it already exists?

Can you clarify what you mean by this?

cristianvasquez commented 2 months ago

time-mapping was my unfortunate choice of words to refer to the complete mapping of at-voc:timeperiod to the time ontology. :)

address this comprehensively, I believe we need further analysis and the concrete mapping of all the values

I opened a new related issue in ePO repository for further discussion: https://github.com/OP-TED/ePO/issues/681

Regarding this specific issue, I would extend the mapping table to include all we can that is obvious.

Regarding the CALENDAR_DAY description, I notice the key distinction from DAY is that it specifically starts at midnight, although the duration is identical. For this, I would suggest using time:unitDay.

Do you foresee other values to add?

schivmeister commented 2 months ago

time-mapping was my unfortunate choice of words to refer to the complete mapping of at-voc:timeperiod to the time ontology. :)

I see now. There is no mapping for that one as we simply translate to a voc value, not an ontology class. This is the voc resource file, which is integrated into the RML here. These resources are produced by querying the official endpoint, which we track in the pipeline.

I opened a new related issue in ePO repository for further discussion: https://github.com/OP-TED/ePO/issues/681

Thanks for reporting this in detail incl. digging up the related tickets.

Regarding this specific issue, I would extend the mapping table to include all we can that is obvious.

Regarding the CALENDAR_DAY description, I notice the key distinction from DAY is that it specifically starts at midnight, although the duration is identical. For this, I would suggest using time:unitDay.

Do you foresee other values to add?

According to the list of instances for time:TemporalUnit we can only map the HOUR (time:unitHour), MIN (time:unitMinute) and SEC (time:unitSecond), which we already do (albeit mistakenly lowercase).

Beyond time:TemporalUnit, we can map the days (time:DayOfWeek) and months (time:MonthOfYear), but these are not relevant for the data we are looking at. I don't know if we have a use case for them, but let us know if they are still preferable for inclusion nonetheless.

cristianvasquez commented 2 months ago

Thank you for looking up the values, is appreciated!

Probably is safe to keep the instances of Day and Months out, ePO only references the time ontology through epo:Duration, which represents a duration rather than a specific point in time, such as 'February.', We'll revisit the topic probably with ePO 5.0

Let's adjust the case and add time:unitDay to the mapping and feel free to close the issue with the commit

cristianvasquez commented 2 months ago

Upon reconsideration, it seems that for example epo:Price might include an epo:hasExpectedDeliveryTime. I haven't seen a full overview of the source data, so I'm unable to make a fully informed assessment. I can imagine cases where days or months are used.

Probably adding Months and Days to the custom mapping would be prudent and wouldn't cause any issues.

Let's adjust the case and add time:unitDay, months and days (time:TemporalUnit) and days (time:DayOfWeek) to the mapping and feel free to close the issue with the commit

csnyulas commented 2 months ago

I don't understand your latest suggestion @cristianvasquez.

I think there is a big confusion here. And it is likely caused by the fact that the at-voc:timeperiod Authority Table is used to provide codes for multiple purposes.

On the one hand, the values in the at-voc:timeperiod AT, which are extracted in this JSON file timeperiod.json, are used in the tedm:timeperiod TriplesMap to map the values of the epo:hasTimePeriod property of an epo:Period instance, when mapping the "BT-538-Lot" field: https://github.com/OP-TED/ted-rdf-mapping-eforms/blob/5677aa7eb34422b24581538ade29b37730e77876/mappings/package_cn_v1.8/transformation/mappings/Lot.rml.ttl#L4639

That is totally different from what this issue is about.

This issue is about the mapping of the values of the time:unitType property of the epo:SpecificDuration instance, in particular in the context of the mappings of the "BT-36-Lot" and "BT-98-Lot" fields. See relevant code snippets here. This is basically the second use of the at-voc:timeperiod AT, which is independent from the one described above.

For this use case we have a separate mapping table, temporal_unit.csv (which was slightly changed for the mappings of earlier SDK versions), that maps the values that can appear in the duration-unit code list, to instances of time:TemporalUnit class from the time ontology.

According to the eForms SDK, the duration-unit code list only contains the following values:

Code Description
DAY Day
MONTH Month
WEEK Week
YEAR Year

The fact that the notice that you encountered contains CALENDAR_DAY as a value for the unitCode attribute, is obviously a mistake and it is not valid according to eForms SDK. Perhaps it slipped in because the eForms validators for the SDK v1.8 were not complete enough to capture such mistakes. However, if we look at the definition of DAY and CALENDAR_DAY in the AT, we can see that both refer to a duration of 24h. So, if interpreted as a duration unit, the CALENDAR_DAY could be represented by time:unitDay.

However, the majority of other values in the at-voc:timeperiod AT represent either (partially defined) time periods (e.g. JAN - a period starting from January 1st and ending on January 31st, which needs to be combined with a specific year), or durations (e.g. HALF_YEAR, a duration defined by the value 6 and the duration unit MONTH).

The only other values in the at-voc:timeperiod AT that represent duration units are HOUR, MIN and SEC. We mentioned them in our mapping table as they have corresponding time:TemporalUnit instances, but we used placeholders, as they are not valid code values according to the SDK. Obviously, it does not make sense to talk about durations of the orders of hours, minutes or seconds in the context of a procurement notice.

In principle WORKING_DAY could also be used as a duration unit, but that has no equivalent in the time ontology, and there is no straightforward conversion of an amount of WORKING_DAYs into DAYs that can be done automatically (especially during the XML to RDF transformation). Such a conversion depends on the starting date of the period and the country where the work is being executed, among other things. This might well be the reason why it wasn't included in the duration-unit code list.

So, as far as I see it, the only potentially useful, and arguably not a completely incorrect, modification would be to add the mapping of CALENDAR_DAY to time:unitDay to the mapping table. Do you agree?

cristianvasquez commented 2 months ago

Thank you for the detailed explanation; it clarifies the distinction between the two uses of the at-voc:timeperiod Authority Table. This helps with its analysis.

Mapping CALENDAR_DAY to time:unitDay seems reasonable and fits within the scope of this issue. (sorry for expanding it)

As for WORKING_DAY, we can address it in a separate issue if it comes up as a duration in other examples.