PASTAplus / PEP

PASTA Enhancement Proposals (PEPs)
Apache License 2.0

PEP-4: General Discussion #8

clnsmth opened this issue 1 month ago

clnsmth commented 1 month ago

During our recent meeting, we discussed PEP-4 and raised several important points that are summarized here for further consideration:

Item 1: Representation of Date and Time Components

Item 2: Promoting Automation in Data Reading

Item 3: Zero-Padded Dates and Times

Item 4: Case Sensitivity in Date and Time Formats

clnsmth commented 1 month ago

Item 5: Issuing Errors for Unsupported Formats

If we expand the list of supported formats, we could consider changing the handling of unsupported formats from warnings to errors. The rationale is that the expanded list would cover a wide range of commonly used, unambiguous date-time formats, so any remaining unsupported formats would be invalid and should be rejected from publication.

clnsmth commented 1 month ago

Item 6: Distinguishing Preferred and Checked Formats

A key issue addressed by PEP-4 is that date-time values are published in the repository without being checked, because their formats are unsupported. We've previously considered expanding the list of preferred formats to address this.

However, we could address this more directly by maintaining two lists: one that defines the preferred formats, keeping our focus on ISO 8601 as the preferred standard, and a second, expanded list that is used when checking format-value congruence.
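A minimal sketch of the two-list idea, assuming formats are compared as exact EML format strings. The list contents and the `classify_format` helper are hypothetical placeholders, not the lists PEP-4 will actually adopt.

```python
# Hypothetical lists for illustration; the real PEP-4 lists are not defined here.
PREFERRED_FORMATS = {"YYYY", "YYYY-MM-DD", "YYYY-MM-DDThh:mm:ss"}
# The checked list is a superset: everything preferred, plus other common formats.
CHECKABLE_FORMATS = PREFERRED_FORMATS | {"YYYY/MM/DD", "MM/DD/YYYY"}


def classify_format(fmt: str) -> str:
    """Return 'preferred', 'checkable', or 'unsupported' for an EML format string."""
    if fmt in PREFERRED_FORMATS:
        return "preferred"
    if fmt in CHECKABLE_FORMATS:
        # Not recommended, but values can still be checked for congruence.
        return "checkable"
    return "unsupported"


for fmt in ["YYYY-MM-DD", "MM/DD/YYYY", "dd-mon-yyyy"]:
    print(fmt, classify_format(fmt))
```

Keeping the lists separate means the recommendation (ISO 8601) and the set of formats the checker can verify can evolve independently.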

clnsmth commented 3 weeks ago

Preliminary Decisions on PEP-4: Expanding Supported Date and Time Formats for ECC and ezEML Congruence Checks

Below are preliminary decisions on PEP-4, with associated action items.

Standardization of Date and Time Representations

For consistency, use uppercase letters for date components (e.g., YYYY, MM, DD for year, month, day) and lowercase letters for time components (e.g., hh:mm:ss for hours, minutes, seconds).

Support an additional date component separator, "/", alongside the existing "-" separator. This accommodates formats that are commonly submitted to the repository and used within the research community.
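As a rough sketch of how the "/" variants might be generated from an existing "-" list: because time components are lowercase under the convention above, the start of the time portion can be located by searching for "hh", so only the date portion is rewritten and a "-" in a UTC offset such as "-hh:mm" is left alone. The `BASE_FORMATS` list and `with_slash_variants` helper are hypothetical.

```python
# Hypothetical base list of "-"-separated formats, for illustration only.
BASE_FORMATS = ["YYYY-MM-DD", "YYYY-MM", "YYYY-MM-DD hh:mm:ss"]


def with_slash_variants(formats):
    """Return the formats plus a "/"-separated variant of each date portion."""
    expanded = list(formats)
    for fmt in formats:
        # Lowercase "hh" marks the start of the time portion (case convention),
        # so only the date part before it has its "-" separators replaced.
        cut = fmt.find("hh")
        if cut == -1:
            cut = len(fmt)  # date-only format
        variant = fmt[:cut].replace("-", "/") + fmt[cut:]
        if variant not in expanded:
            expanded.append(variant)
    return expanded


print(with_slash_variants(BASE_FORMATS))
```

This relies on the case convention: if "HH" or "MM" could mean either date or time components, the split point would be ambiguous.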

Actions

Component-Level Formatting

Represent individual date and time components (e.g., year, hour) using a numeric EML AttributeType / measurementScale rather than the dateTime type.
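When a component such as an hour or a year is declared as a plain number, it can be validated like any other numeric column (for example, integer bounds) without a date-time format string. The `check_component` helper and the bounds used below are illustrative assumptions.

```python
def check_component(values, low, high):
    """Return the values that are not integers within [low, high]."""
    bad = []
    for v in values:
        try:
            n = int(v)  # non-integer strings such as "7.5" raise ValueError
        except ValueError:
            bad.append(v)
            continue
        if not low <= n <= high:
            bad.append(v)
    return bad


hours = ["0", "7", "23", "24", "7.5"]
print(check_component(hours, 0, 23))  # ['24', '7.5']
```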

Actions

Best Practice Recommendation

Continue to recommend ISO 8601 in data packaging best practices.
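For data producers working in Python, the standard library already writes and reads ISO 8601 strings; the snippet below is a small illustration (the timestamp is arbitrary), not a prescribed workflow.

```python
from datetime import datetime, timezone

stamp = datetime(2023, 6, 1, 14, 30, 0, tzinfo=timezone.utc)
print(stamp.date().isoformat())  # 2023-06-01
print(stamp.isoformat())         # 2023-06-01T14:30:00+00:00

# Strings produced by isoformat() parse back to the same value.
assert datetime.fromisoformat(stamp.isoformat()) == stamp
```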

Actions

Library for Date/Time Checking

Develop a date and time checker library to:

  1. Validate whether a date/time format is in the preferred list.
  2. Ensure congruence of date/time format with data values in data entities.
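A minimal sketch of the two checks, assuming each supported EML format string maps to a Python strptime pattern. The `PREFERRED` table and the function names are hypothetical; the actual library's interface has not been designed yet.

```python
from datetime import datetime

# Hypothetical preferred list, keyed by EML format string, with strptime codes.
PREFERRED = {
    "YYYY-MM-DD": "%Y-%m-%d",
    "YYYY/MM/DD": "%Y/%m/%d",
    "YYYY-MM-DD hh:mm:ss": "%Y-%m-%d %H:%M:%S",
}


def is_preferred(eml_format: str) -> bool:
    """Check 1: is the declared format on the preferred list?"""
    return eml_format in PREFERRED


def incongruent_values(eml_format: str, values):
    """Check 2: return values that do not parse with the declared format.

    Raises KeyError if eml_format is not in PREFERRED.
    """
    code = PREFERRED[eml_format]
    bad = []
    for v in values:
        try:
            datetime.strptime(v, code)
        except ValueError:
            bad.append(v)
    return bad


print(is_preferred("YYYY-MM-DD"))  # True
print(incongruent_values("YYYY-MM-DD", ["2023-06-01", "2023-13-01", "06/01/2023"]))
# ['2023-13-01', '06/01/2023']
```

Parsing with strptime catches impossible values such as month 13, which a purely pattern-based check would accept.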

Actions

Support Automated Reading by Common Programming Languages

To facilitate programmatic data reading and conversion between formats, we will provide mappings between EML format strings and common representations in languages like R and Python. These mappings could be included in the date and time checker library, made accessible as a web service, or provided as a resource in a PASTAplus GitHub repository.

Example:

EML format string      strftime/strptime format codes
YYYY-MM-DD hh:mm:ss    %Y-%m-%d %H:%M:%S
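One way such a mapping could work is as a simple token substitution from EML components to strftime/strptime codes (R's strptime uses the same %Y, %m, %d, %H, %M, %S codes). The `EML_TO_STRFTIME` table and `eml_to_strftime` function are a hypothetical sketch covering only the basic numeric components; the matching is case-sensitive, so "MM" (month) and "mm" (minutes) stay distinct.

```python
import re

# Hypothetical token table following the case convention:
# uppercase for date components, lowercase for time components.
EML_TO_STRFTIME = {
    "YYYY": "%Y",
    "MM": "%m",
    "DD": "%d",
    "hh": "%H",
    "mm": "%M",
    "ss": "%S",
}

# Try longer tokens first; the match is case-sensitive.
_TOKEN_RE = re.compile(
    "|".join(sorted(map(re.escape, EML_TO_STRFTIME), key=len, reverse=True))
)


def eml_to_strftime(eml_format: str) -> str:
    """Translate an EML date/time format string to strftime/strptime codes."""
    # Separators and literals such as "-", "/", ":" and "T" pass through unchanged.
    return _TOKEN_RE.sub(lambda m: EML_TO_STRFTIME[m.group(0)], eml_format)


print(eml_to_strftime("YYYY-MM-DD hh:mm:ss"))  # %Y-%m-%d %H:%M:%S
```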

Actions

Zero-Padded Dates and Times

Zero-padding will not be required, as most programming languages can interpret non-zero-padded values accurately. However, this may affect regex-based congruence checks, so we will verify that it does not impact the PostgreSQL database used by the ECC.
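The snippet below illustrates both halves of that concern in Python: strptime accepts a non-padded date, while a regex written for two-digit months and days rejects it. The patterns are illustrative only; they are not the patterns the ECC actually uses.

```python
import re
from datetime import datetime

value = "2023-6-1"  # month and day not zero-padded

# Python's strptime accepts one- or two-digit values for %m and %d.
parsed = datetime.strptime(value, "%Y-%m-%d")
print(parsed.date())  # 2023-06-01

# A pattern written for zero-padded values rejects the same string...
strict_ok = re.fullmatch(r"\d{4}-\d{2}-\d{2}", value) is not None
print(strict_ok)  # False

# ...so a congruence pattern would need to allow one or two digits.
lenient_ok = re.fullmatch(r"\d{4}-\d{1,2}-\d{1,2}", value) is not None
print(lenient_ok)  # True
```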

Actions

Abbreviated Month Formats

Formats that use abbreviated month names, such as dd-mon-yyyy, will not be included in the preferred list because of the challenges of supporting multiple languages.
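The language problem shows up directly in parsing: month abbreviations are locale-dependent. Under Python's default "C" locale, %b recognizes only English abbreviations, so the same date written with a Spanish abbreviation fails. The `parses` helper is a hypothetical convenience.

```python
from datetime import datetime


def parses(value: str, code: str) -> bool:
    """Return True if value parses with the strptime code in the current locale."""
    try:
        datetime.strptime(value, code)
        return True
    except ValueError:
        return False


# English abbreviation: recognized under the default "C" locale.
print(parses("05-Jan-2023", "%d-%b-%Y"))  # True
# Spanish abbreviation for January: not recognized under the "C" locale.
print(parses("05-ene-2023", "%d-%b-%Y"))  # False
```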

Actions

Handling Unsupported Formats

Formats outside the newly expanded list of preferred formats will continue to trigger a warning rather than an error. This allows valid formats that were left off the expanded list by oversight to still enter the repository.
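A minimal sketch of warning-by-default handling, with an opt-in strict mode corresponding to the error-based alternative raised in Item 5. The `SUPPORTED` set, `UnsupportedFormatError`, and `check_format` are hypothetical names used only for illustration.

```python
import warnings

# Hypothetical expanded list; the actual PEP-4 list is not reproduced here.
SUPPORTED = {"YYYY", "YYYY-MM-DD", "YYYY/MM/DD", "YYYY-MM-DD hh:mm:ss"}


class UnsupportedFormatError(ValueError):
    """Raised for unsupported formats when strict mode is enabled."""


def check_format(eml_format: str, strict: bool = False) -> bool:
    """Return True if the format is supported.

    Unsupported formats emit a warning by default (the current decision);
    with strict=True they raise an error instead (the Item 5 alternative).
    """
    if eml_format in SUPPORTED:
        return True
    message = f"Unsupported date/time format: {eml_format!r}"
    if strict:
        raise UnsupportedFormatError(message)
    warnings.warn(message, UserWarning)
    return False
```

Keeping the policy behind a single flag would let the repository move from warnings to errors later without changing the checking logic.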

Actions

Seek Community Feedback

We will seek community feedback to review and help finalize these recommendations.

Actions