PASTAplus / seo

Generate schema.org metadata from PASTA+ data package metadata
Apache License 2.0

Default `encodingFormat` to Prevent Google Harvest Warnings #31

Closed: clnsmth closed 1 month ago

clnsmth commented 2 months ago

Introduction

Data package landing pages recently adopted Schema.org markup from Science On Schema.Org (SOSO) v1.3.2 (see https://github.com/PASTAplus/seo/issues/7). That change introduced distribution properties, which describe how the data can be downloaded. Since then, Google page harvests have occasionally reported warnings about a missing encodingFormat in some data entity distributions.

The Issue

A missing encodingFormat in a distribution is non-critical, but the affected items might lack features or be less optimized for search. The larger concern is that accumulated warnings may obscure other, more important warnings.

Addressing the Issue

The preferred solution is outlined in https://github.com/clnsmth/soso/issues/198. In addition, we have chosen to implement a default encodingFormat of application/octet-stream so that the property is always present.
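As a rough illustration of the default described above, the following sketch fills in encodingFormat wherever a distribution lacks one. It is not the code used in the SEO service. The function name `apply_default_encoding_format` is hypothetical, and the input is assumed to be a Dataset already parsed from JSON-LD into a Python dict, with `distribution` holding either a single object or a list of objects.

```python
from copy import deepcopy

# Generic binary type used when the real format is unknown.
DEFAULT_ENCODING_FORMAT = "application/octet-stream"


def apply_default_encoding_format(dataset: dict) -> dict:
    """Return a copy of `dataset` in which every distribution has an encodingFormat.

    Hypothetical helper: assumes `dataset` is a parsed schema.org Dataset.
    """
    result = deepcopy(dataset)  # leave the caller's object untouched
    distributions = result.get("distribution")
    if distributions is None:
        return result
    # The property may hold a single object or a list of objects.
    if isinstance(distributions, dict):
        distributions = [distributions]
    for dist in distributions:
        # Only fill the gap; an existing, non-empty value is kept as-is.
        if isinstance(dist, dict) and not dist.get("encodingFormat"):
            dist["encodingFormat"] = DEFAULT_ENCODING_FORMAT
    return result
```

Existing values are never overwritten, so a distribution that already declares, say, text/csv keeps it.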

Implementation Details

Benefits

Considerations

clnsmth commented 2 months ago

The recent commit 0631e98d3ec064a157f8bfd6cbd93b1ce45ffb3f introduced a potential issue: it overrides a method of an imported class, which could lead to unexpected behavior in the SEO service.

To mitigate this risk, we propose the following solution:

  1. Create local copies of the necessary properties within the seo.webapp.schema_org module.
  2. Manually construct these properties as needed.
  3. Pass the constructed properties to soso.main.convert through its kwargs parameter so that they overwrite the corresponding generated properties.

This approach provides more control and flexibility, reducing the dependency on the soso package's current implementation.
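The three steps above could be sketched roughly as follows. This is an illustration under stated assumptions, not the seo.webapp.schema_org implementation. The helper names and the entity keys (`url`, `format`) are hypothetical. The converter is passed in as a callable and is assumed to accept `file`, `strategy`, and property overrides as keyword arguments, and to return a JSON string.

```python
import json
from typing import Callable

DEFAULT_ENCODING_FORMAT = "application/octet-stream"


def build_distribution(entities: list[dict]) -> list[dict]:
    """Steps 1 and 2: construct the distribution property locally.

    Each entity is a hypothetical dict with a required "url" key and an
    optional "format" key.
    """
    return [
        {
            "@type": "DataDownload",
            "contentUrl": entity["url"],
            # Fall back to the default when no format is known.
            "encodingFormat": entity.get("format") or DEFAULT_ENCODING_FORMAT,
        }
        for entity in entities
    ]


def convert_with_overrides(
    convert: Callable[..., str], file: str, strategy: str, entities: list[dict]
) -> dict:
    """Step 3: pass the locally built property to the converter as a kwarg."""
    graph = convert(
        file=file, strategy=strategy, distribution=build_distribution(entities)
    )
    return json.loads(graph)
```

In the service, the `convert` argument would presumably be soso.main.convert. Injecting it as a parameter keeps the sketch independent of the soso package and makes the override easy to test with a stand-in.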

Long-term, we recommend that the soso package adopt a factory pattern to allow customization without directly modifying imported classes.
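To make the recommendation concrete, here is one possible shape for such a factory. Every name in it (`Strategy`, `register_strategy`, `create_strategy`, the `"seo"` key) is hypothetical and does not reflect the current soso API. The point is only that a downstream service registers a subclass instead of patching a method on an imported class.

```python
class Strategy:
    """Stand-in for a base metadata strategy (hypothetical, not the soso API)."""

    def get_encoding_format(self):
        # The base behavior may legitimately find no format.
        return None


class DefaultEncodingStrategy(Strategy):
    """Customization expressed as a subclass rather than a patched method."""

    def get_encoding_format(self):
        return super().get_encoding_format() or "application/octet-stream"


# Registry mapping a strategy name to the class the factory should build.
_REGISTRY = {"base": Strategy}


def register_strategy(name, cls):
    """Let downstream code plug in its own strategy class."""
    if not issubclass(cls, Strategy):
        raise TypeError("cls must be a subclass of Strategy")
    _REGISTRY[name] = cls


def create_strategy(name):
    """Factory: build a strategy instance by its registered name."""
    try:
        return _REGISTRY[name]()
    except KeyError:
        raise ValueError(f"Unknown strategy: {name!r}") from None


# A downstream service registers its customization once, at import time.
register_strategy("seo", DefaultEncodingStrategy)
```

With this arrangement the base class stays unmodified, so other users of the package are unaffected by the service's customization.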