GenomicsStandardsConsortium / mixs

Minimum Information about any (X) Sequence (MIxS) specification
https://w3id.org/mixs
Creative Commons Zero v1.0 Universal

write python test for terms that have non-UTF-8 characters in their attributes #759

Open turbomam opened 5 months ago

turbomam commented 5 months ago

@Woolly-at-EBI before writing any Python, I tried a few CLI approaches to find non-ASCII/non-UTF-8 characters in mixs.yaml, but I haven't found any yet. So maybe I'm doing something wrong.

Could you please remind me what you found and what approach you used?
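As a possible Python starting point, a scan like the one below reports every non-ASCII character with its line, column, and Unicode name. It is a minimal sketch, not the repository's badlines.py. The function name `report_non_ascii` and the script name in the comment are hypothetical, and the sketch assumes the file decodes as UTF-8.

```python
import sys
import unicodedata


def report_non_ascii(text):
    """Return (line, column, char, Unicode name) for each non-ASCII character.

    Line and column numbers are 1-based. Splitting on newlines only (not
    str.splitlines) keeps unusual separators such as U+2028 visible as hits.
    """
    hits = []
    for lineno, line in enumerate(text.split("\n"), start=1):
        for col, ch in enumerate(line, start=1):
            if not ch.isascii():
                hits.append((lineno, col, ch, unicodedata.name(ch, "UNKNOWN")))
    return hits


if __name__ == "__main__" and len(sys.argv) > 1:
    # e.g. python report_non_ascii.py src/mixs/schema/mixs.yaml (hypothetical script name)
    path = sys.argv[1]
    # Strict decoding: a file that is not valid UTF-8 raises UnicodeDecodeError here.
    with open(path, encoding="utf-8") as f:
        for lineno, col, ch, name in report_non_ascii(f.read()):
            print(f"{path}:{lineno}:{col}: U+{ord(ch):04X} {name}")
```

For the degree sign in viscosity's description, the reported code point and name would be U+00B0 DEGREE SIGN.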

turbomam commented 5 months ago

maybe

turbomam commented 5 months ago

On Nov 8, 2023, you found strange characters in viscosity's description like

"A measure of oil's resistance to gradual deformation by shear stress or tensile stress (e.g. 3.5 cp; 100 °C)"

Here's the definition of viscosity in v6.2.0. It looks like I already replaced some problematic characters with whitespace, since there are some double spaces below.

viscosity:
    annotations:
      Expected_value: measurement value;measurement value
      Preferred_unit: cP at degree Celsius
    description: A measure of oil's resistance  to gradual deformation by  shear stress  or  tensile
      stress (e.g. 3.5 cp; 100   C)
    title: viscosity
    string_serialization: '{float} {unit};{float} {unit}'
    slot_uri: MIXS:0000126
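The double-space clue can also be checked mechanically. This sketch flags lines whose content (after leading indentation is removed) contains two or more consecutive spaces, on the assumption that such runs may mark spots where a character was replaced by whitespace. It is only a heuristic: legitimate double spaces are flagged too, and `find_double_spaces` is a hypothetical helper.

```python
import re

# Two or more spaces in a row, checked after leading/trailing whitespace is stripped.
DOUBLE_SPACE = re.compile(r" {2,}")


def find_double_spaces(text):
    """Return (1-based line number, stripped line) for lines with runs of 2+ spaces."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        content = line.strip()
        if DOUBLE_SPACE.search(content):
            hits.append((lineno, content))
    return hits
```

Run over the viscosity block above, this flags both lines of the description and none of the other keys.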

turbomam commented 5 months ago

I'll locally add some of those strange characters back into mixs.yaml and search again, to make sure my methods work.
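Planting known characters is a good way to confirm a search method works. The sketch below also separates the two cases that "non-ASCII" and "non-UTF-8" can mean: a properly encoded degree sign (bytes C2 B0), which is valid UTF-8 but not ASCII, and a bare Latin-1 degree byte (B0), which is not valid UTF-8 at all. The helper `classify_lines` and the sample lines are hypothetical.

```python
def classify_lines(data):
    """Return (1-based line number, label) for each line of raw bytes that is not plain ASCII.

    The label is 'non-ascii' for valid UTF-8 containing non-ASCII characters,
    or 'invalid-utf8' when the line cannot be decoded as UTF-8.
    """
    problems = []
    for lineno, raw in enumerate(data.splitlines(), start=1):
        try:
            line = raw.decode("utf-8")
        except UnicodeDecodeError:
            problems.append((lineno, "invalid-utf8"))
            continue
        if not line.isascii():
            problems.append((lineno, "non-ascii"))
    return problems


# Plant one valid UTF-8 degree sign and one bare Latin-1 degree byte.
sample = (
    b"title: viscosity\n"
    + "tensile stress (e.g. 3.5 cp; 100 \u00b0C)\n".encode("utf-8")
    + b"tensile stress (e.g. 3.5 cp; 100 \xb0C)\n"
)
print(classify_lines(sample))  # [(2, 'non-ascii'), (3, 'invalid-utf8')]
```

If a search method misses either planted line, the problem is with the method rather than with mixs.yaml.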

turbomam commented 4 months ago

Shoot, where did I get that definition of viscosity above?!

Maybe by expanding the linkml-source section of its documentation page on the web?

Here it is, copied straight from src/mixs/schema/mixs.yaml in this branch (759-write-python-test-for-terms-that-have-non-utf-8-characters-in-their-attributes):

  viscosity:
    annotations:
      Expected_value: measurement value;measurement value
      Preferred_unit: cP at degree Celsius
    description: A measure of oil's resistance to gradual deformation by shear stress or tensile stress (e.g. 3.5 cp; 100 °C)
    title: viscosity
    string_serialization: '{float} {unit};{float} {unit}'
    slot_uri: MIXS:0000126

turbomam commented 4 months ago

And it doesn't look like I checked any test/ code into the branch.

turbomam commented 4 months ago

poetry run python badlines.py src/mixs/schema/mixs.yaml reveals just one line with illegal characters:

description: A measure of oil's resistance to gradual deformation by shear stress or tensile stress (e.g. 3.5 cp; 100 °C)

turbomam commented 4 months ago

badlines.py should be reformulated as a test.