GenomicsStandardsConsortium / mixs

"Minimum Information about any (X) Sequence" (MIxS) specification
https://w3id.org/mixs
Creative Commons Zero v1.0 Universal

Terms: semi-automatic check on what terms are missing in v6 #716

Open Woolly-at-EBI opened 1 year ago

Woolly-at-EBI commented 1 year ago

I have automatically compared terms in ENA and MIXSv6 (and, to a limited extent, MIXSv5), including using the v6 LinkML. Happy to share some of the output if it is useful. I did exact matches, matches after minor cleaning, and fuzzy matches (using RapidFuzz, which uses Levenshtein edit distance underneath; it does have its challenges). I am currently compiling a report so that we in ENA can finally move on to changing (and adding) checklists to be v6 compliant.

What I could do is compare the terms and generate a comparison "triplet" for each match: left_term, right_term, and an annotation of the edge. The comparisons would be: MIXSv5 vs MIXSv6; MIXSv6 vs itself (would also show some ); ENA vs MIXSv6.

(It is a bit of fun and games, as in ENA we use the long name (= title), whereas MIXS uses the short_name.)
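To make the idea concrete, here is a rough sketch of the tiered comparison (exact, then cleaned, then fuzzy) that emits one triplet per left-hand term. It is not the actual ENA script. Assumptions: `difflib.SequenceMatcher` from the standard library stands in for RapidFuzz so the snippet runs without extra installs; the names `clean`, `classify`, and `compare`, the 0.85 threshold, and the term lists are all made up for illustration and are not taken from either checklist.

```python
import re
from difflib import SequenceMatcher

# Higher rank wins when a left term matches several right terms.
RANK = {"exact": 3, "cleaned": 2, "fuzzy": 1}


def clean(term):
    """Lowercase and collapse punctuation, underscores and spaces into single spaces."""
    return re.sub(r"[^a-z0-9]+", " ", term.lower()).strip()


def classify(left, right, fuzzy_threshold=0.85):
    """Return (match_type, score) for a pair of terms, or None if they do not match."""
    if left == right:
        return ("exact", 1.0)
    cleaned_left, cleaned_right = clean(left), clean(right)
    if cleaned_left == cleaned_right:
        return ("cleaned", 1.0)
    # Stand-in similarity score (assumption): RapidFuzz would be used here instead.
    score = SequenceMatcher(None, cleaned_left, cleaned_right).ratio()
    if score >= fuzzy_threshold:
        return ("fuzzy", round(score, 3))
    return None


def compare(left_terms, right_terms, edge_label, fuzzy_threshold=0.85):
    """Build (left_term, right_term, annotation) triplets, keeping the best match per left term."""
    triplets = []
    for left in left_terms:
        best = None  # (rank, score, right_term, match_type)
        for right in right_terms:
            result = classify(left, right, fuzzy_threshold)
            if result is None:
                continue
            match_type, score = result
            candidate = (RANK[match_type], score, right, match_type)
            if best is None or candidate[:2] > best[:2]:
                best = candidate
        if best is None:
            # Unmatched terms are kept so that missing terms show up in the report.
            triplets.append((left, None, f"{edge_label}: no_match"))
        else:
            triplets.append((left, best[2], f"{edge_label}: {best[3]}"))
    return triplets


# Illustrative term lists only (hypothetical, compared on the long name / title).
ena_titles = [
    "collection date",
    "Geographic Location (latitude)",
    "sample storage temprature",
    "made up term",
]
mixs_titles = [
    "collection date",
    "geographic location latitude",
    "sample storage temperature",
]

triplets = compare(ena_titles, mixs_titles, "ENA vs MIXSv6")
for triplet in triplets:
    print(triplet)
```

Since ENA uses the title and MIXS the short_name, both sides would need to be mapped onto the same field before calling something like `compare`. The same call with a different `edge_label` would cover the MIXSv5 vs MIXSv6 and MIXSv6 vs itself edges. For the self-comparison, identical terms would have to be skipped, otherwise every term trivially matches itself.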