Open joncison opened 4 years ago
@matuskalas - a small detail - do we give e.g. ".txt", "txt" or both? (prob. both?)
@albangaignard for my first foray into SPARQL, I'm tackling this query, which addresses (from above):
file_extension in EDAM must be given in lower case
but I notice that the pattern for the file_extension property currently allows the use of | (pipe) as a delimiter between multiple values, e.g. yaml|yml.
While this is compact / looks nice, it rather complicates the semantics and downstream uses: file_extension currently means "A string in which one or more commonly used file extensions for a data format are delimited by pipe character(s)." rather than simply "A commonly used file extension for a data format."
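To illustrate the downstream cost, here is a minimal Python sketch of what any consumer of a pipe-delimited value would have to do before it can treat extensions individually. The function name split_extensions is hypothetical and is not part of EDAM or of any existing tooling.

```python
from __future__ import annotations


def split_extensions(raw: str) -> list[str]:
    """Split a pipe-delimited file_extension value into single extensions.

    Hypothetical helper: assumes "|" is the only delimiter in use.
    """
    # Drop empty pieces so a stray leading or trailing pipe does not yield "".
    return [part.strip() for part in raw.split("|") if part.strip()]


print(split_extensions("yaml|yml"))  # ['yaml', 'yml']
print(split_extensions("txt"))       # ['txt']
```

With one extension per file_extension value, this splitting step would not be needed at all.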
I think @matuskalas the right course is to refactor EDAM so that one extension is given per file_extension? In which case the query becomes:
file_extension in EDAM must contain lower-case alphanumeric characters only.
Thoughts please!
cc @hmenager @veitveit
PS. @albangaignard my hunch is that most or all the checks will require some Python programming, so your suggestion to use Jupyter notebooks is a very good one!
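As a rough idea of what such a Python check could look like for the lower-case alphanumeric rule proposed above, here is a small sketch. It assumes the extension values have already been pulled out of EDAM (e.g. by a SPARQL query) into a plain list of strings; the names EXTENSION_PATTERN, is_valid_extension and invalid_extensions are made up for illustration.

```python
import re

# Assumed rule from this thread: one extension per value, consisting of
# lower-case ASCII letters and digits only (no leading dot, no pipe).
EXTENSION_PATTERN = re.compile(r"[a-z0-9]+")


def is_valid_extension(value):
    """Return True if the whole value matches the assumed rule."""
    return EXTENSION_PATTERN.fullmatch(value) is not None


def invalid_extensions(values):
    """Return the values that break the assumed rule, in input order."""
    return [v for v in values if not is_valid_extension(v)]


sample = ["txt", "yml", "yaml|yml", ".txt", "TXT", "tar.gz"]
print(invalid_extensions(sample))  # ['yaml|yml', '.txt', 'TXT', 'tar.gz']
```

Note that under this strict reading a compound value such as "tar.gz" is also flagged, because the dot is not alphanumeric; whether that is desired is a separate decision.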
UPDATE
I just finished the query, taking the decision that only lowercase alphanumeric characters are allowed in EDAM Format file extensions. cc @matuskalas @veitveit
This being my first foray into Python and SPARQL, in case you have time @albangaignard @hmenager or @hansioan, I'd much appreciate some feedback on the quality of the code, which is included here (from this Jupyter notebook).
From https://github.com/edamontology/edamontology/issues/421:
file_extension in EDAM must be given in lower case
file_extension value also appears in hasExactSynonym (and preserving the capitalisation variants, e.g. all uppercase - where these are the "canonical" variant in use)
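The second requirement could be checked along the following lines. This is only a sketch under stated assumptions: each format is represented as a plain dict with hypothetical keys label, file_extensions and exact_synonyms, the sample data is invented, and the comparison is case-insensitive so that a synonym kept in its canonical capitalisation still counts as a match.

```python
def missing_synonyms(formats):
    """Return (label, extension) pairs with no matching exact synonym.

    The comparison ignores case, so an all-uppercase synonym such as
    "ABC" satisfies the lower-case extension "abc".
    """
    missing = []
    for fmt in formats:
        synonyms = {s.lower() for s in fmt["exact_synonyms"]}
        for ext in fmt["file_extensions"]:
            if ext.lower() not in synonyms:
                missing.append((fmt["label"], ext))
    return missing


# Invented sample data, not taken from EDAM.
sample_formats = [
    {"label": "Format A", "file_extensions": ["abc"], "exact_synonyms": ["ABC"]},
    {"label": "Format B", "file_extensions": ["xyz"], "exact_synonyms": []},
]
print(missing_synonyms(sample_formats))  # [('Format B', 'xyz')]
```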