This is my use case. If someone uploads an arbitrary file to my ELN and I have a whole registry of tools to process it, the ELN still needs to figure out which tool to use. Identifying the FileType would provide that connection. Otherwise, I need to rely on the source (e.g. the user) to tell me the type.
To apply a tool, the ELN needs to figure out the FileType one way or another. This is why you ask for a FileType identifier, right? Maybe it is difficult, but if you agree that it is a valid use case, why wait for the next MaRDA WG to figure it out? I am not sure how additional information would reduce the usefulness.
Let's say we are not using the registry to identify FileTypes. The tools in the registry still need to somehow tell what their intended input FileType is. And it ought to be more specific than JSON, HDF5, csv, etc. Why not describe the FileType by characteristics that would help to identify a file's type?
This issue is a follow-up of https://github.com/marda-alliance/metadata_extractors_schema/issues/45.
In https://github.com/marda-alliance/metadata_extractors_schema/pull/48, we have implemented the `associated_file_extensions` slot in the `FileType` schema, to specify some metadata that can be used to match files to `FileTypes`.
However, further hints could be included, such as common MIME types or magic bits. This idea needs a bit of planning work.
See also here:
_Originally posted by @markus1978 in https://github.com/marda-alliance/metadata_extractors_schema/issues/9#issuecomment-1403190937_
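
To make the discussion concrete, here is a minimal sketch of how a registry client such as an ELN might rank candidate `FileTypes` for an uploaded file. Only `associated_file_extensions` comes from the schema; the `mime_types` and `magic_bytes` fields, the `FileTypeHints` class, the example ids and the scoring weights are all hypothetical and only illustrate what the extra hints could look like.

```python
from dataclasses import dataclass, field
from pathlib import Path
import mimetypes


@dataclass
class FileTypeHints:
    """Hypothetical hint record for one FileType (not the actual schema)."""
    id: str
    associated_file_extensions: list[str] = field(default_factory=list)
    mime_types: list[str] = field(default_factory=list)     # assumed slot
    magic_bytes: list[bytes] = field(default_factory=list)  # assumed slot


def score(path: Path, head: bytes, hints: FileTypeHints) -> int:
    """Count matching hints; a higher score means a better candidate."""
    points = 0
    name = path.name.lower()
    if any(ext and name.endswith(ext.lower())
           for ext in hints.associated_file_extensions):
        points += 1
    guessed_mime, _ = mimetypes.guess_type(path.name)
    if guessed_mime is not None and guessed_mime in hints.mime_types:
        points += 1
    # Leading bytes are weighted higher (an arbitrary choice for this sketch)
    # because they come from the content, not from the file name.
    if any(magic and head.startswith(magic) for magic in hints.magic_bytes):
        points += 2
    return points


def identify(path: Path, head: bytes, registry: list[FileTypeHints]) -> list[str]:
    """Return ids of FileTypes with at least one matching hint, best first."""
    ranked = [(score(path, head, hints), hints.id) for hints in registry]
    ranked = [pair for pair in ranked if pair[0] > 0]
    ranked.sort(key=lambda pair: -pair[0])
    return [file_type_id for _, file_type_id in ranked]


# Example registry with made-up ids.
registry = [
    FileTypeHints(
        id="hdf5-generic",
        associated_file_extensions=[".h5", ".hdf5"],
        mime_types=["application/x-hdf5"],
        magic_bytes=[b"\x89HDF\r\n\x1a\n"],  # HDF5 format signature
    ),
    FileTypeHints(
        id="json-generic",
        associated_file_extensions=[".json"],
        mime_types=["application/json"],
    ),
]

# A misnamed HDF5 file is still found through its leading bytes.
hdf5_head = b"\x89HDF\r\n\x1a\n" + b"\x00" * 8
candidates = identify(Path("scan.dat"), hdf5_head, registry)
```

In this sketch the file name alone would not identify `scan.dat`, which is the situation where hints beyond extensions would pay off. Generic container formats still yield only a generic match, so more specific `FileTypes` would need additional, content-level characteristics.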