clarin-eric / standards

work space for the Standards and Interoperability Committee
https://www.clarin.eu/content/standards

Interoperability with other format descriptions #56

Closed hannahedeland closed 2 months ago

hannahedeland commented 2 years ago

We could maybe just focus on LRT specific formats and reuse existing sources like the Format Definition Documents of the Library of Congress (cf. https://www.loc.gov/preservation/digital/formats/fdd/fdd_xml_info.shtml) for more generic formats.

FAIRSharing.org also includes some relevant "community standards", though it's not as transparent and the new version is currently beta, so this would probably rather be something for the other direction, at a later stage.

bansp commented 2 years ago

Thanks! It would indeed be very sensible to use the LoC for formats outside of LRT, and FAIRSharing for, say, outreach. Keeping to our assumptions (scope on CLARIN and bottom-up reports, medium granularity), we'd probably just use a link to the relevant bit of the LoC inside a skeleton of a format-description file (which needs to specify, at a minimum, the ID, the mimeType, and the extension). I think that should help a lot, because otherwise coping with all those 'outside' formats could become a nightmare.
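To make the skeleton idea concrete, here is a minimal sketch of what such a stub could look like when generated programmatically. The element and attribute names (`format`, `mimeType`, `extension`, `link`, `href`), the helper `build_format_stub`, and the ID `example-aiff` are all hypothetical and not taken from the actual schema; the link is just the LoC page mentioned above, standing in for the specific format page.

```python
import xml.etree.ElementTree as ET


def build_format_stub(format_id, mime_type, extension, external_link):
    """Build a minimal format-description element (hypothetical schema)."""
    root = ET.Element("format", {"id": format_id})
    ET.SubElement(root, "mimeType").text = mime_type
    ET.SubElement(root, "extension").text = extension
    # Point to the external description instead of duplicating its content.
    ET.SubElement(root, "link", {"href": external_link})
    return root


# Illustrative values only; the real ID and link would be chosen per format.
stub = build_format_stub(
    "example-aiff",
    "audio/aiff",
    ".aif",
    "https://www.loc.gov/preservation/digital/formats/fdd/fdd_xml_info.shtml",
)
print(ET.tostring(stub, encoding="unicode"))
```

The point of the sketch is only that the three minimal fields plus one outbound link are enough for a generic format; everything else stays with the external source.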

bansp commented 2 years ago

Added a test AIFF file in d1f45a3a2683ec452711a20e670d730c0896e8f7. Observations: several fields need custom values (ID, name, etc.), but these are straightforward. Keywords are a bit less straightforward, but we need to open a task for unifying them anyway. I tried to take the MIME and ext values from the spreadsheet, and saw that the spreadsheet MIME uses -x-, although a non-x value is defined (audio/aiff), and that it uses .aiff for the extension, while the LoC summary says .aif is the most widespread. So there's still stuff to align, but that was to be expected. After all, we need a larger infrastructure context to report MIME and ext for "locally normative" purposes.
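A check of this kind could be automated once there are more stubs. The following is a rough sketch, assuming the spreadsheet row and the reference values are available as plain dictionaries; the function name `find_mismatches`, the field names, and the exact spreadsheet MIME string `audio/x-aiff` (inferred from the "-x-" remark) are assumptions for illustration.

```python
def find_mismatches(spreadsheet_row, reference_row):
    """Return {field: (spreadsheet value, reference value)} for differing fields."""
    fields = sorted(set(spreadsheet_row) | set(reference_row))
    return {
        field: (spreadsheet_row.get(field), reference_row.get(field))
        for field in fields
        if spreadsheet_row.get(field) != reference_row.get(field)
    }


# Assumed values mirroring the AIFF observations above.
spreadsheet_row = {"mimeType": "audio/x-aiff", "extension": ".aiff"}
reference_row = {"mimeType": "audio/aiff", "extension": ".aif"}

mismatches = find_mismatches(spreadsheet_row, reference_row)
for field, (local, external) in mismatches.items():
    print(f"{field}: spreadsheet has {local!r}, reference has {external!r}")
```

This only reports differences; deciding which value is "locally normative" remains a manual call.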

hannahedeland commented 2 years ago

Great, so I'll give you the next one then ;) https://www.nationalarchives.gov.uk/PRONOM/Format/proFormatSearch.aspx?status=detailReport&id=1192 And honestly, I don't know what information the spreadsheet is based on, whether it's bottom-up/representative when it comes to ext etc., or whether it can be considered a recommendation in these areas. I think we'll have to look at that again later anyway? (And I'm doing the other A/V format description stubs, right?)

bansp commented 2 years ago

Nice :-) We can always use two links, if that feels useful. And yes, I think the mediaTypes is a subproject of its own (as are the statistics that we could come up with), while as far as extensions are concerned, it is a bit hard to imagine that centres might want to have their own flavours for that (or even that the extensions matter a lot). Thanks for your willingness to add the stubs, that is going to help a lot (and, haha, keywords are going to be a little subproject too, I'm afraid).

bansp commented 2 years ago

Please commit to the formats branch, OK? There's an active PR set there, for Eliza to be able to see it all at once.

hannahedeland commented 2 years ago

> Please commit to the formats branch, OK? There's an active PR set there, for Eliza to be able to see it all at once.

The recommendations too or just the format stubs?

bansp commented 2 years ago

That was for whatever, although Eliza has since merged that PR ;-) But formats is a good target anyway, or you can set up another branch -- take the most convenient way out.

bansp commented 2 years ago

We can probably close this one (because it has a concrete task ticket now). Fairsharing has not been handled yet; indeed, it may be a candidate for "the other way round".

bansp commented 1 year ago

Aaaand, thanks to @hannahedeland 's suggestion above, we now even have a Fairsharing ID... :-) https://fairsharing.org/4705

bansp commented 10 months ago

With respect to Hanna's opening note, I think that this is a wontfix, for at least two reasons:

bansp commented 2 months ago

Well, not really a wontfix, because the issue has served its purpose. Closing with thanks.