opattison opened this issue 10 years ago
This is essential. I was also thinking about metadata standards and stuff like that.
This is pretty quickly going to become its own document. Should this be a wiki?
It could be a wiki. That might be a way to handle proposed changes, but I haven't used the GitHub wiki tool much so I don't know its strengths and limitations.
Yeah, I'm not sure. I've asked someone I know who's involved with ISO metadata standards to weigh in, so we'll see how that plays out. We can use one doc until it bursts at the seams.
I might start adding some references to other sets of principles as examples in the next few days.
A reference point I found from @tmcw: http://simpleopendata.com
Consider GeoJSON, KML and CSV as open data formats to recommend.
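One small illustration of why GeoJSON and CSV are easy to recommend: both can be written with nothing beyond Python's standard library. This is a minimal sketch, assuming a made-up list of place records; the record fields and the `to_csv`/`to_geojson` helpers are hypothetical. KML is XML-based and isn't covered here.

```python
import csv
import io
import json

# Made-up sample records, for illustration only.
places = [
    {"name": "Example Library", "lon": 144.9631, "lat": -37.8136},
    {"name": "Example Archive", "lon": 151.2093, "lat": -33.8688},
]

def to_csv(records):
    """Serialise records as CSV text with a header row."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=["name", "lon", "lat"])
    writer.writeheader()
    writer.writerows(records)
    return buf.getvalue()

def to_geojson(records):
    """Wrap records in a GeoJSON FeatureCollection of Point features.

    GeoJSON (RFC 7946) orders coordinates as [longitude, latitude].
    """
    features = [
        {
            "type": "Feature",
            "geometry": {"type": "Point", "coordinates": [r["lon"], r["lat"]]},
            "properties": {"name": r["name"]},
        }
        for r in records
    ]
    return json.dumps({"type": "FeatureCollection", "features": features}, indent=2)

if __name__ == "__main__":
    print(to_csv(places))
    print(to_geojson(places))
```

Both outputs are plain text that any editor or spreadsheet can open, which is a big part of what makes them safe bets.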
The real wizard of archives is @straup, if you can summon him. Archives tend to expand beyond data, so for images you'd probably want TIFF, JPEG or PNG. TIFFs are dinosaurs but still close to universal. For audio there are existing resources at http://www.archives.gov/records-mgmt/initiatives/dav-faq.html; the big lossless formats are WAV and AIFF, which, despite being Microsoft- and Apple-derived, are usually supported cross-platform.
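A file's extension doesn't guarantee its format, so a common first check is to look at its leading "magic" bytes. Below is a simplified sketch covering only the image and audio formats mentioned above; the `identify` and `identify_file` helpers are hypothetical, and dedicated format registries and identification tools record far more detailed signatures than this.

```python
def identify(header: bytes) -> str:
    """Guess a format from a file's first 12 bytes (simplified, not exhaustive)."""
    if header.startswith(b"\x89PNG\r\n\x1a\n"):
        return "PNG"
    if header.startswith(b"\xff\xd8\xff"):
        return "JPEG"
    # TIFF: "II" (little-endian) or "MM" (big-endian), then the number 42.
    if header[:4] in (b"II*\x00", b"MM\x00*"):
        return "TIFF"
    # WAV: a RIFF container whose form type (bytes 8-11) is "WAVE".
    if header[:4] == b"RIFF" and header[8:12] == b"WAVE":
        return "WAV"
    # AIFF: an IFF "FORM" container with form type "AIFF" (or "AIFC" for AIFF-C).
    if header[:4] == b"FORM" and header[8:12] == b"AIFF":
        return "AIFF"
    if header[:4] == b"FORM" and header[8:12] == b"AIFC":
        return "AIFF-C"
    return "unknown"

def identify_file(path: str) -> str:
    """Read the first 12 bytes of a file on disk and identify them."""
    with open(path, "rb") as f:
        return identify(f.read(12))
```

Taking raw bytes in `identify` keeps it easy to test; `identify_file` is just a thin wrapper for files on disk.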
Thanks @tmcw! I'm a big fan of @straup's work. I think I'm going to dig through the http://www.aaronland.info/weblog/ archives and look for old gems.
Thanks, guys. Do you think someone maintains a list of good archival formats for different media? Surely?
Also, Cassie Findlay has made an interesting response over on the post http://equivalentideas.com/journal/approaching-principles-for-independent-archives/#disqus_thread
Metadata standards are another important angle.
Hi all. If you want to go really deep on file formats: in our digital archives work at the State archives, we use the PRONOM file format registry http://www.nationalarchives.gov.uk/PRONOM/Default.aspx maintained by the UK National Archives to assess whether file objects are suitable for long-term preservation and to identify good pathways for creating copies. JISC is also good on file formats: http://www.jiscdigitalmedia.ac.uk/guide/file-formats-and-compression/ For metadata we mainly use JSON, but we accept metadata dumps in CSV and XML.
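Since metadata can arrive as a CSV dump but be kept mainly as JSON, here's a minimal sketch of that conversion. It assumes a hypothetical CSV dump with a header row; the column names (`identifier`, `title`, `format`, `date`) and the `csv_to_json` helper are made up for illustration and are not any archive's actual schema.

```python
import csv
import io
import json

# Hypothetical CSV metadata dump; the column names are illustrative only.
SAMPLE_CSV = """identifier,title,format,date
item-001,Council minutes,PDF,1998-03-02
item-002,Harbour photograph,TIFF,1923
"""

def csv_to_json(csv_text: str) -> str:
    """Convert a CSV dump with a header row into a JSON array of objects.

    Every value stays a string: CSV carries no type information, so
    interpreting dates or numbers is left to a later, explicit step.
    """
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    return json.dumps(rows, indent=2, ensure_ascii=False)

if __name__ == "__main__":
    print(csv_to_json(SAMPLE_CSV))
```

Keeping every value as a string avoids silently turning something like "1923" into a number when it is really a date.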
@CassPF Thanks for the JISC link. That will be useful as an explainer for people who haven't come across some of these terms before; even a term like "raster" could trip up a non-technical audience. There were quite a few items in that guide that I hadn't fully understood yet.
How deep do people think we should go with recommending formats, and is this the best place to do it? I can imagine a list of formats, or succinct recommendations for approaching different situations (text, images, video, audio, websites). I think we should keep technical content on our end minimal and link out to these resources and others.
I think a primer on it with some links to other deeper reading could be the way to go. Leaving out technical details sounds fine.
Ok so I think this is a good structure @opattison:
Is this the type of thing that we should do in a dev branch? At this stage I'd be up for just working stuff out in the master branch. Not sure.
I agree. Dev branch for now. Let me know if you need any help editing.
The first principle listed is 'Use open standards'. What are some examples of long-lived formats? I ask this with many hesitations, given that we have only a relatively short and unsuccessful technological history of storing things for the far future (especially digitally).
Possibilities for future-friendly formats:
What else? This list is just a brief start, of course. It might be good to provide a couple of examples of non-proprietary, open formats in the Principles list.