brianthomas opened this issue 9 years ago
I just wouldn't say "ASCII" though. Modern text editors can handle Unicode.
This may be a diversion...but I was thinking of how XML has a processing instruction on its first line which advertises the encoding, e.g.
<?xml version="1.1" encoding="UTF-16"?>
The processing instruction uses only characters from ASCII, so you are always safe in knowing that you can read that first line and know what the rest of the file will contain.
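As an illustration of why that property is useful, here is a minimal sketch of how a reader could sniff the encoding from that ASCII first line before decoding the rest of the file. The function name, the 200-byte read limit and the example file name are my own assumptions for illustration, not anything from a standard; and this only works for ASCII-compatible encodings of the declaration itself (a genuine UTF-16 file would normally also carry a byte-order mark that a robust reader would check first).

```python
import re

def sniff_encoding(path, default="utf-8"):
    # Read only the first line as raw bytes; the XML-style declaration
    # is assumed to be plain ASCII, so it can be examined safely before
    # the file's real encoding is known.
    with open(path, "rb") as f:
        first_line = f.readline(200)  # a declaration easily fits in 200 bytes
    match = re.search(rb'encoding=["\']([A-Za-z0-9._-]+)["\']', first_line)
    return match.group(1).decode("ascii") if match else default

print(sniff_encoding("data.xml"))  # hypothetical file; would print e.g. UTF-16
```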
I'd actually considered a similar use case, something along the lines of "an astronomer has a data file and wants a really quick look at what it contains without having to run any specialised software". I certainly often use "more file.fits" as a quick way of looking at the primary header of a FITS file, and I don't think I'd be the only one who does this or something similar. I think this sounds quite similar to the use case Brian was considering.
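As an aside, the reason "more file.fits" works at all is that a FITS primary header is plain ASCII: a sequence of 2880-byte blocks, each holding 36 card images of 80 characters, terminated by an END card. A minimal sketch of a roll-your-own header dump using no FITS library, relying only on those documented conventions (the function name is mine):

```python
def dump_fits_header(path):
    # A FITS header is a sequence of 2880-byte blocks, each holding
    # 36 ASCII card images of 80 characters; it ends at the END card.
    with open(path, "rb") as f:
        while True:
            block = f.read(2880)
            if len(block) < 2880:
                return  # truncated or non-FITS file
            for i in range(0, 2880, 80):
                card = block[i:i + 80].decode("ascii", errors="replace")
                print(card.rstrip())
                if card.startswith("END     "):  # END keyword padded to 8 chars
                    return

dump_fits_header("file.fits")  # the quick look from the comment above
```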
What I was thinking of in this case - and I probably wasn't as specific as I might have been, because I was trying to keep from drifting into requirements - was more the case where you really want to work with the data 'properly' as opposed to just have a quick look, but you can no longer get the now-outdated 'standard' access software to run. In which case you have to go back to the documentation of the format and roll your own. One would hope that with a successful format, the standard access software would be maintained, but I thought it was worth having this case in, if only because a perceived strength of FITS is that its basically simple format makes rolling your own something one could imagine doing. (Note the 'perceived' there.)
I was originally assuming that the sort of requirement that would follow from this was that, at a bare minimum, the format would be documented at the byte level rather than just through an API. And the same would go for how the data model was implemented in terms of, say, named items in the structure. "Simple and uses ASCII/Unicode" are requirements that might follow from the 'quick look' use case I mentioned at the start. ("Simple" is good anyway, but probably finds itself at odds with "flexible".)
What I'm starting to appreciate is the extent to which serialisation is a separate issue from the file layout - and that there might not necessarily be a file layout at all (the database option, or even to some extent the CASA 'a "file" is actually a directory' option).
I'm personally inclined to see this and Brian's case as two separate use cases, one about being able to recover science from archived data and one about being able to use standard quick-look utilities to look at a serialised file. (There may even be a bit of overlap with Mark's case 8 on language support, although I read that as being about adding an API for a new language, possibly/probably on top of an existing API in something like C++, rather than as a roll-your-own case.)
@KeithShortridge, it sounds like your intention and mine are quite similar here. In my use case 8 I certainly have in mind, as at least one example of this, that for whatever reason you want to be able to access existing serialised data without having to go through an existing standard or official access library. One reason you might want to do that is the standard library becoming unsupported; another is use from a language which does not talk nicely to the language in which the standard access library is implemented (e.g. calling C from JavaScript or Java). Again, I was trying to avoid straying into requirements territory, but maybe ended up being insufficiently explicit.
Like you, I believe documentation of the file format [if there is a file format] at byte level is one, er, requirement to support this usage, but it's not sufficient - the format also has to be sufficiently simple that it is actually feasible to re-implement from scratch, which does not necessarily follow.
Perhaps one or other of us should re-draft our use cases to make it clearer what we've got in mind here - though as you say, it's hard to do it without talking about requirements. Brian?
I think we learn as we go about how best to execute the project. I still think that in general we should try to keep to a process of use cases, then requirements, then a vote, then writing the paper. In this case I think it's OK to talk requirements, particularly since you want to use it to firm up a common use case.
OK I have provisionally added a paragraph to use case #8. But if others feel this concern is better handled by amending #11 instead, or adding a new use case, I could retract that.
I think it's a question of the extent to which we want to minimise the number of use cases. It seems to me there are three use cases here which are clearly related but nonetheless subtly different - Mark's #8, my #11 and Brian's intended one about viewing serialised data with common tools (assuming I'm not misrepresenting that). We could try to combine them in the interests of minimising cases, which I'd be happy to attempt if that's what people would like to see, or we could leave them separate in the interests of catching the whole gamut of cases. Leaving them separate might give the impression of weighting the exercise towards one arguably peripheral aspect of things. Brian - do you have a feeling about how many use cases you were looking for?
I think we can hold off on too much editing of use cases. At this point, I think it's better to emphasize generation of ideas/use cases. In general, unless there is near-100% duplication, I'd prefer to err on the side of inclusiveness for use cases, allowing us to capture subtle points. I've imagined that we'd have overlap here, and that this would be resolved at the stage where we try to extract requirements. It's quite possible that any particular requirement may trace back to more than one use case, and this is OK, IMO. It's also possible that sub-requirements may only occur in some of the use cases which generated the parent requirement. When we get to the voting stage (on requirements), anything which is an 'edge case' (less important) requirement should naturally drop out.
So... given the above comment, I wonder if I should create a new use case or add my thoughts to the existing text? Since you own this use case, Keith, let me know your preference here.
(Sorry, I seem to have a very slow response time here!) I agree completely with your above comment, and it seems to me that, given that, it's simplest if you add a new use case.
OK, I've added "Use Case 13: Basic access to serializations of the data format"
Hah! I just sat down this morning to add a use case about being able to read and understand the serialization of the format without the standard library and there it was. Thanks for adding this.
I wonder if a little more detail might add to this use case?
For example, what about adding a bit about using a standard editor for the OS, like vi, emacs, TextEdit or the like? We may need to be very careful in what we specify, however. Common software has to be able to reveal to the user at least what the encoding of the file/serialization is, otherwise the user will be in trouble. Equally, you don't want to binary-encode the whole file, or you are lost in the same way. This all seems to imply that there should always be a minimal header which is in ASCII or the like. How much of this is derivable as requirements, and how much makes sense to add to your example?
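To make that "minimal ASCII header" idea concrete, here is a purely hypothetical sketch, in the spirit of the XML declaration earlier in this thread. The magic string, the field names and the "NGDF" format name are all invented for illustration, not a proposal: the point is just that one ASCII-only first line, readable in vi, emacs, TextEdit or more, could advertise the encoding and layout of everything that follows, even if the rest of the file is binary.

```python
# Hypothetical first line of a file:
#   #NGDF version=0.1 encoding=utf-8 layout=binary

def read_preamble(path):
    # Read only the first line as raw bytes and insist it is ASCII;
    # everything after it may be in whatever encoding/layout it names.
    with open(path, "rb") as f:
        first_line = f.readline(256).decode("ascii").rstrip("\n")
    return dict(item.split("=", 1)
                for item in first_line.lstrip("#").split()
                if "=" in item)

# read_preamble("data.ngdf") might return
# {'version': '0.1', 'encoding': 'utf-8', 'layout': 'binary'}
```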