Closed berry120 closed 3 years ago
Thanks for the PR! Well done.
Two comments:
Do we need the PageRectangleRegion qualifier if we've defined both the PageRegion and the RectangleRegion? My inclination would be to just have two qualifiers on the source reference (a page and a rectangle region).
Is Region redundant in PageRegion? I wonder if we could just say Page? I feel like the word "Region" isn't redundant in the other qualifiers because they're each specifying some kind of a "start" and "end" boundary. But "page" has boundaries by definition, no?
Agree with both of those points - I've updated the PR to match.
(Initially I missed that the qualifiers were always specified as a list, rather than individually, hence the addition of PageRectangleRegion.)
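For illustration, a source reference carrying the two qualifiers as a list might look like the sketch below. The RectangleRegion URI and its comma-separated "x1,y1,x2,y2" value follow the existing GEDCOM X qualifier; the Page URI is an assumption based on the name agreed in this thread, and the helper name and the "#album-pdf" reference are hypothetical.

```python
import json

# RectangleRegion is an existing GEDCOM X qualifier. The Page URI below is an
# assumption based on the name agreed in this thread, not a confirmed value.
RECTANGLE_REGION = "http://gedcomx.org/RectangleRegion"
PAGE = "http://gedcomx.org/Page"  # assumed URI


def page_rectangle_reference(description, page, x1, y1, x2, y2):
    """Build a source reference qualified by a page and a rectangle on it.

    The rectangle corners are relative coordinates (fractions of the page
    width and height), serialized as "x1,y1,x2,y2".
    """
    return {
        "description": description,
        "qualifiers": [
            {"name": PAGE, "value": str(page)},
            {"name": RECTANGLE_REGION, "value": f"{x1},{y1},{x2},{y2}"},
        ],
    }


# "#album-pdf" is a hypothetical local reference to a source description.
reference = page_rectangle_reference("#album-pdf", 3, 0.1, 0.2, 0.6, 0.7)
print(json.dumps(reference, indent=2))
```

Because the qualifiers are a list, the page and the rectangle stay independent, so a reference to a whole page simply omits the second entry.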
When a multipage document has pages that are not numbered, an absolute page number is the way to go. And certainly, an absolute page number is easy for a computer to semantically interpret. But I worry about human interpretation of the absolute page number when the multipage document has been numbered for human consumption. When a reference for page 1 of a book that has 20+ pages of preface material numbered in Roman numerals is consumed by a human, the human will look for page 1 after the Roman numbered pages. So my question is: could we allow for both an absolute page number and a document-specific page number?
@thomast73 It's a fair point. I wondered about this too, and about the emphasis we place on the raw file being human readable (as opposed to an application using it being human readable).
I decided against it here because I think it introduces a lot of complexity for little gain. We'd either have to have the qualifier differentiate between "raw" and "labelled" page numbers dynamically (which doesn't seem too reliable so I don't think that would be the best idea), introduce a more complicated syntax to the qualifier to differentiate between the two somehow, or introduce a separate qualifier for "raw" and "labelled" page numbers completely (and make them mutually exclusive.)
If we go with "absolute" page numbers in the spec, then the only real disadvantage is that it's not immediately clear to someone reading the raw file. I'd say this is quite rare though - by far the most common use case is going to involve a user viewing the data through some kind of backend processing & presentation layer, which would easily be capable of taking the raw page number, looking up the "labelled" page number if different, and then showing that to the user instead.
So in short I think always defaulting to the "absolute" page number is the better thing to do, on account of being simpler in the spec while still enabling an application to show the "absolute" or "labelled" page number to the user as it sees fit. I'm very open to being challenged on any of the above if I'm wrong however - my only firm point would be that we definitely need to define it unambiguously!
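As a rough sketch of the presentation-layer lookup described above, the function below converts an absolute page number into the label a reader would see, for the example of a book whose front matter is numbered in Roman numerals. The function names and the assumption that the layout is simply "N Roman-numbered pages, then Arabic numbering restarting at 1" are illustrative only; a real application would read page labels from the document itself.

```python
def to_roman(number):
    """Convert a positive integer to a lowercase Roman numeral."""
    symbols = [
        (1000, "m"), (900, "cm"), (500, "d"), (400, "cd"),
        (100, "c"), (90, "xc"), (50, "l"), (40, "xl"),
        (10, "x"), (9, "ix"), (5, "v"), (4, "iv"), (1, "i"),
    ]
    parts = []
    for value, symbol in symbols:
        count, number = divmod(number, value)
        parts.append(symbol * count)
    return "".join(parts)


def labelled_page(absolute_page, front_matter_pages):
    """Map a 1-based absolute page number to its printed label.

    Assumes (hypothetically) that the first `front_matter_pages` pages are
    numbered i, ii, iii, ... and the remaining pages restart at 1.
    """
    if absolute_page < 1:
        raise ValueError("absolute page numbers start at 1")
    if absolute_page <= front_matter_pages:
        return to_roman(absolute_page)
    return str(absolute_page - front_matter_pages)


# With 20 pages of preface, absolute page 21 is the printed "page 1".
print(labelled_page(21, front_matter_pages=20))
```

The stored qualifier value stays the unambiguous absolute number; only the display changes.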
@berry120 Perhaps we should consider a qualifier name that is less ambiguous, and have it carry the meaning you have defined. This would leave the door open for another qualifier in the future and would also help in making your proposal more specific. AbsolutePage? I'm not in love with that name, but would prefer it over "raw...".
That said, @berry120 makes a good case for the "absolute" page being the assumed default. It seems reasonable that we could just keep the simple name Page for the assumed/default case and, if we ever add support for something other than the "absolute" page, make the qualifier name for that new thing more specific and descriptive.
Personally I'm not too hung up on the name, whether that's Page, AbsolutePage, RawPage or whatever else - happy to just go with the consensus on that one 👍
@stoicflame Just wanted to check if there are any more thoughts on the page name, or anything else that needs to happen before this is ready for merge?
I think we got about as much feedback as we're going to get. Let's merge!
I'm looking at a project where I may make reasonably heavy use of interlinking different sources, and what first drew me to this format was the ability to specify "regions" in a source, which is fantastic - I haven't found anywhere else that allows that out of the box.
I wonder however if you'd consider making page-aware qualifiers part of the standard (since some types, such as PDF, naturally span multiple pages):
PageRegion would be great to specify a whole individual page in a document;
PageRectangleRegion would be great to specify both a page, and a rectangle on that page in the same way as RectangleRegion works currently.
Just for completeness, the specific use case here is digitally referencing a whole bunch of family documents / photos and relating them to each other. I have detailed scans of things such as family photo albums - but I have separate scans of the individual photos in them as well as separate scans of each page (this is important as the pages themselves sometimes contain annotations outside of photos, and I'd like to also preserve the "look" of the original album.) Currently the plan is to relate them in gedcomx by having a PDF of the album at a page level, and then relating the individual images to this PDF by pointing the componentOf field in the SourceDescription of each individual image to the main PDF - but qualifying the position of those images would then require the page level qualifiers.
(I'm aware that I could just break the standard's recommendation and specify my own qualifiers anyway, but I thought this may be useful in a general case, hence the GitHub issue. Happy to raise a PR if others think this is a good idea.)
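To make the album use case concrete, here is a minimal sketch of a source description for one photo whose componentOf reference points at the album PDF and carries page-level qualifiers. It assumes a page qualifier exists alongside RectangleRegion; the Page URI, the ids and the file names are all hypothetical and only there for illustration.

```python
import json

# RectangleRegion is an existing GEDCOM X qualifier; the Page URI is an
# assumed name for the page-level qualifier requested in this issue.
RECTANGLE_REGION = "http://gedcomx.org/RectangleRegion"
PAGE = "http://gedcomx.org/Page"  # assumed URI


def photo_description(photo_id, photo_uri, album_id, page, rectangle):
    """Describe a photo scan as a component of one page of the album PDF.

    `rectangle` is the "x1,y1,x2,y2" string locating the photo on the page.
    """
    return {
        "id": photo_id,
        "about": photo_uri,
        "componentOf": {
            "description": f"#{album_id}",
            "qualifiers": [
                {"name": PAGE, "value": str(page)},
                {"name": RECTANGLE_REGION, "value": rectangle},
            ],
        },
    }


# Hypothetical ids and file names for the album and one of its photos.
album = {"id": "album", "about": "album.pdf"}
photo = photo_description("photo-12", "photo-12.jpg", album["id"], 4, "0.05,0.1,0.45,0.5")
print(json.dumps({"sourceDescriptions": [album, photo]}, indent=2))
```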