Open ifohancroft opened 2 years ago
@ifohancroft, only about point 2: If the inspector is used to harvest all of the data first, and the entries of interest are picked in a second step (either in a spreadsheet or with an AWK script), I would prefer the three-column approach over the second option. In AWK, it is easier to identify a line of interest by a keyword and then advance a few columns than to 1) search for the line with the keyword and 2) fetch the information from the next line (or record, in AWK).
However, because a space already separates the first column from the second, it would be advantageous to drop the colon between the second and third column altogether. This suggestion is influenced either by AWK (where an explicit space/tab between columns is the [adjustable] assumption) or by copy-pasting .csv data into spreadsheet programs. Two patterns come to mind:
a) only remove the colon as column separator
[PDF] Producer Adobe PDF library 15.00
[PDF] Title
[PDF] PageCount 158
[XMP-x] XMPToolkit Adobe XMP Core 5.6-c017 91.164464, 2020/06/15-10:20:05
[XMP-xmp] ModifyDate 2022:02:11 11:19:53+08:00
[XMP-xmp] CreateDate 2022:02:11 11:19:53+08:00
[XMP-xmp] MetadataDate 2022:02:11 11:19:53+08:00
[XMP-xmp] CreatorTool Adobe Illustrator 25.2 (Windows)
b) drop the colon as column separator .and. group items sharing the first entry (e.g., [PDF], [XMP-xmp])
[PDF] Producer Adobe PDF library 15.00
[PDF] Title
[PDF] PageCount 158
[XMP-x] XMPToolkit Adobe XMP Core 5.6-c017 91.164464, 2020/06/15-10:20:05
[XMP-xmp] ModifyDate 2022:02:11 11:19:53+08:00
[XMP-xmp] CreateDate 2022:02:11 11:19:53+08:00
[XMP-xmp] MetadataDate 2022:02:11 11:19:53+08:00
[XMP-xmp] CreatorTool Adobe Illustrator 25.2 (Windows)
My speculation: variant a) would be easier to implement. Variant b) would possibly be easier to read if there were some convention for the .pdf format stating which parameter (second column) belongs to which group (first column). Maybe there is such an agreement/standard covering this part of the .pdf file format.
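To illustrate the keyword-based lookup on colon-free, whitespace-separated columns, here is a minimal Python sketch. It is not PDFMtEd code: the function name `lookup` and the `SAMPLE` text are assumptions made up for illustration, and it assumes the first two columns never contain spaces.

```python
# Hypothetical sample in the proposed colon-free, three-column layout.
SAMPLE = """\
[PDF] Producer Adobe PDF library 15.00
[PDF] Title
[PDF] PageCount 158
[XMP-xmp] CreatorTool Adobe Illustrator 25.2 (Windows)
"""


def lookup(text, keyword):
    """Return the value of the first line whose second column equals keyword.

    Returns "" if the tag is present but empty, and None if it is absent.
    """
    for line in text.splitlines():
        # Split on whitespace at most twice: group, tag, rest of line as value.
        fields = line.split(None, 2)
        if len(fields) >= 2 and fields[1] == keyword:
            return fields[2].strip() if len(fields) == 3 else ""
    return None


if __name__ == "__main__":
    print(lookup(SAMPLE, "PageCount"))
    print(lookup(SAMPLE, "Producer"))
```

Because the value is everything after the second column, values that themselves contain spaces or colons (dates, for example) stay intact.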
@nbehrnd Not sure why I never replied. Sorry. I agree. Keeping the three-column approach makes it easier to find what you want with a script, and dropping the colon as a separator and grouping items that share the first entry makes the output easier to read as well.
I'm not sure I understand what you mean about there being a standard stating which field in the second column belongs to which field in the first column. I think I do, and there is one, but I don't think it's needed to implement the sorting. Since the columns are already filled, I guess you can just sort by the first column to do the grouping.
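The "just sort by first column" idea could look roughly like the following Python sketch. Again, this is only an illustration under assumptions, not how the inspector is implemented; `group_by_first_column` and the sample lines are hypothetical.

```python
def group_by_first_column(lines):
    """Group lines by their first column (e.g. [PDF], [XMP-xmp]).

    sorted() is stable, so lines keep their original relative order
    within each group. Blank lines are dropped.
    """
    rows = [line for line in lines if line.strip()]
    return sorted(rows, key=lambda line: line.split(None, 1)[0])


if __name__ == "__main__":
    # Hypothetical, deliberately interleaved inspector output.
    mixed = [
        "[XMP-xmp] ModifyDate 2022:02:11 11:19:53+08:00",
        "[PDF] Producer Adobe PDF library 15.00",
        "[XMP-x] XMPToolkit Adobe XMP Core 5.6-c017 91.164464, 2020/06/15-10:20:05",
        "[PDF] PageCount 158",
        "[XMP-xmp] CreateDate 2022:02:11 11:19:53+08:00",
    ]
    for row in group_by_first_column(mixed):
        print(row)
```

No knowledge of which tag belongs to which group is needed here, since the group name is already present on every line.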
I'd like to propose a discussion about a couple of PDFMtEd Inspector features:
Since PDFMtEd as a whole uses XMP-dc tags and not XMP-pdf tags, shouldn't the inspector stop showing values for various XMP-pdf tags?
The output doesn't seem very clean and clear. Instead of showing output like:
Shouldn't the output be changed to something more readable, like:
The second example doesn't match the fields from the example above, but you get the idea.