glutanimate / PDFMtEd

View and modify PDF metadata on Linux graphically
GNU General Public License v3.0

RFC: PDFMtEd - Inspector #27

Open ifohancroft opened 2 years ago

ifohancroft commented 2 years ago

I'd like to propose a discussion about a couple of PDFMtEd Inspector features:

  1. Since PDFMtEd as a whole uses XMP-dc tags rather than XMP-pdf tags, shouldn't the Inspector stop showing values for the various XMP-pdf tags?

  2. The output isn't very clean or clear. Instead of showing entries like this:

[PDF]           Producer                        : Adobe PDF library 15.00
[PDF]           Title                           : 
[PDF]           PageCount                       : 158
[XMP-x]         XMPToolkit                      : Adobe XMP Core 5.6-c017 91.164464, 2020/06/15-10:20:05
[XMP-xmp]       ModifyDate                      : 2022:02:11 11:19:53+08:00
[XMP-xmp]       CreateDate                      : 2022:02:11 11:19:53+08:00
[XMP-xmp]       MetadataDate                    : 2022:02:11 11:19:53+08:00
[XMP-xmp]       CreatorTool                     : Adobe Illustrator 25.2 (Windows)

Shouldn't the output be changed to something more readable, like:

[System]
File Permissions: -rw-r--r--

[PDF]
Producer: Adobe PDF library 15.00
Title: 
PageCount: 158

[XMP-dc]
CreateDate: 2022:02:11

The second example doesn't match the fields of the example above, but you get the idea.
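To make the proposal concrete, here is a minimal Python sketch of the regrouping. It assumes every input line has the `[Group] Tag : Value` layout of the first example; the names `LINE_RE`, `parse_line` and `to_sections` are made up for illustration and are not part of PDFMtEd.

```python
import re

# Assumed layout: "[Group]   Tag   : Value". The value may itself contain
# colons (dates), so only the first colon after the tag is the separator.
LINE_RE = re.compile(r"^\[(?P<group>[^\]]+)\]\s+(?P<tag>\S+)\s*:\s?(?P<value>.*)$")


def parse_line(line):
    """Return (group, tag, value) or None if the line does not fit the layout."""
    match = LINE_RE.match(line)
    if match is None:
        return None
    return match["group"], match["tag"], match["value"].strip()


def to_sections(lines):
    """Render three-column lines as "[Group]" headers followed by "Tag: Value"."""
    sections = {}  # dicts keep insertion order, so groups stay in input order
    for line in lines:
        parsed = parse_line(line)
        if parsed is None:
            continue
        group, tag, value = parsed
        sections.setdefault(group, []).append((tag, value))

    blocks = []
    for group, entries in sections.items():
        body = "\n".join(f"{tag}: {value}".rstrip() for tag, value in entries)
        blocks.append(f"[{group}]\n{body}")
    return "\n\n".join(blocks)


if __name__ == "__main__":
    sample = [
        "[PDF]           Producer                        : Adobe PDF library 15.00",
        "[PDF]           Title                           : ",
        "[XMP-xmp]       ModifyDate                      : 2022:02:11 11:19:53+08:00",
    ]
    print(to_sections(sample))
```

Lines that do not match the assumed layout are silently skipped here; a real implementation would probably want to report them instead.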

  3. Shouldn't the Inspector show all the fields the editor supports, and only those? P.S. I understand the value of showing any other metadata, so the user can see what's there and decide whether they want to clean it. Perhaps we could show it near the end, as:
[Misc Metadata]
XMPToolkit: Adobe XMP Core 5.6-c017 91.164464, 2020/06/15-10:20:05
CreatorTool: Adobe Illustrator 25.2 (Windows)
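
A rough Python sketch of that split, assuming the metadata is already available as `(group, tag, value)` tuples. The set `EDITOR_TAGS` is a hypothetical placeholder: the real list of fields would have to come from the editor itself, and `render_with_misc` is an illustrative name only.

```python
# Hypothetical placeholder; NOT the actual list of fields the editor supports.
EDITOR_TAGS = {"Title", "Author", "Subject", "Keywords"}


def _block(header, entries):
    """Format one "[header]" section with "Tag: Value" lines."""
    body = "\n".join(f"{tag}: {value}".rstrip() for tag, value in entries)
    return f"[{header}]\n{body}"


def render_with_misc(records, editor_tags=EDITOR_TAGS):
    """Show editor-supported tags per group, everything else under Misc Metadata."""
    supported = {}  # group -> [(tag, value), ...], in input order
    misc = []
    for group, tag, value in records:
        if tag in editor_tags:
            supported.setdefault(group, []).append((tag, value))
        else:
            misc.append((tag, value))

    blocks = [_block(group, entries) for group, entries in supported.items()]
    if misc:
        blocks.append(_block("Misc Metadata", misc))
    return "\n\n".join(blocks)


if __name__ == "__main__":
    records = [
        ("PDF", "Title", ""),
        ("XMP-x", "XMPToolkit", "Adobe XMP Core 5.6-c017 91.164464, 2020/06/15-10:20:05"),
        ("XMP-xmp", "CreatorTool", "Adobe Illustrator 25.2 (Windows)"),
    ]
    print(render_with_misc(records))
```

Note that the misc section drops the group name; if two groups carry a tag with the same name, both entries would appear there without any way to tell them apart.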

nbehrnd commented 2 years ago

@ifohancroft, regarding point 2 only: if the inspector is used to harvest all of the data first, and the entries of interest are picked in a second step (in a spreadsheet, or with an AWK script), I would prefer the three-column layout over the second option. In AWK it is easier to identify a line of interest by keyword and then read the value a few columns further along than to 1) search for a line with the keyword and then 2) fetch the information from the next line (or record, in AWK).

However, because whitespace already separates the first column from the second, it would be advantageous to drop the colon between the second and third columns altogether. This suggestion is influenced by AWK (where an explicit space or tab between columns is the [adjustable] default assumption) and by copy-pasting .csv data into spreadsheet programs. Two patterns come to mind:
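
The difference can be illustrated with a small sketch, written in Python rather than AWK so that it is easy to test. Both function names are hypothetical. In the three-column layout each line can be judged on its own, whereas in the sectioned layout the script has to remember which header it saw last.

```python
def find_in_columns(lines, group, tag):
    """Three-column layout: every line is self-describing, no state needed."""
    for line in lines:
        # Whitespace splitting, similar to AWK's default field splitting:
        # fields = [group, tag, ":", value]; an empty value leaves 3 fields.
        fields = line.split(None, 3)
        if len(fields) >= 2 and fields[0] == f"[{group}]" and fields[1] == tag:
            return fields[3].strip() if len(fields) > 3 else ""
    return None


def find_in_sections(lines, group, tag):
    """Sectioned layout: the current "[Group]" header must be tracked."""
    current = None
    for line in lines:
        line = line.strip()
        if line.startswith("[") and line.endswith("]"):
            current = line[1:-1]
        elif current == group and ":" in line:
            name, _, value = line.partition(":")
            if name.strip() == tag:
                return value.strip()
    return None


if __name__ == "__main__":
    columns = ["[PDF]           PageCount                       : 158"]
    sections = ["[PDF]", "Producer: Adobe PDF library 15.00", "PageCount: 158"]
    print(find_in_columns(columns, "PDF", "PageCount"))
    print(find_in_sections(sections, "PDF", "PageCount"))
```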

a) only remove the colon as column separator

[PDF]           Producer                           Adobe PDF library 15.00
[PDF]           Title
[PDF]           PageCount                          158
[XMP-x]         XMPToolkit                         Adobe XMP Core 5.6-c017 91.164464, 2020/06/15-10:20:05
[XMP-xmp]       ModifyDate                         2022:02:11 11:19:53+08:00
[XMP-xmp]       CreateDate                         2022:02:11 11:19:53+08:00
[XMP-xmp]       MetadataDate                       2022:02:11 11:19:53+08:00
[XMP-xmp]       CreatorTool                        Adobe Illustrator 25.2 (Windows)

b) drop the colon as column separator and group items sharing the first entry (e.g., [PDF], [XMP-xmp])

[PDF]           Producer                           Adobe PDF library 15.00
[PDF]           Title
[PDF]           PageCount                          158

[XMP-x]         XMPToolkit                         Adobe XMP Core 5.6-c017 91.164464, 2020/06/15-10:20:05

[XMP-xmp]       ModifyDate                         2022:02:11 11:19:53+08:00
[XMP-xmp]       CreateDate                         2022:02:11 11:19:53+08:00
[XMP-xmp]       MetadataDate                       2022:02:11 11:19:53+08:00
[XMP-xmp]       CreatorTool                        Adobe Illustrator 25.2 (Windows)

My guess is that variant a) would be easier to implement. Variant b) might be visually easier to read if there were a convention in the .pdf format stating which parameter (second column) belongs to which group (first column). Maybe such an agreement or standard already covers this part of the .pdf file format.
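
As a sketch of how little variant a) might require, the following Python snippet rewrites one line at a time. It assumes the current `[Group] Tag : Value` layout; the column widths (16 and 35) are read off the examples above and, like the name `drop_colon`, are assumptions for illustration.

```python
import re

# Assumed input layout: "[Group]   Tag   : Value" (the value may contain colons).
LINE_RE = re.compile(r"^(\[[^\]]+\])\s+(\S+)\s*:\s?(.*)$")


def drop_colon(line, group_width=16, tag_width=35):
    """Re-emit one line with padded columns and no colon separator."""
    match = LINE_RE.match(line)
    if match is None:
        return line  # leave anything unexpected untouched
    group, tag, value = match.groups()
    return f"{group:<{group_width}}{tag:<{tag_width}}{value.strip()}".rstrip()


if __name__ == "__main__":
    print(drop_colon("[PDF]           PageCount                       : 158"))
    print(drop_colon("[XMP-xmp]       ModifyDate                      : 2022:02:11 11:19:53+08:00"))
```

One caveat: values such as `Adobe PDF library 15.00` contain spaces themselves, so with AWK's default whitespace splitting they would spread over several fields. Emitting a tab between the columns and reading with `awk -F'\t'` would keep each value in a single field.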

ifohancroft commented 1 year ago

@nbehrnd Not sure why I never replied, sorry. I agree. Keeping the three-column layout so that scripts can easily find what they want, while dropping the colon as a separator and grouping items that share the first entry so the output is also easier to read, makes sense.

I'm not sure I fully understand what you mean about a standard stating which field in the second column belongs to which field in the first column. I think I do, and I think there is one, but I don't think it's needed to implement the sorting. Since the columns are already filled in, I guess you can just sort by the first column to do the grouping.
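
A minimal Python sketch of that idea, assuming each line starts with its `[Group]` entry; `group_by_first_column` is a made-up name. Python's `sorted` is stable, so entries within a group keep their original order, and a blank line is placed between groups.

```python
from itertools import groupby


def first_column(line):
    """Return the first whitespace-separated field, e.g. "[PDF]"."""
    return line.split(None, 1)[0]


def group_by_first_column(lines):
    """Sort lines by their first column and separate groups with a blank line."""
    rows = sorted((line for line in lines if line.strip()), key=first_column)
    blocks = ["\n".join(block) for _, block in groupby(rows, key=first_column)]
    return "\n\n".join(blocks)


if __name__ == "__main__":
    lines = [
        "[XMP-xmp]       ModifyDate       2022:02:11 11:19:53+08:00",
        "[PDF]           Producer         Adobe PDF library 15.00",
        "[XMP-xmp]       CreateDate       2022:02:11 11:19:53+08:00",
        "[PDF]           PageCount        158",
    ]
    print(group_by_first_column(lines))
```

Sorting orders the groups by their name as plain strings, which may differ from the order in which they first appear in the output; collecting lines per group in a dictionary would preserve first-appearance order instead.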