ChordPro / chordpro

Reference implementation of the ChordPro standard for musical lead sheets.

Minor PDF validity issue (ModDate?) #273

Closed: gwyndaf closed this issue 1 year ago

gwyndaf commented 1 year ago

Processing my ChordPro output with Ghostscript (to add link/destination annotations) has started producing a warning. It doesn't seem to affect functionality, so this is quite minor:

The following warnings were encountered at least once while processing this file:
    invalid operator used in text block

   **** This file had errors that were repaired or ignored.
   **** The file was produced by: 
   **** >>>> PDF::API2 2.044 (linux) <<<<
   **** Please notify the author of the software that produced this
   **** file that it does not conform to Adobe's published PDF
   **** specification.

I did a bit of digging with JHOVE and RUPS, and (if I'm using them correctly) it looks like the ModDate might be invalid, judging by the PDF::API2 documentation:

[Screenshot from 2023-03-13 14-04-24]

I think ModDate might be missing the UTC offset suffix (+01'00 in my case).

This seems to be confirmed by pdfinfo, which reports only CreationDate and not ModDate (both are reported for the post-Ghostscript version).

I don't know if this arises from ChordPro (6.000_013) or PDF::API2 (2.044), but it seems to have started occurring only recently.

sciurius commented 1 year ago

This is what I see:

[Screenshot: scrot20230313171131]

However, there is a 1-hour difference in the timestamps. Apparently, CreationDate (which is controlled by PDF::API2) is one hour off (ModificationDate is correct, as far as I can see).

% date -Iseconds
2023-03-13T17:18:39+01:00
% chordpro -o ~/tmp/x.pdf ~/tmp/cols.cho
% pdfinfo x.pdf
Title:           Columns
Creator:         ChordPro core 6.000_013 (Unsupported development snapshot) 
Producer:        PDF::API2 2.044 (linux)
CreationDate:    Mon Mar 13 16:18:49 2023 CET
...
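The thread does not establish why CreationDate is an hour off. One plausible cause, offered here only as an assumption, is writing the UTC wall-clock time without an offset suffix (or with a mismatched one), so that a reader displays it as local time. The Python sketch below shows one way to format a PDF date whose offset suffix matches its wall-clock fields; pdf_date is a hypothetical helper, not part of ChordPro or PDF::API2.

```python
from datetime import datetime, timedelta, timezone


def pdf_date(dt: datetime) -> str:
    """Format an aware datetime as D:YYYYMMDDHHmmSSOHH'mm' (hypothetical helper).

    The wall-clock fields are written in dt's own zone, and the suffix
    records that zone's offset from UTC, so the two always agree.
    """
    if dt.tzinfo is None or dt.utcoffset() is None:
        raise ValueError("need a timezone-aware datetime")
    offset = dt.utcoffset()
    base = dt.strftime("D:%Y%m%d%H%M%S")
    if offset == timedelta(0):
        return base + "Z"  # Z: local time equals UTC
    sign = "+" if offset > timedelta(0) else "-"
    hours, minutes = divmod(abs(int(offset.total_seconds())) // 60, 60)
    return f"{base}{sign}{hours:02d}'{minutes:02d}'"


instant = datetime(2023, 3, 13, 16, 18, 49, tzinfo=timezone.utc)
cet = timezone(timedelta(hours=1))  # fixed UTC+01:00, as in the pdfinfo output

print(pdf_date(instant))                  # D:20230313161849Z
print(pdf_date(instant.astimezone(cet)))  # D:20230313171849+01'00'

# Writing the UTC fields with no suffix drops the zone; a reader that treats
# offset-less dates as local CET would then display 16:18:49 CET.
print(instant.strftime("D:%Y%m%d%H%M%S"))  # D:20230313161849
```

Requiring an aware datetime is a deliberate choice: it makes it impossible to emit wall-clock fields without knowing which zone they belong to.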
sciurius commented 1 year ago

Ghostscript should provide a better warning message, or keep silent ;).

gwyndaf commented 1 year ago

I agree: it's a pretty useless warning message!

But it's mildly annoying, so I'll keep looking for the cause, which might be my PDF libraries...

sciurius commented 1 year ago

I can take the blame for CreationDate...

sciurius commented 1 year ago

According to PDF Reference 1.7:

The date format is D:YYYYMMDDHHmmSSOHH'mm', where D: is a static prefix identifying the string as a PDF date. The date may be truncated at any point after the year.

So a missing timezone should do no harm and should not produce any warnings.
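To make the quoted format concrete, here is a minimal Python sketch of a parser that accepts dates truncated anywhere after the year, an O field of +, - or Z, and an optional trailing apostrophe. parse_pdf_date is a hypothetical helper, not a PDF::API2 or ChordPro function. Missing month and day default to 01 and other missing fields to zero, following the PDF Reference; a date with no offset is returned as a naive datetime, since its relation to UTC is unknown.

```python
import re
from datetime import datetime, timedelta, timezone

# Year, then up to five 2-digit fields (MM DD HH mm SS), then an optional
# offset: O is '+', '-' or 'Z', optionally followed by HH, then 'mm and a
# trailing apostrophe that may be omitted.
_PDF_DATE = re.compile(
    r"(\d{4})((?:\d{2}){0,5})"
    r"(?:([+\-Z])(?:(\d{2})(?:'(?:(\d{2})'?)?)?)?)?"
)


def parse_pdf_date(text: str) -> datetime:
    """Parse a PDF date string (hypothetical helper, not a library API)."""
    body = text[2:] if text.startswith("D:") else text
    m = _PDF_DATE.fullmatch(body)
    if m is None:
        raise ValueError(f"not a PDF date: {text!r}")
    year, rest, sign, off_h, off_m = m.groups()
    if sign is not None and len(rest) != 10:
        raise ValueError("an offset may only follow the seconds field")
    fields = [int(rest[i:i + 2]) for i in range(0, len(rest), 2)]
    # Defaults: month and day are 01, all other numerical fields are 0.
    month, day, hour, minute, second = fields + [1, 1, 0, 0, 0][len(fields):]
    tz = None  # no offset: the relation to UTC is unknown, so stay naive
    if sign == "Z":
        tz = timezone.utc
    elif sign is not None:
        delta = timedelta(hours=int(off_h or 0), minutes=int(off_m or 0))
        tz = timezone(delta if sign == "+" else -delta)
    return datetime(int(year), month, day, hour, minute, second, tzinfo=tz)


print(parse_pdf_date("D:2023"))                   # 2023-01-01 00:00:00
print(parse_pdf_date("D:20230313171849+01'00'"))  # 2023-03-13 17:18:49+01:00
print(parse_pdf_date("D:20230313161849Z"))        # 2023-03-13 16:18:49+00:00
```

Because the offset-bearing results are timezone-aware, the +01'00' and Z forms of the same instant compare equal.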

sciurius commented 1 year ago

Unrelated: did you take a look at https://github.com/ChordPro/chordpro/issues/272? Looking forward to hearing your comments.

sciurius commented 1 year ago

Questions to be answered:

  1. How come your ModTime is truncated?
  2. What causes the Ghostscript warnings?
gwyndaf commented 1 year ago
  1. My mistake (I think). I now realise that ModTime isn't actually truncated. I believe the O position in your reference example can be +, - or Z, and Z seems to be equivalent to +00'00, so it possibly doesn't need to be followed by HH'mm.

It's probably better that I focus on 2 and look for a tool/approach that can give me a clearer answer...

sciurius commented 1 year ago

ChordPro and PDF::API2 do not set a ModDate in the metadata. Maybe one of the other tools that you use for PDF manipulation?

gwyndaf commented 1 year ago

Oh that's interesting, thanks.

My test sample was the ChordPro output, before any further PDF processing, but I suppose the tools I used to examine it might have inferred ModDate from something like the file modification date/time. I'll investigate further over the next few days.

sciurius commented 1 year ago

I know that rups sets the ModDate to the time it opens the document.

gwyndaf commented 1 year ago
  1. I've done a bit of testing and it seems like date could be a 'red herring.' The Ghostscript warning doesn't occur with a trivial one-song example, even though that surely has the same PDF metadata format as the full songbook that triggers the gs warning.

  2. In terms of tracking down the actual cause of the warning, a full songbook generated with the -X default config doesn't cause gs any trouble, so it seems to be something my config has introduced. I just need to narrow that down...

gwyndaf commented 1 year ago

So it seems like this line in pdf.chorus.recall is producing output that troubles Ghostscript:

"tag"   : "<span face=sans foreground=#444488 background=#FFFFFF><span face=mono>\u2191</span> REPEAT CHORUS <span face=mono>\u2191</span></span>",

Specifically, it seems to be the background=#FFFFFF setting: removing that part makes the gs warning stop, but I've no idea why.

Next step is to find a way to specify a white (or none) background without it troubling gs!

sciurius commented 1 year ago
  1. I've done a bit of testing and it seems like date could be a 'red herring.

Good. From studying the various standards, there seems to be a general consensus to allow, but not enforce, the trailing apostrophe, so it is very unlikely that any tool would complain.

sciurius commented 1 year ago

Okay, now we're getting somewhere. This is the minimal input that reproduces the GS warning:

{title:Minimal}
<span background=#FFFFFF>REPEAT CHORUS</span>

The background is explicitly painted by Text::Layout as a white rectangle. I must investigate further, but it may be that drawing this rectangle is not allowed in text mode. EDIT: It is not allowed, but no one has complained until now and every renderer handles it fine.

Can you leave out the background setting? You can silence Ghostscript with -quiet. EDIT: Or force the 'old' PDF engine with -dNEWPDF=false.
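Ghostscript's "invalid operator used in text block" warning fits this explanation. A PDF text object runs from BT to ET, and the PDF Reference's diagram of graphics objects lists the operator groups allowed inside it (general graphics state, color, text state, text positioning, text showing, marked content); path construction and painting operators such as re and f are not among them. As a rough illustration only (not how Ghostscript or Text::Layout actually work), the Python sketch below scans an already-decompressed content stream and reports path operators found between BT and ET. The content streams are hypothetical, and the tokenizer is deliberately simplified: no inline images and no nested parentheses inside strings.

```python
import re

# Simplified content-stream tokenizer (assumption: stream already decompressed,
# no inline images, no nested parentheses inside literal strings).
TOKEN = re.compile(
    r"""\((?:\\.|[^\\()])*\)      # literal string
      | <[0-9A-Fa-f\s]*>          # hex string
      | /[^\s/\[\]()<>{}%]*       # name
      | %[^\r\n]*                 # comment
      | [^\s/\[\]()<>{}%]+        # number or operator
    """,
    re.VERBOSE,
)

# Path construction and path painting operators, which are not among the
# operator groups allowed inside a BT ... ET text object.
PATH_OPS = {"m", "l", "c", "v", "y", "h", "re",
            "S", "s", "f", "F", "f*", "B", "B*", "b", "b*", "n"}


def path_ops_in_text(stream: str) -> list[str]:
    """Return path operators that occur between BT and ET, in order."""
    found = []
    in_text = False
    for match in TOKEN.finditer(stream):
        tok = match.group()
        if tok == "BT":
            in_text = True
        elif tok == "ET":
            in_text = False
        elif in_text and tok in PATH_OPS:
            found.append(tok)
    return found


# Hypothetical streams: a white box painted inside the text object, versus
# the same box painted before BT.
bad = """BT /F1 12 Tf 72 700 Td
1 1 1 rg 70 696 120 14 re f
0 0 0 rg (REPEAT CHORUS) Tj ET"""
good = """1 1 1 rg 70 696 120 14 re f
BT /F1 12 Tf 72 700 Td 0 0 0 rg (REPEAT CHORUS) Tj ET"""

print(path_ops_in_text(bad))   # ['re', 'f']
print(path_ops_in_text(good))  # []
```

The second stream shows the usual remedy: paint the background rectangle first, then open the text object, so the rectangle still sits behind the text.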

gwyndaf commented 1 year ago

Yes, it seems like Ghostscript has a general problem with the result of background markup (but not with backgrounds for whole lines).

Thanks for the gs workarounds. I hope my use of gs is only temporary (to add/restore links/annotations), so it probably makes sense to silence it.

I did explore workarounds within ChordPro, though. Visually, I want my chorus tags to be more prominent than other comments, e.g. by increasing contrast through removing the background shading. I can achieve that by setting chorus.recall.type to comment_box (which isn't shaded), and that works nicely.

Incidentally, I tried setting chorus.recall.type to text but it gave an error: Invalid value for pdf.chorus.recall.type. Are permitted values limited to comment types, or should it accept any of the predefined font styles?

sciurius commented 1 year ago

Yes, it seems like Ghostscript has a general problem with the result of background markup (but not with backgrounds for whole lines).

I'm not sure Ghostscript is doing anything wrong here. The docs are not clear enough (for me) to be 100% sure that graphic operations are disallowed in text context.

... and that works nicely.

Great!

Incidentally, I tried setting chorus.recall.type to text but it gave an error: Invalid value for pdf.chorus.recall.type. Are permitted values limited to comment types, or should it accept any of the predefined font styles?

Yes, it is a comment type, not a (font) style. I'll probably add some user-defined comments so you have more freedom in selecting styles.

gwyndaf commented 1 year ago

Ah OK. Even if the graphic box is strictly permitted in that context, I'm inclined to avoid drawing a background, as I've now found it causes interesting visual effects (colours exaggerated for clarity):

[Screenshot from 2023-03-15 17-12-33]

  1. Because the background is drawn around characters, its height seems to reflect the relevant font's line height, so the background behind the text is shallower than the background behind the arrows (which use a different font), leaving some of the underlying comment background (yellow) visible below the text.

  2. Increasing the type size (size=120%) also increases the background height but doesn't affect line spacing, so the background cuts off the bottom of the line above.

I think those behaviours make sense and seem a natural effect of where line-level and character-level operations overlap, but they're a good reason for me to be careful when adding character backgrounds!

User-defined comments seem like a good idea. I've essentially 'used up' the available variations: cb for musical footnotes, ci for background footnotes, and c for comments within a song. But c comments can range from verse labels (purely informational) to essential instructions like 'repeat chorus', and it'd be nice to differentiate visually between them.

On the other hand, I suppose custom comment styles might imply new directives, which could dilute the value of ChordPro as a standard and the portability of song sources. Still, if the feature's there, people can choose between the 'portability' and 'design control' routes.

That reminds me of something similar I'd been considering, but it probably merits a separate feature request...

sciurius commented 1 year ago

Yes, drawing backgrounds is a very disappointing thing to do, for the reasons you have already found, and more. And you haven't yet encountered fonts that do, or do not, include natural spacing in their dimensions. Backgrounds are basically only there because the old Chord program had a grey background in comments.