adams85 / po

A library for reading and writing Gettext PO files.
MIT License
61 stars 16 forks source link

Ensure validity of previous value comments #23

Closed samhocevar closed 1 year ago

samhocevar commented 1 year ago

Here is another seemingly undocumented GNU gettext rule that requires fixing: previous value comments must obey a very specific order (msgctxt, msgid, msgid_plural), and no other header is allowed to appear.

New checks have been added to the unit tests.

For the record, here are two examples of invalid .po files that Karambolo.PO may create in some circumstances:

#| msgid "oldyo"
#| msgctxt "INVALID (wrong order)"
msgid "yo"
msgstr "yo!"
#| msgctxt "old context"
#| msgid "oldyo"
#| msgid "INVALID (duplicate entry)"
msgid "yo"
msgstr "yo!"

The GNU gettext utility msgcat is a very convenient tool to check the validity of these files:

% msgcat invalid.po
invalid.po:3:9: syntax error
msgcat: found 1 fatal error
%
adams85 commented 1 year ago

Nice catch again! However, I'm not completely sure that we should handle this exactly in the way you proposed.

First of all, I'm not sure whose responsibility this should be. I mean IPOEntry.Comments is an ordered list, so consumers are free to make sure that the comments in that list obey the GNU gettext rules. Of course, the generator can enforce them as well but there's a problem with this: the generator doesn't report problems currently, which means that consumers could feed invalid input to the generator without receiving feedback.

As for fixing the order this is probably ok. However, filtering out duplicate or invalid comments seems problematic. For example, how to decide which comment to ignore in the case of duplicates? Obviously, only the consumer can resolve such situations.

So, apart from ordering, this looks like a problem which should be handled by some kind of validation logic. Whether it should be baked into the data model or should be implemented as a separate step needs some further consideration and probably won't happen before v2, where I plan to overhaul the data model anyway, according to the MS recommendations.

Could you update the PR in the light of the above? I mean, let's deal only with the order in the generator.

adams85 commented 1 year ago

Merging this but will strip out changes apart from ensuring ordering.