benchen71 / epub-metadata-editor

Edit the metadata of EPUB files

Apostrophe becomes &apos; #81

Closed X-Xadro closed 4 months ago

X-Xadro commented 1 year ago

Whenever you change the series field to a word containing an apostrophe ( ' ), it changes into &apos;

Luckily, as far as I can tell, this happens only in the Series field; if you do this in the Title field, it picks the apostrophe up like it should.

Just found out that the ( ’ ) does work as intended, just not the ( ' ).

benchen71 commented 1 year ago

I can confirm that this was intentional. All fields in the OPF file follow the specifications (see https://www.dublincore.org/specifications/dublin-core/dcmes-xml/). If you view the OPF file (using the button in the Advanced Tasks panel) for an EPUB with an apostrophe in the book title, you will see that the apostrophe is encoded as &apos; in the title as well. It's just that whatever EPUB viewing software you are using correctly parses this string to show you the single character.
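To illustrate the encoding described here, a minimal Python sketch follows. It is not the program's actual VB.NET code, and the title string is a made-up example. It shows an apostrophe being escaped on write and then decoded back to a single character by a standard XML parser:

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

title = "Ender's Game"  # hypothetical example value

# escape() handles &, < and > by default; the apostrophe is added explicitly.
encoded = escape(title, {"'": "&apos;"})

# This is roughly what ends up in the OPF file for the title.
element_text = f"<title>{encoded}</title>"

# Any conforming XML parser turns &apos; back into a plain apostrophe.
decoded = ET.fromstring(element_text).text

print(encoded)  # Ender&apos;s Game
print(decoded)  # Ender's Game
```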

However, the "series" field is not a standard part of the specifications (I'm just following how Calibre implemented series). So it is arguable that my program should not follow the specifications for encoding special characters.

I don't know what the right thing to do is in this case. At this point, I am not going to change the program unless someone can provide further guidance on the matter. I will close this issue in a month if no guidance is forthcoming.

benchen71 commented 12 months ago

Since there has been no further discussion, I am closing this issue.

crimsonidol commented 5 months ago

I'm not entirely sure if it's the same thing X-Xadro asked about, but I noticed a similar problem.

When I edit an EPUB to add the series, any special XML character that needs escaping gets properly escaped when saved into the OPF file. But when the file is reopened, the value for series doesn't get unescaped: e.g. &apos; doesn't turn into ' but is displayed as &apos; inside the series field. If you don't notice it, it gets escaped again, as &amp;apos;, when saving after performing another action. E.g. for one file it looks like this for me: [screenshot: unescaped_apostrophe]
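The round trip can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the program's code: the helper names save, load_raw and load are hypothetical, and the series name is made up. Reading the stored value back without unescaping it makes the next save escape the ampersand a second time:

```python
from xml.sax.saxutils import escape, unescape

def save(value):
    """Escape a field value for writing into the OPF file (hypothetical helper)."""
    return escape(value, {"'": "&apos;"})

def load_raw(stored):
    """Read the stored value as-is, without unescaping (the behaviour reported above)."""
    return stored

def load(stored):
    """Read the stored value and unescape it (the expected behaviour)."""
    return unescape(stored, {"&apos;": "'"})

stored = save("Ender's Saga")          # Ender&apos;s Saga
shown_in_field = load_raw(stored)      # Ender&apos;s Saga (entity shown literally)
saved_again = save(shown_in_field)     # Ender&amp;apos;s Saga (double-escaped)

round_trip_ok = save(load(stored))     # Ender&apos;s Saga (stable when unescaped on read)
```

Unescaping on read keeps the value stable no matter how many times the file is opened and saved.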

> However, the "series" field is not a standard part of the specifications (I'm just following how Calibre implemented series). So it is arguable that my program should not follow the specifications for encoding special characters.

The way you do it with the element belongs-to-collection is part of the standard, so everything's fine the way you write it to the file: https://www.w3.org/TR/epub-33/#sec-belongs-to-collection https://github.com/w3c/epub-specs/issues/1356

benchen71 commented 5 months ago

Hmm, I might need to check this out. Maybe the best answer is simply not to use the OPF specification for that field.

crimsonidol commented 5 months ago

I think there's a little misunderstanding. Using that field is fine; what's missing is that, when the field is read, its value should be treated as an XMLInput: https://github.com/benchen71/epub-metadata-editor/blob/386e87730da9a2c655ecd20c0502223effad505b/epubmetadataeditor/Form1.vb#L868

For calibre:series and other fields (like dc:author) the assigned value looks like this: https://github.com/benchen71/epub-metadata-editor/blob/386e87730da9a2c655ecd20c0502223effad505b/epubmetadataeditor/Form1.vb#L890
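As a rough illustration of the difference on the reading side, here is a Python sketch. It does not reproduce the linked VB.NET lines; the OPF fragment, the function names series_raw and series_parsed, and the series name are all assumptions made for the example. Pulling the value out of the file text leaves the entity in place, while going through an XML parser decodes it:

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical, minimal OPF fragment with an escaped apostrophe in the series name.
OPF = """<package xmlns="http://www.idpf.org/2007/opf" version="3.0">
  <metadata>
    <meta property="belongs-to-collection" id="c01">Ender&apos;s Saga</meta>
  </metadata>
</package>"""

NS = {"opf": "http://www.idpf.org/2007/opf"}

def series_raw(opf_text):
    """Extract the series by plain text matching; entities stay escaped."""
    m = re.search(r'<meta property="belongs-to-collection"[^>]*>(.*?)</meta>', opf_text)
    return m.group(1) if m else None

def series_parsed(opf_text):
    """Extract the series through an XML parser; entities are decoded."""
    root = ET.fromstring(opf_text)
    for meta in root.iterfind(".//opf:meta", NS):
        if meta.get("property") == "belongs-to-collection":
            return meta.text
    return None

print(series_raw(OPF))     # Ender&apos;s Saga
print(series_parsed(OPF))  # Ender's Saga
```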

benchen71 commented 5 months ago

Yes, it looks like you've found a bug! I'll try and get a new version out soon...

benchen71 commented 4 months ago

Hopefully fixed in 1.9.7.