Closed Rajce007 closed 6 months ago
Sorry, I have used non-latin characters many times for paths in the manifest with no issues at all. So something else is going on that is leading to these manifest items being determined to be missing.
Did InDesign properly URL-encode the paths as dictated by the epub specification?
Did InDesign properly normalize the Unicode and encode the file paths as UTF-8 strings before URL-encoding them?
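To make the order of operations concrete, here is a minimal sketch of what "normalize, encode to UTF-8, then URL-encode" means for a manifest href. The helper names encode_href and decode_href are hypothetical and are not taken from InDesign or Sigil; the sketch also assumes "/" is the only separator that should stay unescaped.

```python
import unicodedata
from urllib.parse import quote, unquote


def encode_href(path):
    """Normalize a path to NFC, then percent-encode its UTF-8 bytes."""
    nfc = unicodedata.normalize("NFC", path)
    # quote() encodes to UTF-8 by default; "/" is kept as the path separator.
    return quote(nfc, safe="/")


def decode_href(href):
    """Reverse the encoding and return the path in NFC form."""
    return unicodedata.normalize("NFC", unquote(href))


# Composed and decomposed spellings of the same name give the same href.
composed = "Text/\u00e9.xhtml"
decomposed = "Text/e\u0301.xhtml"
print(encode_href(composed), encode_href(decomposed))
```

If the normalization step is skipped, the two spellings produce different percent-encoded strings even though they look identical on screen.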
So please attach an exact screen capture of the message about files missing from the manifest.
Next, unzip your InDesign epub and provide a detailed listing of the actual paths to those files so a byte-by-byte comparison can be made.
Finally, copy and paste the InDesign-generated OPF (before loading it into Sigil) so that it can be checked for being properly generated.
And please copy and paste the original InDesign-generated OPF file here, so I can try to see exactly which Unicode codepoints are involved.
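For anyone who wants to check the codepoints themselves, a small sketch along these lines lists what a file name is really made of. The function name codepoints is just illustrative.

```python
import unicodedata


def codepoints(text):
    """Return one 'U+XXXX NAME' string per codepoint in text."""
    return [
        "U+%04X %s" % (ord(ch), unicodedata.name(ch, "<unnamed>"))
        for ch in text
    ]


# The same visible character, spelled two ways.
for label, name in (("composed", "\u00e9"), ("decomposed", "e\u0301")):
    print(label, codepoints(name))
```

Running it over an href from the OPF and over the matching zip entry name shows right away whether one side uses a combining accent and the other a precomposed character.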
One other potential cause is that the zip library that creates the .epub does not properly set the flag bit that indicates that the zip file name has been utf-8 encoded.
That is a common problem on Windows based systems unlike Linux and MacOS which tend to include/use the official zlib.
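The flag in question is general purpose bit 11 (0x800) of each zip entry. As a rough sketch, Python's zipfile module exposes it as ZipInfo.flag_bits, so it can be checked per entry. The helper names utf8_flag_report and make_sample_zip are hypothetical, and the sample archive is built in memory purely for illustration.

```python
import io
import zipfile

UTF8_NAME_FLAG = 0x800  # general purpose bit 11: name is UTF-8 encoded


def utf8_flag_report(zip_bytes):
    """Map each entry name to whether its UTF-8 name flag is set."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as zf:
        return {
            info.filename: bool(info.flag_bits & UTF8_NAME_FLAG)
            for info in zf.infolist()
        }


def make_sample_zip(names):
    """Build a tiny in-memory zip with placeholder content (demo only)."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        for name in names:
            zf.writestr(name, b"placeholder")
    return buf.getvalue()


print(utf8_flag_report(make_sample_zip(["a.xhtml", "\u00e9.xhtml"])))
```

Python's own writer only sets the flag for names that are not pure ASCII, so an unset flag on an ASCII name is harmless; an unset flag on a non-ASCII name is the case to worry about.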
So please take your InDesign epub and strip it down to a single chapter with one of the problem file names, then replace that chapter's contents with nonsense, save the epub, and attach it here. That way nothing copyrighted is revealed and I have a test case to use on my own macOS dev box to track down what is happening.
Hello Kevin,
here are the files:
error screenshot
EPUB 3.0 directly from InDesign: indd_original_name_characters_ebook_before_Sigil.epub.zip
and the same epub after the first save in Sigil: indd_original_name_characters_ebook_SigilSAVE.epub.zip
screenshot with the list of files inside
(the files are inside, but not accessible to the manifest)
I don't know how to unzip an epub file on my Mac. I tried changing the extension from .epub to .zip, but it does not work...
Tom
Thank you for the test cases. Yes, to open an epub manually (since an epub is just a specially constructed zip file) you just rename a copy of it from .epub to .zip.
Then to force unzipping it I use the command line unzip tool in Terminal.app. Assuming your renamed epub is test.zip, here are the steps:
Create a folder on your Desktop to unpack the zip into (to make it easy to delete afterwards) and call it "mytest".
Copy test.zip into your newly created "mytest" folder.
Open Terminal.app and enter the following commands, one per line, each followed by a return:
cd
cd ~/Desktop/mytest
unzip test.zip
exit
Inside the mytest folder on your Desktop, you will find the unpacked zip.
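If Terminal's unzip is not convenient, the same unpacking can be sketched in Python, which does not need the .epub renamed at all. The function name unpack_epub and the paths in the usage comment are illustrative only.

```python
import zipfile
from pathlib import Path


def unpack_epub(epub_path, dest_dir):
    """Extract every entry of an epub (a zip file) into dest_dir.

    Returns the entry names exactly as the zip library decoded them.
    """
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(epub_path) as zf:
        zf.extractall(dest)
        return zf.namelist()


# Example (paths are placeholders):
# names = unpack_epub("test.epub", Path.home() / "Desktop" / "mytest")
```

The returned list is the useful part for this issue, since it shows the entry names as stored in the archive rather than as the file system chose to write them.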
But I can work with what you sent.
One question:
Did you create this InDesign epub on your Mac or on some other platform, such as Windows?
Okay, I took your indd_original_name_characters_ebook_before_Sigil.epub.zip and manually unzipped it to get the .epub back.
Then I opened Sigil with it. That epub is missing the xml document headers which Sigil automatically fixed and then it opened with no missing manifest files found at all. I was able to access every single file. I am on a macOS system with the older HFS+ case sensitive file system.
Is your mac by chance using the newer mac APFS file system?
The older HFS+ file system automatically created file names with Unicode Normalization Form D (NFD), with some minor variations based on an older Unicode standard. The newer APFS file system no longer does Unicode NFD normalization.
The epub's OPF should be using NFC normalized utf-8 strings.
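A quick sketch of why the two forms matter: the same visible name produces different UTF-8 bytes under NFC and NFD. The function name forms and the sample file name are made up for the illustration.

```python
import unicodedata


def forms(name):
    """Return the UTF-8 bytes of name in NFC and NFD, and whether they match."""
    nfc_bytes = unicodedata.normalize("NFC", name).encode("utf-8")
    nfd_bytes = unicodedata.normalize("NFD", name).encode("utf-8")
    return {
        "nfc_bytes": nfc_bytes,
        "nfd_bytes": nfd_bytes,
        "equal_as_bytes": nfc_bytes == nfd_bytes,
    }


result = forms("\u00e9.xhtml")
print(result["nfc_bytes"])  # composed: two bytes for the accented letter
print(result["nfd_bytes"])  # decomposed: base letter plus combining accent
print(result["equal_as_bytes"])
```

Names that are pure ASCII come out identical in both forms, which is why the problem only shows up with accented or other non-ASCII characters.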
Hello Kevin,
regarding your question ("Did you create this InDesign epub on your mac or on some other Windows platform?"): the InDesign epub file was created directly on the same Mac where I used Sigil. InDesign is the current Adobe CC 2024 version.
Tom
About the macOS file system: yes, I bought a new MacBook Pro with an M3 Pro CPU, which probably can't use any other file system for working in Adobe CC. Tom
Interestingly, I did receive the manifest missing error message from the SigilSAVE version of that epub.
Am I confused by the naming? I thought the direct from InDesign one was the one with the issues, but instead it appears to be the SigilSAVE one that shows the errors.
Are the names messed up or did the errors only happen after saving the file in Sigil?
The errors only happen after saving the file in Sigil. The first time this InDesign epub was opened in Sigil, there was no manifest error.
Okay, that was not something I understood earlier.
The bottom line is that something on the macOS side is encoding the manifest path hrefs in Unicode NFC form while the zip's (the .epub's) file entries are in NFD form (or vice versa). So although the strings appear to be exact matches, the actual byte sequences differ, because one side decomposes some characters into a base character plus accent while the other uses the composed single-character form.
That causes the mismatch leading to the Missing Manifest errors.
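As a rough diagnostic sketch (not Sigil's actual code), the comparison can be written so that it separates a pure normalization mismatch from a file that is really absent. It assumes the manifest hrefs have already been percent-decoded and resolved to container-relative paths; the name classify_hrefs is hypothetical.

```python
import unicodedata


def nfc(text):
    return unicodedata.normalize("NFC", text)


def classify_hrefs(manifest_paths, zip_names):
    """Label each manifest path as 'ok', 'normalization mismatch' or 'missing'."""
    exact = set(zip_names)
    normalized = {nfc(name) for name in zip_names}
    result = {}
    for path in manifest_paths:
        if path in exact:
            result[path] = "ok"
        elif nfc(path) in normalized:
            # Same characters on screen, different codepoints underneath.
            result[path] = "normalization mismatch"
        else:
            result[path] = "missing"
    return result


zip_names = ["OEBPS/\u00e9.xhtml", "OEBPS/a.xhtml"]
manifest = ["OEBPS/e\u0301.xhtml", "OEBPS/a.xhtml", "OEBPS/b.xhtml"]
for path, status in classify_hrefs(manifest, zip_names).items():
    print(repr(path), status)
```

An exact string comparison would report the first entry as missing, which matches the symptom described in this issue.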
When it comes to normalizing Unicode strings, the mac in this case is ass backwards. It used to force everything to be decomposed (but with special older Unicode rules). But the rest of the entire world, including the web, assumes that real Unicode strings are in NFC (composed) form. Linux just treats a name as a byte sequence and does not really care, which is bad too.
I have been reading the specs on the new APFS file system and it appears to no longer force things to the mac's preferred modified NFD normalization form. So now you can have mixes where some strings are in NFC form and others are in NFD form (when it comes to urls and paths).
That was probably a bad idea by Apple to make a change like that.
So I will need to fight with this a bit to see how best to force the file paths to use the same way of normalizing Unicode.
Upon further testing, the manifest entries produced when Sigil wrote the epub used the mac NFD variant while the zip container used the NFC variant. That caused the missing manifest error. The exact reverse could also possibly happen but I am unsure as I do not have a test for that case.
I will modify the macOS Export Sigil code to make sure the manifest entries are all normalized to NFC form. Hopefully that will prevent errors of this sort. This problem only exists on macOS platforms.
I guess that is why the epub people recommend sticking to ascii for file names as the number of different file systems used by all the e-readers plus the 3 major platforms is so huge and not all normalize the unicode strings in the same way.
It seems that zip archive internal file names have no standard Unicode normalization specification, which is sheer madness.
So it seems we must force everything inside Sigil to Unicode Normalization Form C, given that the zip container (.epub) could have been created on any type of platform and may use any form of Unicode normalization it wants.
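To illustrate the idea of forcing one form everywhere, here is a sketch that rewrites a zip so every entry name is in NFC. This is not the change that went into Sigil, only an assumption-laden illustration: the function name repack_with_nfc_names is hypothetical, the archive is handled fully in memory, and per-entry metadata such as timestamps is not preserved.

```python
import io
import unicodedata
import zipfile


def repack_with_nfc_names(epub_bytes):
    """Return a copy of the zip with all entry names normalized to NFC."""
    out = io.BytesIO()
    with zipfile.ZipFile(io.BytesIO(epub_bytes)) as src, \
            zipfile.ZipFile(out, "w") as dst:
        seen = set()
        for info in src.infolist():  # keeps the original order (mimetype first)
            name = unicodedata.normalize("NFC", info.filename)
            if name in seen:
                raise ValueError("entries collide after NFC: %r" % name)
            seen.add(name)
            # Reuse each entry's original compression method.
            dst.writestr(name, src.read(info), compress_type=info.compress_type)
    return out.getvalue()
```

With both the container names and the manifest hrefs in NFC, an exact string comparison is enough again, whatever platform produced the original file.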
I have pushed a tentative fix for this to master. But it really needs to be tested heavily before the next release.
Thank you for your bug report and test cases. I will leave this issue open until a fully tested fix is in place.
I have been testing this on both my arm64 Mac Studio and my i7 MacBookPro and these changes seem to work and not cause any unpleasant side effects that I can detect.
So closing this issue as fixed.
Bug Description
Sigil does not support saving Unicode file paths in the manifest
Some files give me a "Manifest error" message every time after the first save of the edited epub in Sigil.
The symptoms are the same as in https://github.com/Sigil-Ebook/Sigil/issues/448
I found that Sigil gives me these errors if the original Adobe InDesign file was named with non-Latin characters.
Workaround that worked for me:
Rename the original InDesign file without non-Latin characters. After that, the file reopens correctly in Sigil.
Platform (OS)
macOS
OS Version / Specifics
Sonoma 14.4.1 arm64 version
What version of Sigil are you using?
Sigil.app-2.1.0-Mac-arm64
Any backtraces or crash reports
No response