Open echan00 opened 5 years ago
If it helps, here is information about the fonts in the PDF:
name                                         type        emb sub uni prob object ID
-------------------------------------------- ----------- --- --- --- ---- ---------
ALPMFJ+Times-Roman                           Type 1C     yes yes yes      243 0
ALPMJK+Times-Italic                          Type 1C     yes yes yes      246 0
MHei-Bold-ETen-B5-H-Identity-H               CID Type 0C yes no  no       249 0
MSung-Light-ETen-B5-H-Identity-H             CID Type 0C yes no  no       252 0
Microsoft JhengHei,Bold-ETen-B5-H-Identity-H CID Type 0C yes no  no  X    857 0
Microsoft JhengHei-ETen-B5-H-Identity-H      CID Type 0C yes no  no  X    852 0
AMACHH+TimesNewRomanPSMT                     Type 1C     yes yes no       866 0
AMACNG+ArialMT                               Type 1C     yes yes no       861 0
AMADCF+TimesNewRomanPS-BoldMT                Type 1C     yes yes no       859 0
PMingLiU-ETen-B5-H-Identity-H                CID Type 0C yes no  no  X    865 0
AMADJE+TimesNewRomanPS-ItalicMT              Type 1C     yes yes no       868 0
MS Mincho-KSCms-UHC-H-Identity-H             CID Type 0C yes no  no  X    869 0
AMCGMO+Calibri,Italic                        Type 1C     yes yes yes      872 0
AMCPNE+Calibri,Bold                          Type 1C     yes yes yes      873 0
Microsoft YaHei,Bold-GBK-EUC-H-Identity-H    CID Type 0C yes no  no  X    876 0
Microsoft YaHei-GBK-EUC-H-Identity-H         CID Type 0C yes no  no  X    877 0
AMDONF+Calibri                               Type 1C     yes yes yes      878 0
AMDPFD+Symbol                                Type 1C     yes yes no  X    879 0
AMEAHN+Arial                                 Type 1C     yes yes yes      882 0
SimSun-GBK-EUC-H-Identity-H                  CID Type 0C yes no  no  X    887 0
AMECKA+Wingdings                             Type 1C     yes yes no       888 0
MingLiU-ETen-B5-H-Identity-H                 CID Type 0C yes no  no  X    892 0
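In case it is useful for narrowing things down, here is a rough sketch that pulls out the rows of a listing like the one above that look suspicious. It rests on two assumptions that are not confirmed anywhere in this thread: that `uni` follows the pdffonts convention of reporting whether a font has a ToUnicode map, and that an `X` under `prob` flags a problem with the font. The names `ROW`, `SAMPLE`, and `suspect_fonts` are made up for this illustration, and the regular expression only covers the font types that appear in this listing.

```python
import re

# One row of the listing: name, type, three yes/no flags, optional X, object ID.
# Assumption: font types look like "Type 1C" or "CID Type 0C", as in this listing.
ROW = re.compile(
    r"^(?P<name>.+?)\s+(?P<type>(?:CID )?Type \w+)\s+"
    r"(?P<emb>yes|no)\s+(?P<sub>yes|no)\s+(?P<uni>yes|no)\s+"
    r"(?P<prob>X\s+)?(?P<obj>\d+)\s+(?P<gen>\d+)\s*$"
)

# A few rows copied from the listing above.
SAMPLE = """\
name type emb sub uni prob object ID
ALPMFJ+Times-Roman Type 1C yes yes yes 243 0
MHei-Bold-ETen-B5-H-Identity-H CID Type 0C yes no no 249 0
Microsoft JhengHei,Bold-ETen-B5-H-Identity-H CID Type 0C yes no no X 857 0
AMCGMO+Calibri,Italic Type 1C yes yes yes 872 0
AMDPFD+Symbol Type 1C yes yes no X 879 0
"""


def suspect_fonts(listing):
    """Return (name, type, has_problem_flag) for rows with uni == 'no' or an X."""
    suspects = []
    for line in listing.splitlines():
        match = ROW.match(line.strip())
        if match is None:
            continue  # header, separator, or a row format this sketch does not know
        flagged = match["prob"] is not None
        if match["uni"] == "no" or flagged:
            suspects.append((match["name"], match["type"], flagged))
    return suspects


if __name__ == "__main__":
    for name, font_type, flagged in suspect_fonts(SAMPLE):
        print(f"{name} ({font_type}){' [X]' if flagged else ''}")
```

If those assumptions hold, the fonts this reports would be the first place to look, since a text extractor generally has a harder time mapping glyphs back to characters when a font has no ToUnicode map. That is only a guess at where the problem might be, not a confirmed cause.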
I've been using the pdf2txt tool to convert many PDFs in English and Chinese to TXT format. Several of the files are not converting as expected.
Here is the PDF file to be converted: 0.pdf
Here is the resulting TXT file: 0.txt
I would be super grateful if someone could tell me what is wrong or point me toward a fix.