aspose-words / Aspose.Words-for-Java

Aspose.Words for Java examples, plugins and showcases
https://products.aspose.com/words/java
MIT License

Word to PDF conversion: some rare Chinese characters are missing #74

Closed Ccjiawei closed 2 years ago

Ccjiawei commented 3 years ago

When a Word document is filled with many rare characters (which display normally in Word), some of the rare characters are lost and rendered as blanks where the text wraps. Two fonts, the Song typeface and a Founder typeface, were used, and the test results were the same.

Repeated tests show that what goes missing is not a specific character; it depends on the layout of the text. For example, when the order of the characters in the generated Word document changes, the generated PDF file is missing different characters after each conversion. The missing characters are the same with the Song typeface and the Founder font, while other characters are missing with Microsoft YaHei.

I also tested the conversion with WPS (with the document font preset to the Song typeface or the Founder font). The converted PDF displays normally, with no missing characters.

The corresponding font directory has been specified in the program, and a default font for when the program cannot find the requested font has also been set, but the conversion still drops rare characters. Is this a current bug, or what is causing it? Please advise.

Part of the code is as follows:

// Conversion statement (as posted; the code that loads the document is not shown)
Document doc = null;
FileOutputStream fos = new FileOutputStream(file);
doc.save(fos, SaveFormat.DOCX);

生僻字整体排版如下: 𪾢𫓯𬷕𫷷亍𪾢尢𪾢彳𫓯卬𫓯殳𬷕𠙶𬷕毌𫷷邘𫷷戋圢氕伋仝冮氿汈氾忉宄𬣙讱扞圲圫芏芃朳朸𨙸邨吒吖屼屾辿钆仳伣伈癿甪邠犴冱邡闫𬇕汋䜣讻𬣞孖𬘓纩玒玓玘玚刬𫭟坜坉扽𫭢坋扺㧑毐芰芣苊苉芘芴芠𫇭芤杕杙杄杧杩尪尨轪𫐄坒芈旴旵呙㕮岍𫵷岠岜呇冏觃岙伾㑇伭佖伲佁飏狃闶汧汫𣲘𣲗沄沘𬇙汭㳇沇忮忳忺𬣡祃诇邲诎诐屃𫸩岊阽䢺阼妧妘𨚕纮驲𫘜纻𬘘𫘝纼玤玞玱玟邽邿坥坰坬坽弆耵䢼𦭜茋苧苾苠枅㭎枘枍矼矻匼𬨂𬀩𬀪旿昇昄昒昈咉咇咍岵岽岨岞峂㟃囷𬬩钐钔钖牥佴垈侁侹佸佺隹㑊侂佽侘郈舠郐郃攽肭肸肷狉狝饳忞於炌炆泙沺泂泜泃泇怊峃穸祋祊𫍣𬣳𬩽鸤弢弨陑𬮿陎𬯀卺乸妭姈𫰛迳叕𬳵驵𬳶䌹驺𫠊绋绐砉耔㛃玶珇珅𬍛珋玹珌玿韨垚垯垙垲埏垍耇鿍垎垴垟垞挓垵垏拶荖荁荙荛茈茽荄茺𬜬荓茳𦰡茛荭㭕柷柃柊枹栐柖郚剅䴓迺厖砆砑砄耏奓䶮轵轷轹轺昺𪾢昽盷咡咺昳昣哒昤昫昡咥昪虷虸哃峘耑峛𪨰峗峧帡钘𫓧钜𬬮𬬱𬬭钪钬钭矧秬俫舁俜俙俍垕衎舣弇侴鸧䏡胠𦙶胈胩胣朏飐訄饻庤疢炣炟㶲洭洘洓洿㳚泚浈浉洸洑洢洈洚洺洨浐㳘洴洣恔宬窀扂袆祏祐祕叚陧陞娀姞姱姤姶姽枲绖骃𬘡𬳽𬘩𫄧彖骉恝珪珛珹琊玼珖𪟝珽珦珫珒𬍤珢珕珝𫭼埗垾垺埆垿埌埇莰茝𬜯鄀莶莝䓖莙栻桠𬂩桄梠栴梴栒酎酏𫠆砵砠砫砬硁恧翃郪𨐈辀辁𬌗剕赀哢晅晊唝哳哱冔晔晐晖畖蚄蚆𫑡帱崁峿𪨶崄帨崀赆𬬸钷𬬻𬬹𬬿𬭁眚甡笫倻倴脩倮倕倞𫢸倓倧衃虒舭舯舥瓞鬯鸰脎朓胲虓鱽狴峱狻眢𫗧勍痄疰痃竘羖羓桊敉烠烔烶烻𬊈涍浡浭浬涄涢涐浰浟浛浼浲涘悈悃悢𬒈宧窅窊窎扅扆袪袗袯祧隺堲疍𨺙陴烝砮㛚哿翀翂剟𬳿𫄨绤骍𬘫䂮琎珸珵琄琈琀珺掭堎堐埼掎埫堌晢𫮃掞埪壸㙍聍菝萚菥莿䓫勚䓬萆菂菍菼萣䓨菉䓛梼梽桲梾桯梣梌桹敔厣硔鿎硙硚硊硍勔䴕龁逴唪啫翈㫰晙畤𬱖趼跂蛃蚲𬟽蚺啴䎃崧崟崞崒崌崡铏𫓯𫟹铕𫟼铖铘铚铞铥铴牻牿稆笱笯偰偡鸺偭偲偁㿠鄅偓徛衒舳舲鸼悆鄃瓻䝙脶脞脟䏲鱾猇猊猄觖𠅤庱庼庳痓䴔竫堃阌羝羕焆烺焌淏𬇹淟淜淴淯湴涴𬍡㥄惛惔悰惙寁逭𬤇𫍯袼裈祲𬤊𫍲谞艴弸弶𬯎隃婞娵婼媖婳婍婌婫婤婘婠𬘬𬘭𬴂𫘦绹𫟅𬘯骕𫘧絜珷琲琡琟琔琭堾堼揕㙘堧喆堨塅堠絷𪣻𡎚葜惎萳葙靬葴蒇蒈鄚蒉蓇萩蒐葰葎鄑蒎葖蒄萹棤棽棫椓椑𬃊鹀椆棓棬棪椀楗𬷕甦酦觌奡皕硪欹詟𫐐辌棐龂𬹼黹牚睎晫晪晱𧿹蛑畯斝喤崶嵁𫶇崾嵅崿嵚翙𫖮圌圐赑淼赒鿏铹𬭊铽𨱇𫓶锊锍锎𬭎锓犇颋稌筀筘筜筥筅傃傉翛傒傕舾畬𫖯脿腘䐃腙腒𬱟鲃猰𫛭猯㺄馉凓鄗𫷷廋廆鄌粢遆旐𬮱焞𬊤欻𣸣溚溁湝渰湓㴔渟溠渼溇湣湑溞愐愃敩甯棨扊裣祼婻媆媞㛹媓媂媄毵矞𬴃𫘨缊缐骙瑃瑓瑅瑆䴖瑖瑝瑔瑀𤧛瑳瑂嶅瑑遘髢塥堽赪摛塝搒搌蒱蒨蓏蔀蓢蓂蒻蓣椹楪榃榅楒楞楩榇椸楙歅𬪩碃碏𬒔碈䃅硿鄠辒𬨎𫐓龆觜䣘暕鹍𫫇㬊暅跱蜐蜎嵲赗骱锖𫓹锘锳锧锪𬭚锫锬𬭛稑稙䅟𬕂筻筼筶筦筤傺鹎僇艅艉谼貆腽腨腯鲉鲊鲌䲟𬶋𬶍鲏雊猺飔觟𦝼馌裛廒瘀瘅鄘鹒鄜麀鄣阘𫔶煁煃煴煋煟煓滠溍溹滆滉溦溵漷滧滘滍愭慥慆塱𫌀裼禋禔禘禒谫鹔𫖳愍嫄媱戤勠戣𫘪𫘬缞耤瑧𫞩瑨瑱瑷瑢斠摏墕墈墐墘摴銎𡐓墚撖𪤗靽鞁蔌蔈蓰蔹蔊嘏榰榑槚𣗋槜榍疐𬸘酺酾酲酴碶䃎𬒗碨𥔲碹碥劂𫚖䴗夥瞍鹖㬎跽蜾幖嶍圙𨱏锺锼锽𬭤锾锿镃镄镅馝鹙箨箖劄僬僦僔僎槃㙦鲒鲕𫚕鲖鲗鲘鲙𬶐𬶏𩽾夐獍飗𬸚凘廑廙瘗瘥瘕鲝鄫熇漹漖潆漤潩漼漴㽏漈漋漻慬窬窭㮾𬤝褕禛禚隩嫕嫭嫜嫪𬙂㻬麹璆漦叇墣墦墡劐薁蕰蔃鼒槱鹝磏磉殣慭霅暵暲暶踦踣䗖蝘蝲蝤噇噂噀罶嶲嶓㠇嶟嶒镆镈镋镎𬭩镕稹儇皞皛䴘艎艏鹟𩾃鲦鲪鲬橥觭鹠鹡糇糈翦鹢鹣熛潖潵㵐澂澛瑬潽潾潏憭憕𬸣戭褯禤𫍽嫽遹𬴊璥璲璒憙擐鄹薳鞔黇𬞟蕗薢蕹橞橑橦醑觱磡𥕢磜豮𫟦𬺈𫠜鹾虤暿曌曈㬚蹅踶䗛螗疁㠓幪𪩘嶦𬭬𨱑𬭯馞穄篚篯簉鼽衠盦螣縢鲭鲯鲰鲺鲹𫗴亸癀瘭𬸦羱糒燋熻燊燚燏濩濋澪澽澴澭澼憷憺懔黉嬛鹨翯𫄷璱𤩽璬璮髽擿薿薸檑櫆檞醨繄磹磻瞫瞵蹐蟏㘎𬭳镤𬭶𫔍镥镨𬭸𨱔𬭼𫔎矰穙穜穟簕簃簏儦魋斶艚𬸪谿䲠𬶟鲾𬶠鲿鳁鳂鳈鳉獯䗪馘襕襚𬶨螱甓嬬嬥𦈡𫄸瓀釐鬶爇鞳鞮𬟁藟藦藨鹲檫黡礞礌𥖨蹢蹜蟫䗴嚚髃镮镱酂馧簠簝簰鼫鼩皦臑䲢鳑鳒鹱鹯癗𦒍旞翷冁䎖瀔瀍瀌襜䴙𬙊嚭㰀鬷醭蹯蠋翾鳘儳儴鼗𬶭𩾌鳚鳛麑麖蠃彟嬿鬒蘘欂醵颥甗𨟠巇酅髎犨𬶮𨭉㸌爔瀱瀹瀼瀵襫孅骦𬙋耰𤫉瓖鬘趯𬺓罍鼱鳠鳡鳣爟爚灈韂糵蘼礵鹴躔皭龢鳤亹籥鼷𫚭玃醾齇觿蠼
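Many of the characters in the list above lie outside the Basic Multilingual Plane, so Java stores each of them as a surrogate pair. As a minimal diagnostic sketch (not something proposed in this thread), the helper below lists the supplementary code points in a string so that glyphs missing from the PDF can be compared against them. The class and method names are hypothetical, and whether the lost characters actually correlate with this range is an assumption to verify.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration; not part of Aspose.Words.
final class RareCharScanner {
    private RareCharScanner() {}

    /** Returns every code point above U+FFFF, formatted as "U+XXXXX", in order of appearance. */
    static List<String> supplementaryCodePoints(String text) {
        List<String> found = new ArrayList<>();
        // Iterate by code point, not by char, so a surrogate pair counts as one character.
        text.codePoints()
            .filter(Character::isSupplementaryCodePoint)
            .forEach(cp -> found.add(String.format("U+%X", cp)));
        return found;
    }

    /** Counts characters as a reader sees them (code points), not UTF-16 units. */
    static int codePointLength(String text) {
        return text.codePointCount(0, text.length());
    }
}
```

Running `supplementaryCodePoints` on the text extracted from the source document and on the text extracted from the PDF would show whether the two lists differ.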

AlexNosk commented 3 years ago

@Ccjiawei Could you please attach the problematic document here for testing and provide the fonts used in the document. We will check the issue and provide you more information.

Ccjiawei commented 3 years ago

a.docx These are the rare-character documents.

fz.zip And these are also rare-character documents.

Please help test whether converting Word to PDF has a problem.

Below is the PDF produced by my own test program: java2pdf_fz.pdf

AlexNosk commented 3 years ago

@Ccjiawei Thank you for additional information. I have tested the scenario on my side using the latest 21.8 version of Aspose.Words and did not manage to reproduce the problem. Please see the attached document generated using the latest version of Aspose.Words. out.pdf

I used the following simple code to convert the document to PDF:

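import com.aspose.words.*;

// Register the font sources: a custom folder (subfolders included) and the system fonts.
FontSettings fontSettings = new FontSettings();
fontSettings.setFontsSources(
    new FontSourceBase[] {
        new FolderFontSource("C:\\Temp\\fonts", true, 0),
        new SystemFontSource(1)
    }
);

// Load the document, apply the font settings, and save it as PDF.
Document outputDoc = new Document("C:\\Temp\\in.docx");
outputDoc.setFontSettings(fontSettings);
outputDoc.save("C:\\Temp\\out.pdf");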

Could you please check with the latest version of Aspose.Words on your side and let us know how it goes? If the problem still persists, please attach the problematic output here for our reference.

Ccjiawei commented 3 years ago

I am using version 19.5 here, and the problem is the same as with the latest version on your side. The problem text in the document is Chinese.

The document you provided also has the problem (a few characters are missing). I have circled them in a screenshot included in the compressed package.

Finally, I am also sending you the Song typeface. Conversion with both fonts shows the same problem, so I wonder whether there is a bug in the program.

Please check and solve it. Thank you very much for your reply and support. problem.zip

AlexNosk commented 3 years ago

@Ccjiawei Sorry for the misunderstanding. I have managed to reproduce the problem. To resolve it, you should set a text shaper factory. Please see the following code and the attached document produced on my side:

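// Needs: import com.aspose.words.shaping.harfbuzz.HarfBuzzTextShaperFactory;
// fontSettings is the instance configured in the earlier snippet.
Document outputDoc = new Document("C:\\Temp\\in.docx");
outputDoc.setFontSettings(fontSettings);
outputDoc.getLayoutOptions().setTextShaperFactory(HarfBuzzTextShaperFactory.getInstance());
outputDoc.save("C:\\Temp\\out.pdf");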

out.pdf

Also see the following link for more information: https://docs.aspose.com/words/java/enable-opentype-features/

Ccjiawei commented 3 years ago

Thank you very much for your answer. After using your method, I have successfully solved this problem that has plagued me for a long time. Thank you again for your support.

Ccjiawei commented 3 years ago

One last question: do the font settings support a stream (IO) parameter when converting a document to PDF?

AlexNosk commented 3 years ago

@Ccjiawei Sure, you can use StreamFontSource to achieve this. Please follow the link for more information https://apireference.aspose.com/words/java/com.aspose.words/streamfontsource
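The linked reference describes StreamFontSource as a base class for a user-defined source whose subclass opens the font data stream on demand. The stdlib-only sketch below covers just the stream-handling part: it buffers the font bytes once and hands out a fresh InputStream on every request, which is the behavior such a subclass would likely need. `FontBytes` and its methods are hypothetical names, and the wiring into Aspose.Words is assumed from the reference rather than shown.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical helper for illustration; not part of Aspose.Words.
final class FontBytes {
    private final byte[] data;

    private FontBytes(byte[] data) {
        this.data = data;
    }

    /** Reads the whole stream into memory once; the caller stays responsible for closing it. */
    static FontBytes readFrom(InputStream in) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        byte[] chunk = new byte[8192];
        int n;
        while ((n = in.read(chunk)) != -1) {
            buffer.write(chunk, 0, n);
        }
        return new FontBytes(buffer.toByteArray());
    }

    /** Returns a new, independent stream on each call, so the font can be read more than once. */
    InputStream open() {
        return new ByteArrayInputStream(data);
    }

    int size() {
        return data.length;
    }
}
```

Buffering the bytes is a deliberate choice here: an uploaded or network stream can usually be consumed only once, while a font source may be asked for its data repeatedly.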

Ccjiawei commented 3 years ago

@AlexNosk Hello. Following the information you provided above, I have upgraded Aspose.Words for Java to 20.12 and can now set the text shaper factory with the following code:

doc.getLayoutOptions().setTextShaperFactory(HarfBuzzTextShaperFactory.getInstance());

However, after the upgrade, the conversion works normally on Windows, but when the program is deployed on Linux the following exceptions occur:

1. Caused by: java.lang.UnsatisfiedLinkError: com.aspose.words.shaping.harfbuzz.HB.hb_buffer_set_flags(JI)V

2. Aspose.Words native libs cannot be loaded. /tmp/AsposeNative/Shaping.Harfbuzz/1630119775767/libharfbuzz-shaping-engine-dll.so: /lib64/libstdc++.so.6: version CXXABI_/ AsposeNative/Shaping.Harfbuzz/1630119775767/libharfbuzz-shaping-engine-dll.so)

I have consulted the relevant materials, and both libharfbuzz-shaping-engine-dll.so and /lib64/libstdc++.so.6 are present in the Linux environment.

My Linux version is CentOS Linux release 7.6.1810

Does the aspose.words.shaping.harfbuzz plug-in for Aspose.Words for Java not support Linux at present, or is something else wrong? Please advise, thank you very much.
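The second message appears to concern the C++ ABI version tags exported by /lib64/libstdc++.so.6, so the library being present does not guarantee that it is new enough. As a rough diagnostic sketch, assuming the tags appear as plain ASCII strings inside the library file (the same assumption behind running `strings` on it), the helper below collects every `CXXABI_` tag found in a file. `AbiTagScanner` is a hypothetical name, and the required tag is truncated in the log above, so it is not filled in here.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Set;
import java.util.TreeSet;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration; not part of Aspose.Words.
final class AbiTagScanner {
    private static final Pattern TAG = Pattern.compile("CXXABI_[0-9A-Za-z_.]+");

    private AbiTagScanner() {}

    /** Returns the distinct CXXABI_ tags found in the file, sorted lexicographically. */
    static Set<String> cxxAbiTags(Path library) throws IOException {
        // ISO-8859-1 maps every byte to exactly one char, so binary content decodes without loss.
        String content = new String(Files.readAllBytes(library), StandardCharsets.ISO_8859_1);
        Set<String> tags = new TreeSet<>();
        Matcher m = TAG.matcher(content);
        while (m.find()) {
            tags.add(m.group());
        }
        return tags;
    }
}
```

Calling `AbiTagScanner.cxxAbiTags(java.nio.file.Paths.get("/lib64/libstdc++.so.6"))` on the affected machine would list the tags that library provides, which can then be compared with the tag named in the full error message.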

AlexNosk commented 3 years ago

@Ccjiawei To make it work on Linux you should install harfbuzz package. Here you can find the package for CentOS https://centos.pkgs.org/8/centos-appstream-x86_64/harfbuzz-1.7.5-3.el8.x86_64.rpm.html

KonstantinSidorenko commented 3 years ago

Hi @Ccjiawei, Aspose.Words for Java supports external Shaping.Harfbuzz library only on Windows (you can see dll in the path strings). But Java supports Harfbuzz natively starting from Java 9. Please upgrade to the latest Java version and try your test sample again.
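Given the two points in this reply (the external shaping library is Windows-only, and Java 9 or later is recommended), an application could check its environment at startup before deciding how to configure text shaping. The sketch below only computes that information from system properties. `ShaperSupport` and its methods are hypothetical, and the call that actually sets the shaper factory is the one shown earlier in the thread.

```java
import java.util.Locale;

// Hypothetical helper for illustration; not part of Aspose.Words.
final class ShaperSupport {
    private ShaperSupport() {}

    /** True when the given os.name value denotes Windows. */
    static boolean isWindows(String osName) {
        return osName != null && osName.toLowerCase(Locale.ROOT).startsWith("windows");
    }

    /** Parses java.specification.version: "1.8" gives 8, "9" gives 9, "11" gives 11. */
    static int javaMajorVersion(String specVersion) {
        String[] parts = specVersion.split("\\.");
        int first = Integer.parseInt(parts[0]);
        // Before Java 9 the property looked like "1.8", so the major version is the second field.
        return (first == 1 && parts.length > 1) ? Integer.parseInt(parts[1]) : first;
    }

    /** Applies the Windows-only constraint stated in the reply to the running JVM. */
    static boolean externalShaperLikelyUsable() {
        return isWindows(System.getProperty("os.name"));
    }

    /** Major version of the running JVM. */
    static int currentJavaMajorVersion() {
        return javaMajorVersion(System.getProperty("java.specification.version"));
    }
}
```

Passing the property values in as parameters keeps the parsing logic testable on any machine, while the two no-argument methods read the real environment.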