RunasSudo / gfx2gfx-pdftext

A fork of SWFTools' gfx2gfx which preserves text, rather than converting to shapes.
GNU General Public License v2.0

convert is not totally right #10

Open kumakichi opened 6 years ago

kumakichi commented 6 years ago

Thanks for your fancy project. I'm not sure whether you can read Chinese (I ran into problems while using this tool to convert some SWF files containing Chinese text). I came across 2 problems:

version: gfx2gfx-pdf2text - part of swftools 0.9.2 (build 8d5a70b)

The 2nd problem is easy to fix (I don't know whether this fix is correct): I just added 2 lines of code after https://github.com/RunasSudo/gfx2gfx-pdftext/blob/8d5a70b1d8526b7d596b9675a23386636a5a3b35/lib/devices/pdf.c#L392

        if(gt7bits>=128)
            gt7bits=0;

However, the 1st problem remains: some characters are missing after conversion.

Iey4iej3 commented 4 years ago

If I understand correctly, the conversion is not guaranteed to work when there are too many characters. I believe this is mostly due to a limitation of the pdflib-lite hack in question. For a perfect solution, we would seemingly need to switch to another library (maybe the one used by engines like pdflatex would work?). But since SWF is dead, I fear that nobody will be interested in writing such a program.

nissansz commented 4 years ago

I tried the code, but some characters are wrong and reverted, and I cannot copy text from the PDF. How do I set the font for the conversion? Please write to steve8000818@gmail.com if there is a solution.

Iey4iej3 commented 4 years ago

I believe that this is almost impossible to resolve. Essentially, it is a restriction in pdflib-lite. The goal of the hack seems to be a temporary workaround for cases where there are not many non-ASCII characters, so it does not work properly for languages with a huge character set (like Chinese). gt7bits>=128 seems to be exactly the case where there are too many characters. You can see this from the comments in the code.
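To make the limit concrete, here is a minimal C sketch of how a single 8-bit encoding runs out of room. It is a hypothetical model, not the code in lib/devices/pdf.c: the names slot_alloc_t and alloc_slot, and the assumption that the counter indexes byte codes 128 to 255, are mine. Under those assumptions, it suggests why resetting the counter to 0 avoids the overflow but lets later glyphs reuse codes already handed to earlier ones, which would show up as wrong characters.

```c
#include <assert.h>
#include <stddef.h>

#define FIRST_HIGH_SLOT 128 /* assumption: codes 0..127 stay reserved for ASCII */
#define HIGH_SLOTS      128 /* codes 128..255 are all that is left in one byte */

/* Hypothetical model of one 8-bit encoding with a running counter. */
typedef struct {
    int next; /* next free high slot, counted from 0 (plays the role of gt7bits) */
    int wrap; /* non-zero: reset to 0 when full, like the two-line patch */
} slot_alloc_t;

/* Returns the byte code for the next non-ASCII glyph, or -1 when the
   encoding is full and wrapping is disabled. */
static int alloc_slot(slot_alloc_t *a)
{
    assert(a != NULL);
    if (a->next >= HIGH_SLOTS) {
        if (!a->wrap)
            return -1; /* no room left: the glyph cannot be encoded */
        a->next = 0;   /* wrap: code 128 is reused, clobbering an earlier glyph */
    }
    return FIRST_HIGH_SLOT + a->next++;
}
```

With wrap set, the 129th non-ASCII glyph receives code 128 again, the same code as the first one; without it, the allocator simply reports that it is full.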

I opened an issue at https://github.com/matthiaskramm/swftools/issues/68. However, I am not sure whether anyone is interested in writing code for this.

nissansz commented 4 years ago

Do you have any method to export all resources and the coordinates of each letter/character?

szc126 commented 3 years ago

Is this the reason for the comment // cross our fingers and hope there aren't more than 256 glyphs in lib/devices/pdf.c?

I don't understand C, but it seems like a dummy font, FreeSerif, is being created to hold the glyphs. What about making multiple dummy fonts, each one containing a maximum of 256 glyphs?
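As a rough illustration of that idea, the bookkeeping could look like the C sketch below. It only shows the arithmetic of spreading a running glyph index over several 256-entry fonts. The names glyph_slot_t, locate_glyph, fonts_needed and check_round_trip are hypothetical, and nothing here touches pdflib-lite or the actual code in lib/devices/pdf.c. Whether the library would accept several such fonts is a separate question.

```c
#include <assert.h>
#include <stddef.h>

#define GLYPHS_PER_FONT 256 /* one byte per character code */

/* Hypothetical: where a glyph ends up once glyphs are split across fonts. */
typedef struct {
    size_t font;        /* index of the dummy font holding the glyph */
    unsigned char code; /* byte code of the glyph inside that font */
} glyph_slot_t;

/* Map a running glyph index (0, 1, 2, ...) to a font and a code in it. */
static glyph_slot_t locate_glyph(size_t glyph_index)
{
    glyph_slot_t s;
    s.font = glyph_index / GLYPHS_PER_FONT;
    s.code = (unsigned char)(glyph_index % GLYPHS_PER_FONT);
    return s;
}

/* Number of dummy fonts required to hold glyph_count glyphs. */
static size_t fonts_needed(size_t glyph_count)
{
    return (glyph_count + GLYPHS_PER_FONT - 1) / GLYPHS_PER_FONT;
}

/* Sanity check: the (font, code) pair identifies the glyph uniquely. */
static void check_round_trip(size_t glyph_index)
{
    glyph_slot_t s = locate_glyph(glyph_index);
    assert(s.font * GLYPHS_PER_FONT + s.code == glyph_index);
}
```

For example, glyph number 300 would land in the second font (index 1) at code 44, and 257 distinct glyphs would need two fonts.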

Iey4iej3 commented 3 years ago

IMHO, I don't find that kind of approach reasonable. PDFlib-Lite, a subset of the proprietary PDFlib that is still sold today, has been dead since 2011. I think that a rewrite based on a FOSS PDF library would be more attractive.

Iey4iej3 commented 3 years ago

I managed to export the resources and the coordinates of each character, if I remember correctly. Maybe some utility in SWFTools can do it. See also https://reverseengineering.stackexchange.com/questions/133/how-does-one-reverse-engineer-a-swf-file