ericmckean / pdfium

Automatically exported from code.google.com/p/pdfium

Copy and paste failing #30

Closed GoogleCodeExporter closed 9 years ago

GoogleCodeExporter commented 9 years ago
http://www.commonwealthfund.org/~/media/Files/Publications/Fund%20Report/2013/Nov/1717_Thomson_intl_profiles_hlt_care_sys_2013_v2.pdf

Copying only works line by line. Selecting several lines or a whole paragraph 
and copying it fails.

Original issue reported on code.google.com by jam@chromium.org on 30 Jul 2014 at 4:09

GoogleCodeExporter commented 9 years ago

Original comment by bo...@foxitsoftware.com on 31 Jul 2014 at 6:26

GoogleCodeExporter commented 9 years ago
Also happens on 
http://www.nature.com/nature/journal/v512/n7514/pdf/nature13681.pdf

Copying fails whenever the selection contains the line-break hyphen "-".
Selecting just the "-" or "h-" at the end of the first line (the one starting 
"On 1 April 2014, ...") is enough to stop copy/paste from working.
Happens on all platforms (tried Windows, Mac and ChromeOS).

Original comment by g...@chromium.org on 3 Sep 2014 at 6:39

GoogleCodeExporter commented 9 years ago

Original comment by jam@chromium.org on 3 Sep 2014 at 8:38

GoogleCodeExporter commented 9 years ago

Original comment by jun_f...@foxitsoftware.com on 4 Sep 2014 at 1:14

GoogleCodeExporter commented 9 years ago
The root cause is that the line-end hyphen "-" is represented as the 16-bit 
value 0xfffe and returned to Chrome along with the other characters. It seems 
that Chrome doesn't handle this special character (0xfffe). There are several 
possible solutions:
Solution 1: PDFium keeps returning 0xfffe to mark where a hyphen occurs, and 
Chrome checks for and handles this special character. Depending on the desired 
behavior, Chrome could display it as a hyphen, treat it as a line-break 
indicator, or ignore it.
Solution 2: PDFium omits the hyphen from the output string. In that case, 
Chrome never knows where the hyphen was and loses the opportunity to handle 
it.
Others: any other solutions.

Original comment by jun_f...@foxitsoftware.com on 5 Sep 2014 at 5:22
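A minimal C++ sketch of Solution 1 on the caller side, assuming the selected text arrives as UTF-16 code units and, as described in the comment above, a line-end hyphen appears as the code unit 0xfffe. The names HandleHyphenMarkers, HyphenPolicy, and kPdfiumHyphenMarker are hypothetical and not part of PDFium or Chromium; the three policies simply mirror the three options listed under Solution 1.

```cpp
#include <string>

// Hypothetical constant: the value the thread says PDFium uses to report a
// line-end hyphen in extracted text.
constexpr char16_t kPdfiumHyphenMarker = 0xFFFE;

// Hypothetical policy describing what the caller does with each marker.
enum class HyphenPolicy {
  kShowHyphen,  // replace the marker with an ASCII hyphen-minus
  kLineBreak,   // replace the marker with a newline
  kIgnore,      // drop the marker, joining the two halves of the word
};

// Returns a copy of |text| in which every hyphen marker has been handled
// according to |policy|; all other code units are copied unchanged.
std::u16string HandleHyphenMarkers(const std::u16string& text,
                                   HyphenPolicy policy) {
  std::u16string out;
  out.reserve(text.size());
  for (char16_t c : text) {
    if (c != kPdfiumHyphenMarker) {
      out.push_back(c);
      continue;
    }
    switch (policy) {
      case HyphenPolicy::kShowHyphen:
        out.push_back(u'-');
        break;
      case HyphenPolicy::kLineBreak:
        out.push_back(u'\n');
        break;
      case HyphenPolicy::kIgnore:
        break;  // emit nothing for the marker
    }
  }
  return out;
}
```

Because the marker survives in the library output, each caller can pick its own policy per scenario (for example, kIgnore when putting text on the clipboard). Solution 2 would amount to PDFium always applying kIgnore before returning, which removes that choice.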

GoogleCodeExporter commented 9 years ago
We prefer solution 1 because the API function "FPDFText_GetText", which Chrome 
calls to get the selected text, is a general-purpose API. If this API omitted 
hyphens, callers would have no chance to handle them in other scenarios.

@Gene and John,
Do you have a preferred solution?

Original comment by jun_f...@foxitsoftware.com on 5 Sep 2014 at 5:50

GoogleCodeExporter commented 9 years ago
Thanks for investigating. I agree option 1 is better.

Original comment by jam@chromium.org on 9 Sep 2014 at 9:14

GoogleCodeExporter commented 9 years ago

Original comment by jun_f...@foxitsoftware.com on 9 Sep 2014 at 9:19

GoogleCodeExporter commented 9 years ago
http://crbug.com/327349, http://crbug.com/328303, and http://crbug.com/307523 
are likely all related.

Original comment by thestig@chromium.org on 9 Sep 2014 at 10:08

GoogleCodeExporter commented 9 years ago

Original comment by thestig@chromium.org on 9 Sep 2014 at 10:15

GoogleCodeExporter commented 9 years ago

Original comment by thestig@chromium.org on 11 Sep 2014 at 11:15

GoogleCodeExporter commented 9 years ago
Since the fix has to happen in Chromium's src/pdf directory, marking this as a 
duplicate.

Original comment by thestig@chromium.org on 11 Sep 2014 at 11:16