TaTo30 / vue-pdf

PDF component for Vue 3
https://tato30.github.io/vue-pdf/
MIT License

Fix text layer selections overlaying each other & falsely ignoring some empty/blank space #158

Closed. IllustrisJack closed this issue 2 weeks ago.

IllustrisJack commented 1 month ago

First of all, thank you for this project! Great stuff.

As the title says, text layer selection could be improved; how good or bad it currently looks depends on the PDF. The problem can also be seen in the example uploaded on your docs. I'd assume this comes from certain blank spaces being ignored or hard cut, e.g. at newlines.

image

TaTo30 commented 2 weeks ago

Hi, sorry for taking too long to reply.

Are you using a Chromium-based browser? This issue seems to be more related to how the browser paints the selection layer; in Firefox the selection looks better:

image

Also, this library does not do any processing on the text layer; it just uses the pdf.js API without changing the text items. Maybe for Chrome I could try some cleanup step, but I am aware that it could break other things just for something aesthetic.

IllustrisJack commented 2 weeks ago

Thanks for the reply, and no worries about it taking long; I am sure all of this is done in your free time! Yes, I was using multiple Chromium-based browsers when testing this. I guess it is mostly for aesthetic reasons, so if it is too much work we can close this issue.

TaTo30 commented 2 weeks ago

I guess it is mostly for aesthetic reasons, so if it is too much work we can close this issue.

Maybe I am not following you. You said there are some blank spaces that are being ignored on selection, but I do not see exactly where. Using the same example, this is the copy result for this text:

1. Introduction
Dynamic languages such as JavaScript, Python, and Ruby, are pop-
ular since they are expressive, accessible to non-experts, and make
deployment as easy as distributing a source file. They are used for
small scripts as well as for complex applications. JavaScript, for
example, is the de facto standard for client-side web programming
Permission to make digital or hard copies of all or part of this work for personal or
classroom use is granted without fee provided that copies are not made or distributed
for profit or commercial advantage and that copies bear this notice and the full citation
on the first page. To copy otherwise, to republish, to post on servers or to redistribute
to lists, requires prior specific permission and/or a fee.
PLDI’09, June 15–20, 2009, Dublin, Ireland.
Copyright c© 2009 ACM 978-1-60558-392-1/09/06. . . $5.00

But this result is the same in all cases, whether testing on the library docs or the pdf.js demo page (and in both Chromium and Firefox). Could I be missing something?

IllustrisJack commented 2 weeks ago

I think you can safely ignore my statement. Some PDF viewers also add newlines as copyable whitespace to retain the formatting. This is not the case with pdf.js, so I probably got things confused!
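
For anyone who wants explicit control over line breaks in extracted text, here is a minimal sketch of one way to do it. It assumes text items shaped like pdf.js's `TextItem`, with a `str` field and an optional `hasEOL` flag; the `TextItemLike` interface, the `joinWithLineBreaks` helper, and the sample data are hypothetical names made up for illustration and are not part of vue-pdf or pdf.js.

```typescript
// Minimal shape of a text item; only the two fields used here.
// Assumption: mirrors the `str` and `hasEOL` fields of a pdf.js TextItem.
interface TextItemLike {
  str: string;      // the text of the run
  hasEOL?: boolean; // true when an end of line follows the run
}

// Hypothetical helper: join text items into one string, emitting "\n"
// after every item flagged as ending a line. Entries without a string
// `str` (e.g. marked-content markers) are skipped.
function joinWithLineBreaks(items: ReadonlyArray<object>): string {
  let out = "";
  for (const item of items) {
    const { str, hasEOL } = item as Partial<TextItemLike>;
    if (typeof str !== "string") continue;
    out += str;
    if (hasEOL) out += "\n";
  }
  return out;
}

// Hypothetical sample data, modelled on the excerpt quoted in this thread.
const sampleItems: TextItemLike[] = [
  { str: "1. Introduction", hasEOL: true },
  { str: "Dynamic languages such as JavaScript, Python, and Ruby, are pop-", hasEOL: true },
  { str: "ular since they are expressive", hasEOL: false },
];

const joined = joinWithLineBreaks(sampleItems);
console.log(joined);
```

With a real document, the items would presumably come from the page's text content rather than a hard-coded array; whether the resulting line breaks match what a given browser copies from the text layer is something to verify per PDF.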