allenai / pdf-component-library


Fix mis-alignment of PDF contents #60

Closed yensung closed 2 years ago

yensung commented 2 years ago

Description

https://github.com/allenai/scholar/issues/29678

This behavior is intentional on the part of the react-pdf author: https://github.com/wojtekmaj/react-pdf/issues/332#issuecomment-469037713

Solution

I tried passing a custom callback, <Page ... onLoadSuccess={adjustTextLayer}> ... </Page>, to reset the styles of the spans that wrap the text, but it didn't work. So I ended up forcing the height to 0 with the !important flag.
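For readers who want to see the shape of such an override, here is a minimal sketch, not the exact change in this PR. The helper names `buildTextLayerHeightOverride` and `installOverride` are hypothetical, and the default selector assumes the `react-pdf__Page__textContent` class that react-pdf puts on its text layer; check it against the react-pdf version in use.

```typescript
// Hypothetical helper: builds a CSS rule that forces a zero height on every
// span inside a text-layer container. The default selector is an assumption
// about react-pdf's text-layer class name, not something taken from this PR.
function buildTextLayerHeightOverride(
  containerSelector: string = ".react-pdf__Page__textContent"
): string {
  return `${containerSelector} span { height: 0 !important; }`;
}

// Hypothetical helper: injects the rule through a <style> element when a DOM
// is available. Returns false (and does nothing) outside a browser.
function installOverride(cssText: string): boolean {
  const doc = (globalThis as any).document;
  if (!doc) {
    return false;
  }
  const styleEl = doc.createElement("style");
  styleEl.textContent = cssText;
  doc.head.appendChild(styleEl);
  return true;
}
```

In practice the same rule would more likely live directly in a stylesheet; building it as a string here only keeps the sketch self-contained.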

Before

Screen Shot 2021-12-07 at 13 36 28

After

Screen Shot 2021-12-07 at 13 37 01

More thoughts

The text is still not fully aligned, probably because the font families and sizes differ between the red text and the black text. Would we like them to be the same?

carolinepaulic commented 2 years ago

> The text is still not fully aligned, probably because the font families and sizes differ between the red text and the black text. Would we like them to be the same?

Yeah, the styles should be as close as possible if there is a consistent way to do so. Otherwise we should select a default font that we think will match the most papers. Or maybe just hide this text for now. Maybe a question for Matt.

yensung commented 2 years ago

> Have you tried applying a fix to the PageWrapper component, in the CSS files included in the library?
>
> Something similar to [image]

I did, but the solution didn't work well. This is the result after resetting the height of the container:

Screen Shot 2021-12-08 at 09 20 49

This is the result of setting height: 0em !important on the span elements under each container:

Screen Shot 2021-12-07 at 13 37 01

The first approach matches the text within each paragraph better. However, when a paper uses many different font sizes (e.g. in the title), the second approach works better. I think I still need some time to figure out a solution that combines the advantages of both. I also wonder whether resetting CSS properties with JS is faster than resetting them in CSS directly.
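For comparison with the pure-CSS rule, a JS version of the span reset might look like the sketch below. It is an illustration only: `resetSpanHeights`, `StyleLike`, and `SpanLike` are hypothetical names, and nothing here measures which approach is faster. The only browser API it relies on is `CSSStyleDeclaration.setProperty`, whose optional third argument sets the `important` priority.

```typescript
// Minimal structural types so the sketch runs without a browser.
// A real HTMLSpanElement satisfies SpanLike through its `style` property.
interface StyleLike {
  setProperty(name: string, value: string, priority?: string): void;
}

interface SpanLike {
  style: StyleLike;
}

// Hypothetical callback body: forces a zero height on each span, the inline
// equivalent of `height: 0 !important`. Returns how many spans were touched.
function resetSpanHeights(spans: SpanLike[]): number {
  let count = 0;
  for (const span of spans) {
    span.style.setProperty("height", "0", "important");
    count += 1;
  }
  return count;
}
```

In a browser, the spans could be collected with something like `Array.from(container.querySelectorAll("span"))`, where `container` is the text-layer element. Unlike a stylesheet rule, this has to run again each time the text layer re-renders.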

smitar commented 2 years ago

Could you try your approach in the library CSS files instead of the demo? We might not be able to get a perfect match for different font types. Either solution looks way better than what we have today.