Utkarsh212 / react-pdftotext

A simple, lightweight React package for extracting plain text from a PDF file.
https://www.npmjs.com/package/react-pdftotext
MIT License
9 stars, 4 forks

Extracts text with spaces after each character #4

Open: 1vank1n opened this issue 3 months ago

1vank1n commented 3 months ago

Hello @Utkarsh212! Thanks for react-pdftotext. It works great 99% of the time, but I get strange extraction results with some PDFs. I've attached one as an example.

example.pdf

The result of the extraction looks like this:

T h e   a c c e p t s   2 0   r e q u e s t s   p e r   s e c o n d ,   b u t   f u r t h e r   s e n d i n g   i s   d i s t r i b u t e d   o v e r   t i m e   t o   s m o o t h   o u t   t h e   l o a d   a n d  n o t   e x c e e d   t h e   A P I   l i m i t s   o f   t h e   m e s s e n g e r .  @ P l a t f o r m  I s   t h e r e   a   l i m i t   o n   t h e   n u m b e r   o f   o p e n   c h a t s   p e r   ? @ C o m p a n y  T h e r e   a r e   n o   l i m i t s   o n   o p e n   c h a t s .
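
A caller-side workaround can mask the symptom in the meantime. The sketch below is not part of react-pdftotext, and the names `looksLetterSpaced` and `collapseLetterSpacing` are hypothetical. It assumes the pattern seen in the sample above: a single space between the characters of a word, and two or more spaces between words.

```typescript
// Hypothetical caller-side helpers; not part of react-pdftotext.

// Heuristic: treat the text as letter-spaced when at least 80% of its
// whitespace-separated tokens are a single character long.
function looksLetterSpaced(text: string, threshold = 0.8): boolean {
  const tokens = text.split(/\s+/).filter((t) => t.length > 0);
  if (tokens.length === 0) return false;
  const singles = tokens.filter((t) => t.length === 1).length;
  return singles / tokens.length >= threshold;
}

// Assumes runs of two or more spaces mark word boundaries and a single
// space separates the characters inside a word, as in the sample above.
function collapseLetterSpacing(text: string): string {
  if (!looksLetterSpaced(text)) return text; // leave normal text untouched
  return text
    .trim()
    .split(/ {2,}/) // split into words at runs of 2+ spaces
    .map((word) => word.replace(/ /g, "")) // drop the per-character spaces
    .join(" ");
}
```

This only cleans up the output after extraction. It does not address whatever in the PDF causes the extra spaces, and it would also merge genuine runs of single-character words.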

Do you have any idea how to fix this?

Utkarsh212 commented 3 months ago

Hi @1vank1n, thank you for your feedback and for using react-pdftotext! I'm glad to hear it works well for you most of the time.

I've observed the issue you mentioned with the above example.pdf file and have some findings. To address this problem effectively, I need a bit more information from you: