Add PyMuPDF to components/converters

Is your feature request related to a problem? Please describe.

This request addresses problems I have been seeing when using pypdf to convert PDF documents. On several occasions, pypdf has produced garbled text for technical programming documents. One example is Fundamentals of Python Programming (https://folk.ntnu.no/sverrsti/INGG1001-H2019/pythonbook_20191015.pdf), where the single-quote characters in code are rendered as follows:

Thetype function can determine the type of the most complicated expressions:
>>> type(4)
<class /quotesingle.Varint/quotesingle.Var>
>>> type( /quotesingle.Var4/quotesingle.Var)
<class /quotesingle.Varstr/quotesingle.Var>
>>> type(4 + 7)
<class /quotesingle.Varint/quotesingle.Var>
>>> type( /quotesingle.Var4/quotesingle.Var+/quotesingle.Var7/quotesingle.Var)
<class /quotesingle.Varstr/quotesingle.Var>
>>> type(int( /quotesingle.Var3/quotesingle.Var) + int(4))
<class /quotesingle.Varint/quotesingle.Var>

A more capable converter that I tested produces the following for the same section:

The type function can determine the type of the most complicated expressions:

>>> type(4)
<class 'int'>
>>> type('4')
<class 'str'>
>>> type(4 + 7)
<class 'int'>
>>> type('4' + '7')
<class 'str'>
>>> type(int('3') + int(4))
<class 'int'>

I noticed these artifacts propagating into responses during RAG, where I was using retrieved content to inform code generation. I was puzzled about where the model had picked up these characters, since they were not in the source material; they had been introduced by pypdf during conversion.
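Until a different converter is available, one stopgap is to post-process pypdf's output before indexing. The sketch below assumes every artifact has the form `/<glyphname>.Var`, as in the sample above, and maps only `quotesingle` (the one glyph name confirmed here) back to a character; unknown names are left untouched. The names `GLYPH_NAMES` and `clean_glyph_artifacts` are hypothetical. It does not restore dropped spaces (as in "Thetype") or remove stray ones (as in `type( '4')`), so it is a partial workaround rather than a fix.

```python
import re

# Assumed mapping from glyph names to characters; only "quotesingle" is
# confirmed by the sample in this issue, anything else would be a guess.
GLYPH_NAMES = {"quotesingle": "'"}

# Matches artifacts such as "/quotesingle.Var" and captures the glyph name.
ARTIFACT_RE = re.compile(r"/(\w+)\.Var")


def clean_glyph_artifacts(text: str) -> str:
    """Replace known /<glyph>.Var artifacts; leave unknown ones unchanged."""
    return ARTIFACT_RE.sub(lambda m: GLYPH_NAMES.get(m.group(1), m.group(0)), text)


print(clean_glyph_artifacts("<class /quotesingle.Varint/quotesingle.Var>"))
# prints: <class 'int'>
```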

Describe the solution you'd like

Implement a PyMuPDF-based converter as an alternative to the existing pypdf converter for PDF files.
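A rough sketch of how both extractors could coexist, written as plain functions rather than Haystack components. It assumes PyMuPDF (imported as `fitz`) and pypdf are installed; each is imported inside its own function so that either one only needs to be present when it is actually used. The names `pdf_to_text`, `BACKENDS`, and the `backend` parameter are hypothetical and not part of Haystack, and joining pages with a form feed (`\f`) is an assumed convention.

```python
import io
from pathlib import Path
from typing import Callable, Dict, List, Union

Source = Union[str, Path, bytes]


def _pypdf_pages(source: Source) -> List[str]:
    # Existing approach: pypdf's PdfReader accepts a path or a binary stream.
    from pypdf import PdfReader

    stream = io.BytesIO(source) if isinstance(source, bytes) else source
    reader = PdfReader(stream)
    return [page.extract_text() or "" for page in reader.pages]


def _pymupdf_pages(source: Source) -> List[str]:
    # Proposed alternative: PyMuPDF, imported under its classic name "fitz".
    import fitz

    if isinstance(source, bytes):
        doc = fitz.open(stream=source, filetype="pdf")
    else:
        doc = fitz.open(str(source))
    with doc:  # closes the document when extraction is done
        return [page.get_text() for page in doc]


# Hypothetical registry so both backends can live side by side.
BACKENDS: Dict[str, Callable[[Source], List[str]]] = {
    "pypdf": _pypdf_pages,
    "pymupdf": _pymupdf_pages,
}


def pdf_to_text(source: Source, backend: str = "pymupdf") -> str:
    """Extract text from a PDF with the chosen backend, one page per chunk."""
    try:
        extract = BACKENDS[backend]
    except KeyError:
        raise ValueError(
            f"unknown backend {backend!r}; choose from {sorted(BACKENDS)}"
        ) from None
    # Join pages with a form feed so page boundaries are preserved.
    return "\f".join(extract(source))
```

Running `pdf_to_text(path, backend="pypdf")` and `pdf_to_text(path, backend="pymupdf")` on the book linked above would be a quick way to compare how each backend handles the quoted strings; wrapping the chosen backend in a Haystack component is a separate step.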

Additional context

Attached is a quick implementation of PyMuPDF in the pypdf.py file under components/converters. It replaces pypdf there, but the two classes could probably coexist in the same module.

pypdf_using_PyMuPDF.txt

deepset-ai/haystack issue #6861