This search engine indexes a folder of PDF files, splits them into chunks, and lets you search for relevant chunks using text or an image. The frontend shows the returned chunks, each with a link to its associated PDF file.
Bear in mind that this is a work in progress.
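To give a feel for the chunking step, here is a minimal sketch that splits extracted text into overlapping word windows. It is only an illustration: the function name `chunk_text`, the word-window strategy, and the `chunk_size` and `overlap` parameters are assumptions, not necessarily how this project chunks PDFs.

```python
from typing import List


def chunk_text(text: str, chunk_size: int = 50, overlap: int = 10) -> List[str]:
    """Split text into windows of `chunk_size` words sharing `overlap` words.

    Hypothetical helper for illustration; the project may chunk differently.
    """
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= overlap < chunk_size")

    words = text.split()
    step = chunk_size - overlap  # how far each window advances
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        # Stop once a window has reached the end of the text.
        if start + chunk_size >= len(words):
            break
    return chunks


if __name__ == "__main__":
    for chunk in chunk_text("one two three four five", chunk_size=3, overlap=1):
        print(chunk)
```

The overlap means a phrase that straddles a chunk boundary still appears whole in at least one chunk, at the cost of indexing some words twice.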
pip install -r requirements.txt
cd backend
python app.py -t index (to index your documents)
python app.py -t search (to open a RESTful search interface)
cd frontend
streamlit run frontend.py
config.yml
frontend/.streamlit/config.toml
docker-compose up
If you're planning to use "Print to PDF" from your web browser for testing, I recommend using Chrome over Firefox. Firefox converts some characters strangely (for example, "fi" becomes the single ligature character "ﬁ"), which could affect search results depending on what the encoder recognizes as a meaningful unit.
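If you do end up with such PDFs, one possible workaround is to normalize the extracted text before indexing. The sketch below assumes the odd characters are typographic ligatures such as U+FB01 and uses Unicode NFKC normalization from Python's standard library. The function name `normalize_ligatures` is hypothetical and this step is not part of the project as far as this README says.

```python
import unicodedata


def normalize_ligatures(text: str) -> str:
    """Expand compatibility characters, e.g. the U+FB01 ligature into "f" + "i".

    Hypothetical helper for illustration. NFKC also rewrites other
    compatibility characters (full-width letters, superscripts, and so on),
    so check that this is acceptable for your documents.
    """
    return unicodedata.normalize("NFKC", text)


if __name__ == "__main__":
    # "\ufb01" is the single-character "fi" ligature.
    print(normalize_ligatures("\ufb01le"))
```

Applying the same normalization to both the indexed chunks and the incoming query keeps the two sides consistent.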
Feel free to make a PR to help out with these!
datauri when indexing images
docker-compose.yml
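For anyone picking up the datauri item, here is a rough sketch of what encoding an image as a data URI could look like. The README does not say how data URIs should be used during indexing, so the function name `to_data_uri`, its parameters, and the default MIME type are all assumptions for illustration.

```python
import base64


def to_data_uri(data: bytes, mime_type: str = "image/png") -> str:
    """Encode raw bytes as a base64 data URI.

    Hypothetical helper; the MIME type must match the actual image format.
    """
    encoded = base64.b64encode(data).decode("ascii")
    return f"data:{mime_type};base64,{encoded}"


if __name__ == "__main__":
    # Tiny stand-in payload; a real caller would pass the image file's bytes.
    print(to_data_uri(b"hi", "text/plain"))
```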