Future-House / paper-qa

High accuracy RAG for answering questions from scientific documents with citations
Apache License 2.0

Reading markdown files? #580

Open aginiewicz opened 2 weeks ago

aginiewicz commented 2 weeks ago

Hello, I wondered what the recommended way is to use local markdown files with paperqa. Looking at readers.py, it seems markdown is treated as code, so if I understand correctly, one would have to generate HTML or txt from the md file for it to be treated as text. Is this currently possible without changes to the code, for example from the CLI?
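As a rough illustration of the workaround described above (turning md files into txt so they are read as text), here is a minimal sketch that mirrors a folder of markdown files into a parallel folder of `.txt` copies. The function name `mirror_markdown_as_txt` and the folder layout are hypothetical, and the sketch assumes that paperqa reads `.txt` files as plain text, which I have only inferred from readers.py. It uses only the Python standard library and does not call paperqa itself.

```python
from pathlib import Path
import shutil


def mirror_markdown_as_txt(vault_dir, out_dir):
    """Copy every .md file under vault_dir to out_dir with a .txt suffix.

    Hypothetical helper: the content is copied unchanged, only the
    extension differs, so a reader that picks its parser by extension
    would see plain text instead of markdown.
    """
    vault = Path(vault_dir)
    out = Path(out_dir)
    written = []
    for md_path in sorted(vault.rglob("*.md")):
        # Keep the sub-folder structure so note names stay unambiguous.
        target = (out / md_path.relative_to(vault)).with_suffix(".txt")
        target.parent.mkdir(parents=True, exist_ok=True)
        shutil.copyfile(md_path, target)
        written.append(target)
    return written
```

The output folder could then be indexed in place of the vault itself; the original notes are left untouched.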

My intended use case is chatting with an Obsidian Vault that stores lecture notes for all the courses I have taught in the past. The notes contain citations linked to my Zotero database through https://github.com/hans/obsidian-citation-plugin (pandoc-style citations). I would like paperqa to be able to read those notes and correctly use the citations mentioned in them.
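To make "pandoc-style citations" concrete, here is a minimal sketch of pulling citation keys such as `[@smith2020]` out of a note. The function name `extract_citekeys` is hypothetical, and the regular expression is a simplified approximation of pandoc's citation-key rules, not a full parser. It is independent of paperqa and of the Obsidian plugin.

```python
import re

# Simplified pattern: "@" not preceded by a word character (to skip
# e-mail addresses), then a key made of alphanumerics/underscores with
# optional internal punctuation. This is an approximation of pandoc's
# rules, not the exact grammar.
CITEKEY_RE = re.compile(
    r"(?<![\w@])@([A-Za-z0-9_]+(?:[:.#$%&+?<>~/-]+[A-Za-z0-9_]+)*)"
)


def extract_citekeys(markdown_text):
    """Return the citation keys found in the text, in order, without duplicates."""
    seen = []
    for key in CITEKEY_RE.findall(markdown_text):
        if key not in seen:
            seen.append(key)
    return seen
```

The extracted keys could then be matched against entries in the Zotero export, assuming the export uses the same keys as the notes.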

I'm aware of Obsidian plugins for chatting with the vault, but those seem to lag behind paperqa in performance. Also, they only allow chatting with markdown files, while here I hope to find a way to chat with both the Obsidian Vault and the PDF files from my Zotero database at the same time.

victorconka commented 2 days ago

Interested in this. I too have md files, which are the result of the method I've found suitable for text extraction from my PDF files.