I was trying to use the following code as a workaround:
from langchain_community.document_loaders import PyPDFLoader

loader = PyPDFLoader(doc)  # doc: path to the PDF
docs = loader.load()  # one Document per page
for page in docs:
    rag.insert(page.page_content)
Surprisingly, this also seems to hang once the graph reaches a certain size. I'm wondering whether there's a quadratic operation somewhere in the code, scaling with the number of nodes already in the graph, that is causing the freeze.
I replaced NanoVectorDb with HNSWVectorDb, but that does not seem to be the root of the issue either.
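One quick way to test the quadratic hypothesis is to time each insert and watch whether per-chunk latency climbs as the graph grows. A rough sketch, assuming rag and docs are set up as in the snippet above:

import time

timings = []
for i, page in enumerate(docs):
    start = time.perf_counter()
    rag.insert(page.page_content)
    timings.append(time.perf_counter() - start)
    print(f"chunk {i}: {timings[-1]:.2f}s")

# Roughly constant per-chunk times suggest the hang is external (network, event loop);
# steadily climbing times point at superlinear work per insert.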
This seems to be an issue with executing the code in a notebook environment while on a VPN.
If you're running into this issue, I'd recommend disabling the VPN and running the code either as a regular Python script or in a debugger.
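For reference, a self-contained version to run as a plain script (a sketch assuming nano-graphrag's GraphRAG API; the working directory and PDF path are placeholders):

from langchain_community.document_loaders import PyPDFLoader
from nano_graphrag import GraphRAG

rag = GraphRAG(working_dir="./graphrag_cache")  # placeholder cache directory
docs = PyPDFLoader("my_doc.pdf").load()  # placeholder PDF path
for page in docs:
    rag.insert(page.page_content)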
When this is run on a lengthy doc (a few hundred pages) with a small-ish chunk size (e.g. 600), we reach a point where the entity and relationship labelling just hangs.
Small snippet:
(And the processing does not continue, despite there being another few hundred chunks left to parse.)
Any reason why this could be the case? I've tested against a couple of different docs.
As far as models go, this occurs with both 4o-mini and 4o; however, 4o-mini seems to be able to process more chunks before the hanging starts.
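For context, the chunk size mentioned above refers to the chunking configuration, along the lines of the following (the chunk_token_size parameter name is an assumption based on nano-graphrag's GraphRAG options):

from nano_graphrag import GraphRAG

# chunk_token_size is assumed; nano-graphrag chunks documents by token count
rag = GraphRAG(working_dir="./graphrag_cache", chunk_token_size=600)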