mattcg opened this issue 3 years ago
Can you debug it and submit a patch, please?
Yes, will do so!
I was able to replicate this after re-indexing a very large dataset. It's certainly a bug; I just haven't discovered the cause yet.
If you have any idea what the parent document of the stray fragment might be (and can share it), that would help us debug this.
This might have been related to https://github.com/alephdata/aleph/issues/3923
What a debugging find, @sunu. This one had been killing me for ages. That's a super logical explanation...
Wow, sunu, incredible find! Yes, this is definitely the reason. It also explains a problem we kept running into: Tables showing up in search results without a parent document that could be downloaded.
This is a bit difficult to reproduce, and I have tried debugging it without getting anywhere. Periodically, documents that sit deep within a directory hierarchy appear directly at the root of the dataset, as copies of the original documents but orphaned from their parent directory. After these orphans are deleted, some event (a re-index, re-ingest or upgrade) seems to trigger their reappearance.
In other instances, these documents are not actual documents but empty 'Table' documents. Again, when deleted, they reappear. If I were to guess, I'd imagine it's some race condition (an attempt to index the child before its parent document is indexed), but this is just an uneducated guess.
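To narrow down which stray fragments are orphans, and possibly which nested file a root-level copy came from, a check like the following could be run over an export of the index. This is a minimal sketch under stated assumptions. It uses a hypothetical, simplified record format with `id`, `parent_id` and `content_hash` fields. That format is not Aleph's real entity schema or API, and `dangling_children` / `root_level_copies` are illustrative names only.

```python
from typing import Dict, List, Optional, Set

# Hypothetical, simplified record format (NOT Aleph's real entity schema):
#   {"id": str, "parent_id": str | None, "content_hash": str | None}
Record = Dict[str, Optional[str]]


def dangling_children(records: List[Record]) -> List[str]:
    """Ids of documents whose parent_id refers to a document that is not indexed
    (e.g. an empty Table whose parent was never indexed or was deleted)."""
    known_ids: Set[str] = {r["id"] for r in records}
    return [
        r["id"]
        for r in records
        if r.get("parent_id") is not None and r["parent_id"] not in known_ids
    ]


def root_level_copies(records: List[Record]) -> List[str]:
    """Ids of root-level documents (no parent_id) whose content hash also
    appears on a nested document, i.e. candidate stray copies of a deeper file."""
    nested_hashes: Set[str] = {
        r["content_hash"]
        for r in records
        if r.get("parent_id") is not None and r.get("content_hash")
    }
    return [
        r["id"]
        for r in records
        if r.get("parent_id") is None and r.get("content_hash") in nested_hashes
    ]


if __name__ == "__main__":
    # Made-up example data, for illustration only.
    index = [
        {"id": "folder-a", "parent_id": None, "content_hash": None},
        {"id": "folder-b", "parent_id": "folder-a", "content_hash": None},
        {"id": "memo-nested", "parent_id": "folder-b", "content_hash": "abc123"},
        {"id": "memo-root", "parent_id": None, "content_hash": "abc123"},
        {"id": "table-7", "parent_id": "sheet-3", "content_hash": None},
    ]
    print(dangling_children(index))  # ['table-7']
    print(root_level_copies(index))  # ['memo-root']
```

Matching on content hash only flags candidates. Two legitimately separate uploads of the same file would also match, so any hits still need a manual look before deleting anything.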