SciGraph / golr-loader

Convert SciGraph queries into json that can be loaded by Golr
Apache License 2.0

Too many values for UnInvertedField faceting on field object_closure #44

Closed kshefchek closed 5 years ago

kshefchek commented 5 years ago

Solr-dev is currently returning an exception for any biolink call that facets over object_closures: java.lang.IllegalStateException: Too many values for UnInvertedField faceting on field object_closure

We appear to have hit some limit, but it's not obvious from looking at the data:

Solr Production: Total Docs: 37502996, Unique values in object_closure: 4005165

Solr Dev: Total Docs: 38759008, Unique values in object_closure: 4350271

It's also possible the limit is based on the number of values in a single document, but this is harder to gather without iterating over each document.

Related: https://issues.apache.org/jira/browse/SOLR-11240

Possible solutions:

cc @kltm @doctorbud @deepakunni3

cmungall commented 5 years ago

4m is more than I would have expected; I guess this is because genes, variants, etc. make their way into the closure?

kshefchek commented 5 years ago

it does seem high, I could put the values in a file so we can debug.

kshefchek commented 5 years ago

Another solution is to only close over equivalentClass for entities such as genes, variants, etc.

kltm commented 5 years ago

Kent,

It may also be that the resources applied to the Solr instance are insufficient to build the necessary caches. Have you already taken a look at the admin/stats page (e.g. fieldValueCache)?

https://wiki.apache.org/solr/SimpleFacetParameters#facet.method https://wiki.apache.org/solr/SolrCaching#fieldValueCache
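The `facet.method` parameter from the first link can be set per request. A minimal sketch of a facet query that asks for `facet.method=enum`, which enumerates terms through the filterCache rather than building an UnInvertedField, might look like the following. The host, the core name `golr`, and the helper `facet_query_url` are assumptions for illustration, and whether `enum` performs acceptably on a field with millions of unique values would need testing.

```python
from urllib.parse import urlencode


def facet_query_url(base_url, field, method="enum", limit=25):
    """Build a Solr select URL that facets on `field` with an explicit facet.method.

    `base_url` is a hypothetical core URL; adjust it to the real instance.
    """
    params = {
        "q": "*:*",
        "rows": 0,               # only the facet counts are wanted
        "wt": "json",
        "facet": "true",
        "facet.field": field,
        "facet.method": method,  # enum, fc, or fcs
        "facet.limit": limit,
    }
    return f"{base_url}/select?{urlencode(params)}"


# Hypothetical host and core name
url = facet_query_url("http://localhost:8983/solr/golr", "object_closure")
print(url)
```

If the exception disappears with `enum` but returns with `fc`, that would point at the UnInvertedField structure itself rather than at the data.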

Did this error crop up after a data load or schema change, or has it just not been noticed until now?

Cheers,

-Seth

kshefchek commented 5 years ago

This occurred after a new data load, and we've tried increasing the memory.

Some more stats on the number of object_closure values per document: sum: 819945513, max: 648, mean: 21.15496642741734
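These figures read as per-document value counts: dividing the sum by the mean gives back the Solr Dev document count. A minimal sketch of how such stats could be computed from exported documents, assuming each document is a dict with a multivalued `object_closure` list (the helper `closure_stats` and the toy documents are hypothetical):

```python
def closure_stats(docs, field="object_closure"):
    """Return (sum, max, mean) of the number of values each document has in `field`."""
    counts = [len(doc.get(field, [])) for doc in docs]
    total = sum(counts)
    return total, max(counts), total / len(counts)


# Hypothetical toy documents standing in for an export of the index
docs = [
    {"id": "a", "object_closure": ["X:1", "X:2", "X:3"]},
    {"id": "b", "object_closure": ["X:1"]},
    {"id": "c"},  # documents without the field count as zero values
]
total, largest, average = closure_stats(docs)
print(total, largest, average)

# Sanity check on the reported numbers: sum / mean should be the document count
implied_docs = round(819945513 / 21.15496642741734)
print(implied_docs)
```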

All values: object_closure.txt.gz

[Screenshot: solr-admin]

Nothing I can see in the logs when the server is starting up.

kshefchek commented 5 years ago

Here is some more info after reindexing:

  1. Indexed on monarch5, moved files to solr-dev
    • facets on object_closure work
  2. Optimized index
    • Exception when faceting on object_closure

So, oddly, this is related to optimizing on the solr-dev VM rather than on monarch5. I moved the optimize step because we were running out of disk space on monarch5.

kshefchek commented 5 years ago

I'm uncertain of the cause of this, but at least understand how to avoid it. Propose we close since it's no longer an issue.

kshefchek commented 3 years ago

this is rearing its head again, although not for every load, and it seems re-indexing fixes it. Will play around with memory config.

cmungall commented 3 years ago

does this help? https://lucene.472066.n3.nabble.com/java-lang-IllegalStateException-Too-many-values-for-UnInvertedField-faceting-on-field-content-td4218197.html

kshefchek commented 3 years ago

I have to look back through my notes to see if I tried docValues. This has also been fixed in Solr 7 and above, so that is another reason to push for an update: https://issues.apache.org/jira/browse/SOLR-11240
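For reference, trying docValues would mean setting `docValues="true"` on the field definition and then reindexing, since docValues are written at index time. A rough sketch that generates such a schema.xml field element follows; the `string` type and the `indexed`/`stored`/`multiValued` settings are assumptions, not the actual golr schema.

```python
import xml.etree.ElementTree as ET

# Assumed attributes for illustration; only docValues="true" is the point here.
field = ET.Element(
    "field",
    {
        "name": "object_closure",
        "type": "string",       # assumption: the real field type may differ
        "indexed": "true",
        "stored": "true",
        "multiValued": "true",
        "docValues": "true",    # facet from column-oriented docValues
    },
)
xml_line = ET.tostring(field, encoding="unicode")
print(xml_line)
```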