Open BryceStevenWilley opened 1 year ago
Got a heap dump last night, here's all the useful info I can get from it:
- Doing updates for: dallas:dcjv, tables: [documenttype, partytype, filingcomponent, filing, optionalservices]. Optional services is probably key there.
- `org.quartz.simpl.SimpleThreadPool$WorkerThread`, the Code Update thread
  - `genericode._1.CodeListDocument` and `postgresql.core.ParameterList`. 192,000 entries in the ParameterList, so we could split the Postgres batch into chunks of fewer than ~50k items. This issue's main target is the CodeListDocument, details are above.
- `org.apache.cxf.bus.extension.ExtensionManagerBus`, the WSDL junk
  - `WSDLManagerImpl::schemaCacheMap` and `definitionsMap`. Not sure how we can shrink that use, maybe if jurisdictions started sharing the NIEM XML files? Would help with repo size anyway.
Have been running into heap overflows when updating codes. Still trying to narrow down which exact state it is, but it's failing when trying to execute the batch update in the Postgres driver.
One possible way to reduce some memory pressure at this point is to not unmarshal the entire CodeListDocument at once, but to read each row individually. I think the best idea here is to do something like this: https://stackoverflow.com/a/16935069/11416267 in lines 164 and 175 of CodeDatabase. We just need to get the codes version and each individual row.
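A rough sketch of the row-at-a-time idea, using only the JDK's StAX API rather than the approach in the linked answer. The class name `StreamingCodeListReader` and the `onRow` callback are made up for illustration, and the element names (`Version`, `Row`, `Value` with `ColumnRef`, `SimpleValue`) assume the usual genericode layout. This is not what CodeDatabase currently does.

```java
import java.io.InputStream;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Consumer;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical helper: walks a genericode document one element at a time
// so the whole CodeListDocument never has to sit in memory at once.
final class StreamingCodeListReader {

  /**
   * Calls onRow once per Row element with a map of ColumnRef to SimpleValue text.
   * Returns the text of the first Version element, or null if there is none.
   */
  static String read(InputStream in, Consumer<Map<String, String>> onRow)
      throws XMLStreamException {
    XMLInputFactory factory = XMLInputFactory.newFactory();
    // Code lists come from outside, so turn off DTDs and external entities.
    factory.setProperty(XMLInputFactory.SUPPORT_DTD, false);
    factory.setProperty(XMLInputFactory.IS_SUPPORTING_EXTERNAL_ENTITIES, false);

    XMLStreamReader xml = factory.createXMLStreamReader(in);
    String version = null;
    Map<String, String> row = null; // non-null only while inside a Row
    String columnRef = null;        // non-null only while inside a Value
    try {
      while (xml.hasNext()) {
        int event = xml.next();
        if (event == XMLStreamConstants.START_ELEMENT) {
          String name = xml.getLocalName();
          if (name.equals("Version") && version == null) {
            version = xml.getElementText();
          } else if (name.equals("Row")) {
            row = new LinkedHashMap<>();
          } else if (name.equals("Value") && row != null) {
            columnRef = xml.getAttributeValue(null, "ColumnRef");
          } else if (name.equals("SimpleValue") && row != null && columnRef != null) {
            row.put(columnRef, xml.getElementText());
          }
        } else if (event == XMLStreamConstants.END_ELEMENT) {
          String name = xml.getLocalName();
          if (name.equals("Row") && row != null) {
            onRow.accept(row); // hand the finished row off, then forget it
            row = null;
            columnRef = null;
          } else if (name.equals("Value")) {
            columnRef = null;
          }
        }
      }
    } finally {
      xml.close();
    }
    return version;
  }
}
```

The callback could add each row to the Postgres batch directly, so only one row's worth of XML data is held at a time. Note that the version is only known once the `Version` element has been read, which in the usual layout is before any rows.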
If there are still issues, we can look into doing separate, non-batched Postgres updates, or simply making the batches smaller once there are more than some number of rows.
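For the smaller-batches option, something like the following could work. It is a sketch under assumptions: `ChunkedBatchInsert`, `insertAll`, and the all-string row shape are invented here, and the 50,000 cap is just the rough figure from the heap dump, not a measured limit. It uses plain JDBC calls (`addBatch`, `executeBatch`) and flushes whenever the pending count reaches the cap, so the driver should not have to hold parameters for the whole table at once.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;

// Hypothetical helper: sends rows to the database in bounded batches
// instead of one batch containing every row.
final class ChunkedBatchInsert {

  // Assumed cap, taken from the "~50k items" estimate; tune as needed.
  static final int MAX_BATCH_ROWS = 50_000;

  /**
   * Binds each row's values as string parameters 1..n of the given SQL and
   * executes the batch every maxBatchRows rows.
   * Returns how many times executeBatch was called.
   */
  static int insertAll(Connection conn, String sql, Iterable<String[]> rows, int maxBatchRows)
      throws SQLException {
    if (maxBatchRows <= 0) {
      throw new IllegalArgumentException("maxBatchRows must be positive");
    }
    int flushes = 0;
    int pending = 0;
    try (PreparedStatement stmt = conn.prepareStatement(sql)) {
      for (String[] row : rows) {
        for (int i = 0; i < row.length; i++) {
          stmt.setString(i + 1, row[i]); // JDBC parameters are 1-indexed
        }
        stmt.addBatch();
        pending++;
        if (pending == maxBatchRows) {
          stmt.executeBatch(); // send this chunk and clear the pending batch
          pending = 0;
          flushes++;
        }
      }
      if (pending > 0) {
        stmt.executeBatch(); // send the final partial chunk
        flushes++;
      }
    }
    return flushes;
  }
}
```

One thing to keep in mind: with auto-commit on, each chunk commits on its own, so a failure partway through would leave a partially updated table. Running the whole update inside one transaction avoids that.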
TODO:
Independently: check whether `-XX:+HeapDumpOnOutOfMemoryError` works at all
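One cheap partial check is to have the running JVM report whether the flag actually took effect, for example by logging it at startup. The sketch below assumes a HotSpot JVM (the `com.sun.management` bean is HotSpot-specific), and `HeapDumpFlagCheck` is a made-up name. It only confirms the option is set; it does not prove a dump file gets written when an OutOfMemoryError happens.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.lang.management.ManagementFactory;

// Hypothetical helper: reads HotSpot VM options from inside the running process.
final class HeapDumpFlagCheck {

  /** Returns the current value of a HotSpot VM option as a string. */
  static String vmOption(String name) {
    HotSpotDiagnosticMXBean bean =
        ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
    // getVMOption throws IllegalArgumentException if the option does not exist.
    return bean.getVMOption(name).getValue();
  }

  /** True if the JVM was started with -XX:+HeapDumpOnOutOfMemoryError. */
  static boolean heapDumpOnOomEnabled() {
    return Boolean.parseBoolean(vmOption("HeapDumpOnOutOfMemoryError"));
  }
}
```

Logging `heapDumpOnOomEnabled()` together with `vmOption("HeapDumpPath")` at startup would show whether the flag reached the JVM and where the dump is supposed to go.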