NASA-PDS / harvest

Standalone Harvest client application providing the functionality for capturing and indexing product metadata into the PDS Registry system (https://github.com/nasa-pds/registry).
https://nasa-pds.github.io/registry

`Data too large` error from very large data products #133

Closed · jordanpadams closed this issue 12 months ago

jordanpadams commented 1 year ago

Checked for duplicates

Yes - I've already checked

🐛 Describe the bug

When harvesting a data set containing some very large data products, I get a `Data too large` error and the data is not loaded into the Registry.

🕵️ Expected behavior

I expected the data would be loaded nominally into the Registry.

📜 To Reproduce

  1. Download TBD data product
  2. Attempt to harvest the product
  3. Note the error
```
[ERROR] LIDVID = urn:esa:psa:em16_tgo_acs:data_raw:acs_raw_hk_nir_20170907t000000-20170907t055959::3.0,
Message = [parent] Data too large, data for [indices:data/write/bulk[s]] would be [16591820628/15.4gb],
which exceeds the limit of [16287753830/15.1gb]. Current usage: [16591415264/15.4gb], new bytes reserved: [405364/395.8kb],
usages [request=0/0b, fielddata=0/0b, in_flight_requests=405364/395.8kb, accounting=613644/599.2kb]
```

🖥 Environment Info

Linux

📚 Version of Software Used

3.7.6

🩺 Test Data / Additional context

TBD

🦄 Related requirements

No response

⚙️ Engineering Details

No response

alexdunnjpl commented 1 year ago

@jordanpadams what's the best way to get a copy of the label for this product?

jordanpadams commented 1 year ago

@alexdunnjpl a ping is out to the user.

alexdunnjpl commented 12 months ago

@jordanpadams looking deeper into this error, it appears to be due to imminent exhaustion of the JVM heap on OpenSearch, rather than any one request/product being too large. (Presumably the heap allocation is currently 16GB on that node.)
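For context, the limit in the error above (16287753830 bytes ≈ 15.1gb) is consistent with the default parent circuit-breaker threshold of 95% of a roughly 16GB JVM heap, which supports the heap-exhaustion reading. A minimal sketch for inspecting the breaker state directly, assuming a reachable OpenSearch endpoint (the URL is a placeholder and authentication is omitted):

```python
# Minimal sketch: inspect the parent circuit breaker on each OpenSearch node.
# OPENSEARCH_URL is a placeholder; authentication is omitted for brevity.
import requests

OPENSEARCH_URL = "https://localhost:9200"

resp = requests.get(f"{OPENSEARCH_URL}/_nodes/stats/breaker", timeout=30)
resp.raise_for_status()

for node_id, node in resp.json()["nodes"].items():
    parent = node["breakers"]["parent"]
    print(
        f"{node.get('name', node_id)}: "
        f"limit={parent['limit_size']}, "
        f"estimated={parent['estimated_size']}, "
        f"tripped={parent['tripped']}"
    )
```

A non-zero `tripped` count on the parent breaker indicates the node has been rejecting requests under memory pressure.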

The fix here is to bump up the instance size to cope with peak throughput, and/or incorporate pause/retry behaviour in harvest.
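A sketch of what the pause/retry idea could look like on the client side, retrying with exponential backoff when the cluster responds with HTTP 429 or a `circuit_breaking_exception`. This is illustrative only, not Harvest's actual implementation (Harvest itself is Java):

```python
# Sketch of retry-with-backoff for bulk indexing when the parent
# circuit breaker trips. Illustrative only; not Harvest's actual code.
import time
import requests

def bulk_with_backoff(url, payload, max_retries=5, base_delay=2.0):
    """POST an NDJSON bulk payload, backing off when the cluster pushes back."""
    for attempt in range(max_retries):
        resp = requests.post(
            f"{url}/_bulk",
            data=payload,
            headers={"Content-Type": "application/x-ndjson"},
            timeout=60,
        )
        # 429 responses (and circuit_breaking_exception bodies) signal
        # transient memory pressure: wait and retry instead of failing
        # the product outright.
        if resp.status_code == 429 or "circuit_breaking_exception" in resp.text:
            time.sleep(base_delay * (2 ** attempt))
            continue
        resp.raise_for_status()
        return resp.json()
    raise RuntimeError("bulk request failed after retries")
```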

Closing as a duplicate of #125 on that basis, since the fix for that is a fix for this.

jordanpadams commented 12 months ago

@alexdunnjpl nice sleuthing. 🎉

alexdunnjpl commented 12 months ago

@sjoshi-jpl I see that psa is currently r5.4xlarge.search (128GB RAM) - did this get bumped up from r5.xlarge.search (32GB RAM, ~16GB heap) at some point recently?
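For anyone verifying instance sizing, a quick way to see per-node heap and RAM is the `_cat/nodes` API, assuming access to the cluster (endpoint is a placeholder):

```python
# Sketch: list per-node max heap and max RAM via the _cat/nodes API.
import requests

OPENSEARCH_URL = "https://localhost:9200"  # placeholder endpoint

resp = requests.get(
    f"{OPENSEARCH_URL}/_cat/nodes",
    params={"v": "true", "h": "name,heap.max,ram.max"},
    timeout=30,
)
print(resp.text)
```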

sjoshi-jpl commented 12 months ago

@alexdunnjpl yes, this was recently bumped up based on our last conversation with @jordanpadams and @tloubrieu-jpl, where we discussed how PSA could be as large / resource-intensive as GEO.