Closed sfisher closed 4 months ago
I updated the whole document structure, as you can see in the tests. It now has more nesting levels (resource instead of things like resource_creators, resource_title) so it's not repetitive.
It also includes a tiny bit of metadata from some of the other relations, like owner, ownergroup, and profile, instead of just the foreign key id. These are also nested. I don't think most of these are used in search now, but I can imagine it might be useful to search for an email or name or something in OpenSearch.
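To make the nesting concrete, here is a rough sketch of the flat versus nested shapes. Every field name and value below is made up for illustration (the real shapes are in the tests), and get_path is a hypothetical helper, not something from this PR.

```python
# Illustrative only: field names and values are assumptions, not the PR's schema.
flat_doc = {
    "resource_title": "Example dataset",
    "resource_creators": ["A. Author"],
    "owner_id": 42,  # foreign key id only
}

nested_doc = {
    "resource": {
        "title": "Example dataset",
        "creators": ["A. Author"],
    },
    # A little metadata from the related record, not just its id.
    "owner": {"id": 42, "name": "A. Author", "email": "author@example.org"},
    "ownergroup": {"id": 7, "name": "Example group"},
    "profile": {"id": 3, "name": "default"},
}


def get_path(doc, dotted):
    """Follow a dotted path such as 'owner.email' through nested dicts."""
    value = doc
    for key in dotted.split("."):
        value = value[key]
    return value


print(get_path(nested_doc, "owner.email"))
```

With nested objects like these, a query could target a dotted field such as owner.email, which is the kind of lookup mentioned above.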
I think everything in here is in #649 so I can close this one.
IDK how we want to handle this since it isn't a fully finished feature, but merging into develop (or even main) shouldn't affect other working code right now.
This essentially replicates the search information that is currently in the database, except for a couple of things I didn't think were used for search. There may be some changes and optimizations we want to make, which I think will become more obvious when working through the UI and API areas that use search, and we may do some more revisions to the doc format from there.
Things in this PR:
Basic generation of search document information
Manual test example
Automated basic unit tests
Script to update search index based on database information
The script will go through and add/update all items from the database into OpenSearch. It uses OpenSearch bulk update functionality and I had to add some workarounds for loading all records from
You can give it a primary id as an argument and it will start with the documents after that.
In the future we may want to make another argument that reindexes everything after a certain date instead to update new items only.
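As a rough sketch of the approach described above (not the actual script), the following builds newline-delimited bodies for the OpenSearch _bulk endpoint in batches, skipping everything up to a given primary id. The function names, the record shape ({"id": ..., "doc": ...}), the batch size, and the assumption that records arrive ordered by id are all hypothetical.

```python
import json
from itertools import islice


def batched(iterable, size):
    """Yield lists of up to `size` items so the full set is never held at once."""
    iterator = iter(iterable)
    while True:
        chunk = list(islice(iterator, size))
        if not chunk:
            return
        yield chunk


def bulk_bodies(records, index_name, after_id=0, batch_size=500):
    """Yield NDJSON request bodies for the _bulk API.

    Assumes `records` is an iterable of {"id": ..., "doc": {...}} dicts
    ordered by id; records with id <= after_id are skipped.
    """
    pending = (r for r in records if r["id"] > after_id)
    for chunk in batched(pending, batch_size):
        lines = []
        for record in chunk:
            # The "index" action adds the document or replaces an existing one.
            lines.append(json.dumps({"index": {"_index": index_name, "_id": record["id"]}}))
            lines.append(json.dumps(record["doc"]))
        # A bulk body must end with a trailing newline.
        yield "\n".join(lines) + "\n"


records = [{"id": i, "doc": {"n": i}} for i in range(1, 6)]
for body in bulk_bodies(records, "example-index", after_id=2, batch_size=2):
    print(body, end="")
```

A date-based variant would only need a different filter in place of the id comparison.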
Adds things to SSM (already done on dev/stg and has placeholder values on production which we'll need to update once we get an OpenSearch server for that environment).
Fixed problems with the update script, which would run out of memory at around 150,000 records. It turns out that the Python functools lru_cache does not work like a typical memoization library and, in some circumstances, doesn't free memory once objects are destroyed, so I had to revert to the default params rather than the Copilot-suggested settings, which filled up memory. :-( The script seems to run fast on dev now and doesn't have memory problems.
These changes are related to tickets #590, #591, #592.