Open jasonaowen opened 6 years ago
I think the biggest constraint we have as we investigate solutions is that we don't want to complicate deployment, which means we don't want any external servers or services.
Given that, the two solutions @HemKal and I have found so far are:
Hibernate Search, which is built on Apache Lucene, is provided by WildFly (along with Hibernate itself), and if we ever change application servers we would be able to bundle it in our build instead. Lucene is more powerful than PostgreSQL's full-text search, although I'm not yet sure what exactly that means. It integrates with Hibernate to update its index each time we commit modifications to an @Entity (including creation). HS requires storing an index file on disk; it's not clear to me yet where that should live, how big it would get, or what potential deployment challenges that might bring (such as permissions around the application server OS user reading and writing files on disk). It's also not clear to me yet how it would handle searching across multiple entities; the examples I've seen so far are all for searching throughout a single entity type. Finally, I don't yet understand how it scales across multi-node application servers (nor, to be fair, how important that might be for the PSM).
PostgreSQL has full-text search built in. We could either add a tsvector column to each table we want to search, or we could make a denormalized table that relates back to other tables. Per-table columns could be kept up to date with triggers; the preferred solution for denormalization seems to be a materialized view, which needs to be refreshed periodically.
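To make the per-table column option a bit more concrete, here is a rough sketch of the SQL involved, held in Java strings and run over JDBC. The table `demo_providers`, its `name` and `notes` columns, and the class itself are hypothetical and not taken from the PSM schema; `tsvector_update_trigger` and `plainto_tsquery` are PostgreSQL built-ins.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.ArrayList;
import java.util.List;

// Sketch only: table, column, and index names are hypothetical, not the PSM schema.
class TsvectorSketch {
    // Add a tsvector column next to the text columns we want to search.
    static final String ADD_COLUMN =
        "ALTER TABLE demo_providers ADD COLUMN search_vector tsvector";

    // A GIN index keeps @@ lookups from scanning the whole table.
    static final String ADD_INDEX =
        "CREATE INDEX demo_providers_search_idx "
        + "ON demo_providers USING GIN (search_vector)";

    // Built-in trigger function that recomputes the column on each write.
    static final String ADD_TRIGGER =
        "CREATE TRIGGER demo_providers_search_update "
        + "BEFORE INSERT OR UPDATE ON demo_providers "
        + "FOR EACH ROW EXECUTE PROCEDURE "
        + "tsvector_update_trigger(search_vector, 'pg_catalog.english', name, notes)";

    // Match rows whose vector satisfies the user's plain-text query.
    static final String SEARCH =
        "SELECT id FROM demo_providers "
        + "WHERE search_vector @@ plainto_tsquery('english', ?)";

    // Run the one-time schema changes on an open connection.
    static void install(Connection conn) throws SQLException {
        try (Statement st = conn.createStatement()) {
            st.execute(ADD_COLUMN);
            st.execute(ADD_INDEX);
            st.execute(ADD_TRIGGER);
        }
    }

    // Return the ids of matching rows for the given search terms.
    static List<Long> search(Connection conn, String terms) throws SQLException {
        List<Long> ids = new ArrayList<>();
        try (PreparedStatement ps = conn.prepareStatement(SEARCH)) {
            ps.setString(1, terms);
            try (ResultSet rs = ps.executeQuery()) {
                while (rs.next()) {
                    ids.add(rs.getLong("id"));
                }
            }
        }
        return ids;
    }

    public static void main(String[] args) {
        // No database needed here: just print the statements for review.
        System.out.println(ADD_COLUMN);
        System.out.println(ADD_INDEX);
        System.out.println(ADD_TRIGGER);
        System.out.println(SEARCH);
    }
}
```

Note that the trigger only fires on new writes, so rows that already exist would need a one-time backfill before they show up in results.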
Both of these solutions have some challenges we'll need to figure out:
What specific @Entity classes do we want to be searchable? Does it make sense to search each type separately and combine results, or to have some kind of composite/denormalized index?
We will continue to research this.
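On the question of searching each type separately and combining results, a minimal in-memory sketch of the "combine" step might look like the following. The `Hit` class, its fields, and the `combine` method are made up for illustration; neither Hibernate Search nor PostgreSQL provides them.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Sketch only: all names here are hypothetical.
class CombinedSearchSketch {
    // One search result: which entity type matched, its id, and a relevance score.
    static final class Hit {
        final String entityType;
        final long id;
        final double score;

        Hit(String entityType, long id, double score) {
            this.entityType = entityType;
            this.id = id;
            this.score = score;
        }
    }

    // Merge the per-type result lists and keep the top `limit` hits by score.
    static List<Hit> combine(List<List<Hit>> perTypeResults, int limit) {
        List<Hit> all = new ArrayList<>();
        for (List<Hit> hits : perTypeResults) {
            all.addAll(hits);
        }
        all.sort(Comparator.comparingDouble((Hit h) -> h.score).reversed());
        return new ArrayList<>(all.subList(0, Math.min(limit, all.size())));
    }

    public static void main(String[] args) {
        List<Hit> people = Arrays.asList(
            new Hit("Person", 1L, 0.4), new Hit("Person", 2L, 0.9));
        List<Hit> orgs = Arrays.asList(new Hit("Organization", 7L, 0.6));

        for (Hit h : combine(Arrays.asList(people, orgs), 2)) {
            System.out.println(h.entityType + " " + h.id + " " + h.score);
        }
    }
}
```

One caveat with this approach: relevance scores computed by separate per-type queries are not necessarily comparable with each other, which may be an argument for a composite or denormalized index instead.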
I think I got some of your answers (originally posted by @kfogel)
(@kfogel adds: Thanks to Katherine Stewart of the Louisiana Department of Health and Darryl Hellams of Virginia Medicaid for taking the time to answer.)
Do you save common searches for push button access? What are those searches?
LA: No, but we don't currently have that ability in our system. A common search I'd imagine using that would be beneficial to have as a "pushbutton" is a list of all enrollment apps in the queue, waiting to be processed.
Are pattern-match searches often used? Is being able to put date-range restrictions important?
LA: The ability to search by partial provider names would be beneficial. Yes, date range restrictions are important for us.
Does anyone do partial (pattern-matched) NPI searches in particular?
VA: I do not believe we search by partial NPI.
LA: No, we only find a need to do searches based on partial provider name.
Does your current system retain search logs, and if so are they available to admin users? Do you ever consult them?
LA: No, not in our current system.
I think it makes sense for us to say, at least for a first pass, that full text search is for providers and service admins to search enrollments.
High-level things that I suggest we not include in search (at least for now):
I expect that last one, the contents of uploaded licenses, to be something that we do want to support someday; for the moment, however, it would mean parsing and indexing arbitrary files. If the provider uploaded a picture or scan of their license, do we need to do text recognition on that image? If they uploaded a PDF, can we usefully extract text from it? If the PDF is effectively an image, with no textual data, do we then need to do text recognition? This is a large enough problem that I suggest we defer it until we get the structured data we already have in a searchable format.
What specific @Entity classes do we want to be searchable?
I reviewed the current list of Hibernate entities, and I think these are the ones we care about:
Address
BeneficialOwner
ContactInformation
DesignatedContact
Document (filename only)
Enrollment
Entity
LeieAutomaticScreeningMatch
License (user-provided license number)
Organization
OrganizationBeneficialOwner
Person
PersonBeneficialOwner
ProviderProfile
ProviderStatement
This may not be a complete list.
I suspect that we may need to do some of the data model improvements mentioned in #57 to effectively implement full text search; many of the relationships are application-side in a way that makes linking individual entities to the underlying concept difficult at best.
We need to add the ability to do full-text search in the PSM.
This comes from a few search-related requirements:
The first is relevant to this insofar as we need to be able to search for names; the other two seem to be asking the same thing in different words.
Investigate the available full-text search solutions, consider their requirements, choose one, and integrate it.