bcgov / entity

ServiceBC Registry Team working on Legal Entities
Apache License 2.0
23 stars 57 forks source link

Namex API - implement NR search API #16714

Closed severinbeauvais closed 8 months ago

severinbeauvais commented 11 months ago

Similar to Business Search API but for NRs -- support search by NR number or business name.

Need to return:

DO NOT return NR "auth keys" (email / phone) in search response. (This would allow anyone to do a search and then be able to access any NR.)

Based on discussions below and in RocketChat, it appears that the best way to implement this is to use SOLR.

Sample search results for a NR number (or business name):

image.png

severinbeauvais commented 11 months ago

@seeker25 what can you add to this conversation?

severinbeauvais commented 11 months ago

@davemck513 @steveburtch Is this a ticket for Entities Team or another team?

This is needed for the NR search on the new My Business Registry page.

severinbeauvais commented 11 months ago

@Mihai-QuickSilverDev Please identify the "60 day" rules (or whatever they are) for which NRs should be returned in the search results.

severinbeauvais commented 11 months ago

Should NRs affiliated to other accounts be returned in the search results? See also #16718

seeker25 commented 11 months ago

https://github.com/bcgov/namex/blob/c92d6484a63b5407aea5ea31bbeb8bb060d99f15/api/namex/resources/requests.py#L398 <-- I'd modify that route so it takes more than just identifiers? Requires System and returns phone number and email - could change this

severinbeauvais commented 11 months ago

returns phone number and email

We need to be careful what calls return this data, since it is the key to accessing/affiliating/using a NR.

For sure the UI should not compare the email/phone provided by the user with data from the API; this check should be done in the API somehow.

seeker25 commented 11 months ago

returns phone number and email

We need to be careful what calls return this data, since it is the key to accessing/affiliating/using a NR.

For sure the UI should not compare the email/phone provided by the user with data from the API; this check should be done in the API somehow.

Yup, which is why it's SYSTEM only right now and is called via AUTH-API (which checks the affiliations)

We should probably be caching the SYSTEM bearer token fetching as well

severinbeauvais commented 11 months ago

So far, we have the following unanswered questions:

  1. Make a fast database search?
  2. Or make a new SOLR query for NR num / name?
  3. Should NRs affiliated to other accounts be returned in the search results?
  4. How much effort is this?
seeker25 commented 11 months ago

@kialj876 could probably answer 1, 2, 4

seeker25 commented 11 months ago

After talking with Kial, probably a good idea to use business search for the business search UX (it uses SOLR already I believe).

EG. GET https://bcregistry-test.apigee.net/registry-search/api/v1/businesses/search/facets?query=value:Hello&start=0&rows=100

query=value:<user input> param and it will match on name/identifier/tax id.

For the names side, probably best to just be consistent? There is already a component of SOLR inside of namex.

From Kial:

"The component they built in MHR thats connected to business search looks the exact same as what was in this design except for the 'added' enhancements so I would just take that and tweak it"

severinbeauvais commented 11 months ago

Kial said:

jdyck-fw commented 11 months ago

Consider this ticket a time-box of an estimate. If it requires a greater deal of work than what feels natural for a 5 point ticket, we agreed that it could result in another ticket to complete the work.

kialj876 commented 11 months ago

For whoever takes this ticket you can reach out to me and I can spend a half hour showing you what's already there for the namex / solr connection and what you need to add in a bit more detail if you want

kialj876 commented 11 months ago

I'm not sure if we need to add any auth check or block 3rd party api users from using this endpoint as it is fully public I think? That's more of a business question

seeker25 commented 11 months ago

Kial should be on every team Hahaha

severinbeauvais commented 11 months ago

@davemck513 @Mihai-QuickSilverDev Should the "NR Search" API endpoint be fully public (no authentication/authorization), just like the Business Search endpoint (which nevertheless requires an API key) ?

seeker25 commented 11 months ago

@pwei1018 / @Oge01 (hopefully this is the right one) -^

Mihai-QuickSilverDev commented 11 months ago

@severinbeauvais What is your opinion, from a technical pov?

Mihai-QuickSilverDev commented 11 months ago

In the "My Registry Dashboard" this search would be accessible by account login. If we are going to use the same search mechanism in the open Names page, then it should be wide open. I guess a design question for @janisrogers

severinbeauvais commented 11 months ago

What is your opinion, from a technical pov?

In the "My Registry Dashboard" this search would be accessible by account login. If we are going to use the same search mechanism in the open Names page, then it should be wide open.

I'm wary of "open" endpoints, because anyone out there could access the data freely. Is that OK for business data? For NR data?

Mihai, you're right, the business search will be on the MBR page, which you only get to after login. But the Business Search API endpoint is apparently open already (need an API key but it's passed as clear text in network message), and we're talking about making the NR Search API similarly open.

David, your feelings on this?

kialj876 commented 11 months ago

I mean anyone can make an account and access these services. If anyone wanted to they could create an account / build a screen scraper to take all of the data at any given point (without needing any API key info etc.). Verifying the token doesn't provide any real security unless we are actually looking to prevent a group of users in our applications from accessing the data

severinbeauvais commented 11 months ago

^^ Yup, good points.

If we have data that we don't want an external party to have open access to then let's have that conversation, but it doesn't change THIS ticket for now.

severinbeauvais commented 10 months ago

@Mihai-QuickSilverDev @davemck513

I had a chat with Vysakh and we have some follow-on questions and possible issues...

  1. What NR statuses do we want to search for? New? Pending? Note that SOLR only has approved and rejected ones.

  2. Do we really need a partial NR number search? That is not available in SOLR at the moment. (Full NR Number match does exist.)

Should we look into updating SOLR? Should we do a two-level serach -- use SOLR for the first part of the search, and then "join in" the rest of the NR data from namex db? Should we implement this in namex db only (no SOLR)? Performance will likely be an issue, though we could return fewer results (eg, 10 instead of 20) to make things faster.

cc: @kialj876 @thorwolpert

Mihai-QuickSilverDev commented 10 months ago

Priority of NR searches:

  1. Draft
  2. Active (approved) - not consumed
  3. Expired within the last 60 days - not consumed
  4. Consumed
jdyck-fw commented 10 months ago

~@ozamani9gh @ozamani9 - Have a release that we can tag this ticket to?~

SB says: ^^ this is obsolete; we still need to implement this

jdyck-fw commented 9 months ago

~@PCC199 - Ready for SRE release~

SB says: ^^ this is obsolete; we still need to implement this

jdyck-fw commented 9 months ago

@davemck513 Will follow up on this.

jdyck-fw commented 9 months ago

@PCC199 @pwei1018 - This is ready for release, though there will be some extra SOLR work required for full functionality. (ticket to be linked here by @Mihai-QuickSilverDev )

There is currently no resource available to do this SOLR work.

Possibly as a next phase. Likely best to group them together in a release.

seeker25 commented 9 months ago

We had a discussion about SOLR with @kialj876 & @vysakh-menon-aot - Kial didn't think it was worth doing in SOLR because it was going to be tossed after Namex gets rebuilt.

@kialj876 maybe you could comment - even if we had a resource available to do the SOLR work - probably not worth it?

seeker25 commented 9 months ago

I also think this is possible to accomplish without using SOLR, at least in the short term.

I remember working on BC Assessment's front page search - we didn't use anything external like SOLR to do a like search over 2,160,828+ property addresses in BC. You can try it yourself: https://www.bcassessment.ca/

I looked at my old notes, we used this feature for Oracle to build a "catsearch" - https://docs.oracle.com/cd/A91034_01/DOC/text.901/a90121/csql3.htm with specialized indexes.

"INDEXTYPE IS CTXSYS.CTXCAT is Oracle's way of defining a "full text index" on a column -- as opposed to an ordinary B-tree."

I think Postgres probably has some sort of similar functionality. Postgres uses GIN indexes - GIN indexes are usually used to for PostgreSQL Full Text search (FTS). More info in here probably: https://www.postgresql.org/docs/current/textsearch.html

Rough working implementation here: https://dev.to/____marcell/fast-fulltext-search-with-postgres-gin-index-22n5

thorwolpert commented 9 months ago

Glad you liked the Oracle Search, some very excellent c-code there. Having been on that team I know it well. We use some advanced searching in PPR on a much larger set.

We use SOLR as that allows us to cross systems without having the source data locally and because John's team was/is using it.

Aside from that, we probably don't want to spend much time on any Names related work as it is going to be completely over-hauled.

severinbeauvais commented 9 months ago

So, we need the ability to do a NR lookup that returns all the states we care about, which the existing SOLR search doesn't do. What option do you recommend?

  1. extend existing SOLR
  2. use local SQL searching
  3. implement Oracle Search
seeker25 commented 9 months ago

I'd say go with 2 - if possible

It might be possible to create a few GIN indexes - and experiment with similar data size as prod - see if we can get the performance of the existing database search working sub second hopefully.

3 - We don't use Oracle, we're using postgres for the namex database 1 - Don't want to spend time on names work

When Names eventually gets overhauled - we can swap the database search with SOLR search.

thorwolpert commented 9 months ago

You're not doing any interesting text searching, and there's no way we'd stand up Oracle with all the work being done to move off of it. As a tactical take, what's wrong with just a SQL query in postgres? There's no type-ahead or anything??

When we migrate names to the new SOLR setup, we can change it. ^^ agree with Travis on point 1 there

jdyck-fw commented 9 months ago

"We aren't going to include expired or consumed at this point, it will be a post MVP thing." - Dave

Currently we have rejected and approved states from SOLR. From the SQL query, we have draft. NRs in other states cannot be searched and therefore not affiliated.

This ticket is indeed now ready for release for MVP.

seeker25 commented 9 months ago

Sounds good, tried it on an branch, it's fast already (probably don't need the GIN indexes unless you're including consumed):

https://bcregistry-account-dev--pr-2469-7dx7pg9r.web.app/account/3040/business

DEV (all states):

namex=# select count(*) from requests;
 count
--------
 439167
(1 row)

namex=# select count(*) from names;
 count
--------
 560177
(1 row)

DEV:

namex=# select count(*) from requests where state_cd in ('INPROGRESS','DRAFT', 'REFUND_REQUESTED');
 count
-------
 15992
(1 row)

PROD:

namex=# select count(*) from requests where state_cd in ('INPROGRESS','DRAFT', 'REFUND_REQUESTED');
 count 
-------
 11668
(1 row)

expired is fairly small too I believe

image