NASA-PDS / harvest

Standalone Harvest client application providing the functionality for capturing and indexing product metadata into the PDS Registry system (https://github.com/nasa-pds/registry).
https://nasa-pds.github.io/registry
Other
4 stars 3 forks source link

Update to utilize new multi-tenancy approach #118

Open jordanpadams opened 1 year ago

jordanpadams commented 1 year ago

💡 Description

sjoshi-jpl commented 6 months ago

Modify Harvest to:

  1. Get Cognito JWT tokens (for write access to OpenSearch)
  2. Get signed URLs (call API Gateway path for GET /signed-url)
  3. For write access to OpenSearch with Signed-URLs (directly calling OpenSearch with signed-url) for read-only access to OpenSearch (call API Gateway path for GET /registry)
al-niessner commented 6 months ago

@jordanpadams @sjoshi-jpl

Is there a cognito tutorial for us somewhere? Just the readme of some repo maybe?

al-niessner commented 5 months ago

@alexdunnjpl @jordanpadams @tloubrieu-jpl

Looking for opinions. Currently, harvest splits up server information. In the harvest config, provide registry URL, index, and auth file. In auth file provide trust self signed and username/password. Trusting self signed is a server problem like the URL not auth information. Also, harvest uses a sax parser to read the config file making it difficult to understand what is legal and illegal in the file.

There are several theories of approach here:

  1. do least change
  2. minimize change while maximize consistency
  3. do it the right way

least change

So, least change would mean shoving clientID and urlIsCognito into auth file. Already had some discontent at last breakout over that. Not a huge fan of it myself because now you have repeated information for Nick and Nora who just care about their username/password and all the rest is meaningless. The auth file would look like:

trust.self-signed = true
clientID = a123bajkhfdkjalkjalj212lsdlkjsgjkl
urlIsCognito = false
# TODO Warning: Use the default username and password only for testing purposes in local setup
user = nickandnora
password = asta

minimal change but consistent

Here, group the server information in harvest config and make auth just username/password:

user = nickandnora
password = asta

However this pushes three new items into the harvest config. Since it is a sax parser it will unduly complicate the code and provide no explanation of what is possible. In turn, all the examples would have to be expanded and expounded to cover the new options. It would looks something like:

<registry url="https://es:9200" 
               index="registry"
               clientID=akjdsajfdkjaklfoiwi"
               trust.self_signed="false"
               urlIsCognito="true"
               auth="auth.cfg" />

right way

First, write a schema for harvest config. It would have docs to explain the options. It would allow for xmllint to determine if the config is valid and tell the user where the error is. Since the user base is PDS, they should be familiar enough with XML and schema to use it effectively.

Second, use JAXB and get out of the XML parsing business. Is much more secure with respect to injection and it also validates the XML as reading.

Lastly, auth file would be username/password only. The line would look more like:

<registry url="https://es:9200" 
               index="registry"
               clientID=akjdsajfdkjaklfoiwi"
               trust.self_signed="false"
               style="direct" --> schema would limit to direct or cognito for now
               auth="auth.cfg" />

Obviously I prefer the last. When I had to write my own harvest from scratch I had to read java code and was really annoyed that there was no schema. It will take a little bit of time (days) to do schema and stuff but will pay dividends over time.

Thoughts?

alexdunnjpl commented 5 months ago

Firm default-vote for "right way".

jordanpadams commented 5 months ago

@al-niessner +1 to "right way". We have gotten complaints about not having a schema file in the past for this. They have some schema generators online you can use to get a first pass at what that could be as well.

tloubrieu-jpl commented 5 months ago

@al-niessner is looking at the xsd schema that requires update for the cognito configuration in the harvest configuration file.

tloubrieu-jpl commented 5 months ago

The schema is now updated and is backward compatible. @tloubrieu-jpl can review the new schema available in the draft PR.

al-niessner commented 5 months ago

146 was only start to solving this ticket.

jordanpadams commented 3 months ago

@al-niessner status: Got a response from Cognito team. Waiting for @sjoshi-jpl to make the happenings happen.

tloubrieu-jpl commented 2 weeks ago

Task for @tloubrieu-jpl : write an updated documentation for Gary.

Task plan for @gxtchen :

tloubrieu-jpl commented 2 weeks ago

@gxtchen as a note, after we met I also thought, we also need to think of how to automate the tests that you are going to write for harvest and registry-manager. That can be done in the docker compose set-up.