NOAA-GSL / vxDataProcessor

A parallel data processing project for MATS

Couchbase access for public MATS apps #18

Open randytpierce opened 1 year ago

randytpierce commented 1 year ago

The scorecard app needs to write to the Couchbase SCORECARD collection. Currently, our Couchbase cluster is deployed on our internal network. This leaves us wondering how we will make the scorecard app publicly available on https://gsl.noaa.gov/mats/.

It seems like we can either:

  1. Deploy the Couchbase cluster into the DMZ and let the scorecard app have limited write permissions. (preferred)
  2. Deploy a separate Couchbase cluster in AWS(?) and let the scorecard app have limited write permissions. (It's not clear how this differs from option 1, apart from AWS being involved.)
  3. Have some kind of firewall hole from the app in the DMZ to our internal Couchbase cluster. (This seems problematic from a security perspective)

We need more discussion with ITS on this.

### Tasks
- [ ] Review this plan with Shannon
- [ ] https://github.com/NOAA-GSL/vxDataProcessor/issues/107
- [ ] https://github.com/NOAA-GSL/vxDataProcessor/issues/105
- [ ] https://github.com/NOAA-GSL/vxDataProcessor/issues/106
- [ ] Make sure the esrl router configuration is correct for the public nginx gateway (for both VMs)
- [ ] Make sure the landing page is correct
- [ ] Monitor performance of the beta server MATS (with scorecard) and be prepared to use a different VM if necessary
- [ ] Create and verify acceptance test for scorecard (probably on beta server)

randytpierce commented 1 year ago

I spoke with Shannon today about this problem. Shannon said that if we made a read-only user for the reads and a special write-only user that could write ONLY to the SCORECARD collection and nothing else, then we could probably put the CB cluster and the vxDataProcessor into the DMZ. I can see how that would work.

mollybsmith-noaa commented 1 year ago

We would need to write to both the SCORECARD collection and the SCORECARD_SETTINGS collection, but then sure!


randytpierce commented 1 year ago

I believe I am understanding Shannon correctly. He is saying that we would need two accounts: a read-only one, which we already have, and a write-only one with permission for only the special collections. I think it makes sense, when we catch our breath, to go ahead and make that write-only account and use it in the scorecard app. We can call the special user "scorecard" or something. Then, when we are ready for the switch, we will need to have the three cluster machines moved into the DMZ. Unless someone wants to pay for an AWS cluster, which would also work.
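
The write-only account described above could be expressed as collection-scoped roles. The following is a minimal sketch, not the project's actual configuration. It assumes Couchbase Server 7.x role syntax (`role[bucket:scope:collection]`) and the user-management REST path `/settings/rbac/users/local/<username>`, both of which should be checked against the documentation for the deployed version. The bucket name `vxdata`, the scope `_default`, and the placeholder password are hypothetical; only the collection names and the suggested user name "scorecard" come from this thread.

```python
from __future__ import annotations

from urllib.parse import urlencode

# Hypothetical names: the thread does not state the bucket or the scope.
BUCKET = "vxdata"
SCOPE = "_default"

# The two collections the scorecard app needs to write (from the thread).
WRITABLE_COLLECTIONS = ("SCORECARD", "SCORECARD_SETTINGS")


def collection_role(role: str, collection: str) -> str:
    """Format a collection-scoped role as role[bucket:scope:collection]."""
    return f"{role}[{BUCKET}:{SCOPE}:{collection}]"


def scorecard_writer_roles() -> list[str]:
    """Roles for the write-only account: data_writer on the scorecard collections only."""
    return [collection_role("data_writer", c) for c in WRITABLE_COLLECTIONS]


def rbac_request(username: str, password: str, roles: list[str]) -> tuple[str, str]:
    """Build the (path, form-encoded body) for a PUT that creates a local user."""
    path = f"/settings/rbac/users/local/{username}"
    body = urlencode({"password": password, "roles": ",".join(roles)})
    return path, body


if __name__ == "__main__":
    # Only builds the request; sending it to a cluster is left out on purpose.
    path, body = rbac_request("scorecard", "change-me", scorecard_writer_roles())
    print("PUT", path)
    print(body)
```

Keeping the role list in one function makes it easy to review exactly what the account can touch. If the app writes through queries rather than key-value operations, different roles would likely be needed; that is not covered here.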


randytpierce commented 1 year ago

The AWS cluster might even be better, actually, just expensive.

gopa-noaa commented 1 year ago

Just to be clear, we are saying the scorecard app, vxDataProcessor, and Couchbase cluster will all be in the DMZ, right? Please pardon my lack of knowledge of the production system; maybe this has already been addressed, but how would the cluster be populated with data from ingest?

randytpierce commented 1 year ago

Good question. The standalone CB server must always live on the internal network, and the ingest processing will always live on the internal network, because the data they need are only available there. The data is replicated from the internal standalone CB to the public cluster. Once we have a public cluster, all the apps, internal or not, need to access the public cluster. In that case an internal app might either have to write the scorecard to the internal CB and let it get replicated to the cluster, or we may need to provide a service (perhaps on the data processor) that can save a scorecard to the cluster on behalf of the internal app.
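
The second option, a service that saves a scorecard on behalf of an internal app, could enforce the same collection restriction in code. This is a rough sketch under stated assumptions: `save_scorecard`, `CollectionNotAllowed`, and the `Writer` callable are hypothetical names invented for illustration, and the real write to the public cluster is replaced by an injected function so the example runs without Couchbase.

```python
from __future__ import annotations

from typing import Callable

# Collections the service may write to (names taken from the thread).
ALLOWED_COLLECTIONS = frozenset({"SCORECARD", "SCORECARD_SETTINGS"})

# Hypothetical writer interface: (collection, document id, document) -> None.
# In a real service this would wrap the client for the public cluster.
Writer = Callable[[str, str, dict], None]


class CollectionNotAllowed(Exception):
    """Raised when a caller asks to write outside the scorecard collections."""


def save_scorecard(writer: Writer, collection: str, doc_id: str, doc: dict) -> None:
    """Write doc through writer, but only into an allowed collection."""
    if collection not in ALLOWED_COLLECTIONS:
        raise CollectionNotAllowed(collection)
    writer(collection, doc_id, doc)


if __name__ == "__main__":
    store: dict = {}

    def in_memory_writer(collection: str, doc_id: str, doc: dict) -> None:
        # Stand-in for the real cluster write.
        store[(collection, doc_id)] = doc

    save_scorecard(in_memory_writer, "SCORECARD", "example-id", {"status": "pending"})
    print(sorted(store))
```

The check duplicates what the write-only account already enforces on the database side, which is intentional: a rejected request fails early in the service instead of surfacing as a database permission error.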


ian-noaa commented 1 year ago

During the dev meeting yesterday there was some uncertainty about whether we would need to handle replicating scorecardDocuments from the database in the DMZ to the internal database (and vice versa). The scorecardDocuments are thus far the only part of the system that MATS needs write access for.

We could simply let the scorecard documents differ from the dev systems to the production system. This may be ideal as it would encourage production traffic to the production instance while keeping the internal systems more open for development.

The rest of the data would be replicated from the internal database cluster into production.
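
The split described here, where scorecard documents stay local to each system and everything else flows from the internal database to production, can be written down as a simple selection rule. This is only an illustrative sketch: the thread does not name the replication mechanism, the function name is made up, and `OBSERVATIONS` and `MODEL_DATA` are hypothetical collection names standing in for the ingest data.

```python
from __future__ import annotations

from typing import Iterable

# Collections whose documents are allowed to differ between the internal
# and production systems, so they are not replicated (names from the thread).
LOCAL_ONLY = frozenset({"SCORECARD", "SCORECARD_SETTINGS"})


def collections_to_replicate(
    all_collections: Iterable[str],
    local_only: frozenset = LOCAL_ONLY,
) -> list[str]:
    """Return the collections to replicate from internal to production, sorted."""
    return sorted(c for c in all_collections if c not in local_only)


if __name__ == "__main__":
    # Hypothetical inventory of collections on the internal database.
    internal = ["OBSERVATIONS", "MODEL_DATA", "SCORECARD", "SCORECARD_SETTINGS"]
    print(collections_to_replicate(internal))
```

Expressing the exclusion as a single set keeps the rule one-directional: nothing written by the public scorecard app would need to travel back to the internal network.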

Our discussion yesterday covered the other components of the application: ingest would remain internal and write to an internal Couchbase; we would need VMs for MATS to run on and a large-ish VM for the scorecard data processor; and we could either buy more physical hardware to deploy Couchbase on in the DMZ or try running Couchbase on top of VMs in the DMZ.

@bonnystrong wanted to track this issue.

randytpierce commented 1 year ago

created task list