Closed ividito closed 3 weeks ago
This PR stacks additional changes on top of the vector ingests, which we're trying to get working in #198. This PR is ready for review, but let's hold off on merging until that PR is merged to main.
Testing plan: re-deploy to SIT and test ingestion
This PR adds a new generic vector dag. Wondering if new updates are needed here to also support the same pattern for that dag?
I added the generic vector dag to the new pattern and deployed yesterday. It hasn't been tested yet though for either generic or EIS vector.
Oh that's great! Thanks!
@ranchodeluxe do we have a dev or test features DB that we can test an EIS ingest against for these new Airflow changes?
I literally just deleted it like an hour ago 😆 But if you need me to spin one back up I can do that
If it's not too much trouble could you spin it up? We need to test both the generic and EIS ingest with these changes
Yeah, will do it after this big demo meeting 👍
We put some more work into testing the vector ingest, to make sure this won't break firelines once it gets to staging. To summarize:
veda-discover in staging, we successfully ingested new data to that service
Some changes and notes to make this work in the modified VPC environment:
Vector subnets and SG need to be the same as the RDS hosting features, and the security group needs an inbound rule accepting traffic from itself (this is a bit different on staging, which has a ton of manual SG changes)
This is not necessary. The vector subnets need to reference the private subnets in the vector VPC (the shared base VPC in our case), but the SG should not be the RDS one; the MWAA variable should use the Terraform-created ECS SG. The SIT deployment wasn't working because the RDS SG didn't have an inbound rule for the ECS SG. That inbound rule is in IaC but may have been modified manually.
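For anyone debugging a similar deployment, here is a minimal sketch of how one might check whether the RDS security group still admits traffic from the ECS security group. It is only an illustration, not part of this PR: the function names and the example group IDs are hypothetical, and `ec2_client` is assumed to be a boto3 EC2 client. The only AWS call used is `describe_security_groups`.

```python
from typing import Any, Dict


def sg_allows_inbound_from(sg_description: Dict[str, Any], source_sg_id: str) -> bool:
    """Return True if any inbound rule of the described SG names source_sg_id as a source."""
    for permission in sg_description.get("IpPermissions", []):
        for pair in permission.get("UserIdGroupPairs", []):
            if pair.get("GroupId") == source_sg_id:
                return True
    return False


def rds_reachable_from_ecs(ec2_client: Any, rds_sg_id: str, ecs_sg_id: str) -> bool:
    """Look up the RDS SG and report whether it has an inbound rule for the ECS SG.

    ec2_client is assumed to be boto3.client("ec2"); the IDs are placeholders.
    """
    response = ec2_client.describe_security_groups(GroupIds=[rds_sg_id])
    return any(
        sg_allows_inbound_from(sg, ecs_sg_id) for sg in response["SecurityGroups"]
    )
```

This only checks that some inbound rule references the ECS SG; it does not verify the port or protocol, so a rule on the wrong port would still pass.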
I've run a few ingests in the SIT MWAA via the SIT dataset/publish endpoint and the results look good. After we get the automated vector ingest I think we are ready to talk about merging this into dev. It also might be time to pull in the upstream changes again :(.
Summary:
PR is deployed and tested on SIT.
Addresses #192 (and cleans out some tech debt).
Batches of discovered files are traceable through the full ingestion process, and failures can be isolated to individual batches rather than full ingestions.
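To make the batch-level traceability idea concrete, here is a rough sketch in plain Python. It is not the code in this PR: `Batch`, `make_batches`, `ingest_batches`, and the `run_id`-based ID scheme are all hypothetical names chosen for illustration. The point is only that each batch carries its own ID and a failure is recorded against that batch instead of aborting the whole ingestion.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Batch:
    batch_id: str      # hypothetical ID used to trace the batch through later steps
    files: List[str]   # discovered file keys belonging to this batch


def make_batches(files: List[str], batch_size: int, run_id: str) -> List[Batch]:
    """Split discovered files into fixed-size batches, each with a traceable ID."""
    return [
        Batch(batch_id=f"{run_id}-{start // batch_size:04d}",
              files=files[start:start + batch_size])
        for start in range(0, len(files), batch_size)
    ]


def ingest_batches(batches: List[Batch],
                   ingest_one: Callable[[Batch], None]) -> Dict[str, str]:
    """Ingest each batch independently and record a per-batch outcome."""
    results: Dict[str, str] = {}
    for batch in batches:
        try:
            ingest_one(batch)
            results[batch.batch_id] = "success"
        except Exception as exc:  # isolate the failure to this batch only
            results[batch.batch_id] = f"failed: {exc}"
    return results
```

With a result map keyed by batch ID, a failed batch can be retried on its own while the successful ones are left alone.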
Initial diagram showing (generally) the outline of the changes made:
New discover DAG visualization:
Changes