bcgov / entity

ServiceBC Registry Team working on Legal Entities
Apache License 2.0

Email Delivery - Missing PDF Document_Incorporation #24321

Closed ArwenQin closed 1 day ago

ArwenQin commented 2 weeks ago

When filing an incorporation, the Certificate of Incorporation is occasionally not attached, even though the email says it is. The bug is intermittent: in 10 tests, the certificate was missing twice.

Happened for a ltd:

image.png

Happened for a CCC:

image.png

https://test.business.bcregistry.gov.bc.ca/BC1147574/

AimeeGao commented 1 week ago

Investigation Notes:

Issue Description:

During testing, we found an intermittent issue where PDF attachments were missing from incorporation emails. The issue occurs in approximately 2 out of 10 incorporation filings.

Investigation Process:

After investigating the logs, we found that these missing attachments are caused by 503 errors from the Report API service.

image.png image.png

After reaching out to Andriy, we confirmed several important points:

  1. The Report API service in the dev/test environments was running with a 512 MiB memory allocation. The container did not have enough memory, which caused the 503 errors. A typical incorporation certificate PDF is around 644 KB.

    image.png
  2. The production environment has 8GB memory allocation, so this issue may not happen in the production environment.

  3. The REPORT API moved to GCP a few months ago, and the URL for dev is: https://report-api-dev-366678529892.northamerica-northeast1.run.app. While the old service in OpenShift is still running, the SRE team recommends that we transition to using the new GCP endpoint.

Troubleshooting Process:

After Andriy bumped up the memory from 512 MiB to 2 GB in both the dev and test environments, I ran about 30 test requests against the Report API endpoint. No 503 errors were encountered during this testing, so the memory increase appears to have resolved the issue.

image.png
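
A minimal sketch of how a repeated-request check like the one above could be scripted. This is not the script used in the investigation; `http_status` and `count_statuses` are hypothetical helper names, and it assumes a plain unauthenticated GET is enough. The real Report API most likely needs an auth token and a request body, so the `fetch` argument is injectable.

```python
import urllib.error
import urllib.request
from collections import Counter
from typing import Callable


def http_status(url: str, timeout: float = 30.0) -> int:
    """Return the HTTP status code for a GET request to url."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # urllib raises on 4xx/5xx responses; the status code is still what we want.
        return err.code


def count_statuses(url: str, attempts: int,
                   fetch: Callable[[str], int] = http_status) -> Counter:
    """Call fetch(url) `attempts` times and tally the returned status codes."""
    return Counter(fetch(url) for _ in range(attempts))
```

Calling `count_statuses(url, 30)` and reading `tally[503]` gives the number of 503 responses in a 30-request run. Passing a custom `fetch` lets the same tally be reused with an authenticated client.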

Next Steps:

  1. Configuration Updates: We need to confirm if we should update the 1password secrets in DEV and TEST to use the new GCP REPORT API URL.

  2. The intermittent missing attachments issue could be due to limited resources allocated to DEV and TEST. Should we solve it by increasing the memory or should we maintain the current configuration (512 MiB) in TEST/DEV to keep costs down? As more memory means more costs, maybe we should focus on the PROD env.

  3. Since PROD already has 8 GB, it might not be affected, but we can't be 100% sure. Perhaps we should monitor it and create a post-launch monitoring ticket to see if we encounter this issue in production.

@leodube-aot @vysakh-menon-aot @severinbeauvais What do you all think about these findings? We'd appreciate hearing your thoughts and suggestions on how we should address these issues.

eason-pan-bc commented 1 week ago

This is the same root cause as the intermittently missing PDFs for voluntary dissolution documents (rare, but it happened): #24318

And I also have 2 questions regarding the next steps:

  1. Do we wanna update the Secrets in DEV and/or TEST for REPORT API URL, to use GCP instead of OpenShift? (maybe a good idea?)
  2. Do we wanna increase the memory allocated to DEV/TEST (maybe not a good idea?), or just keep this in mind, and closely monitor issues after launch?

Same here, would like to know your thoughts 😃

vysakh-menon-aot commented 1 week ago

@AimeeGao Update to use the Report API in GCP. While updating, please make sure that there are no auth issues.

Update both 1Password and the OpenShift Secret (updating the secret from 1Password is disabled in OpenShift; ref: #23222).

severinbeauvais commented 1 week ago

Fantastic investigation, Aimee! 👍

Thanks for your response, Vysakh.

AimeeGao commented 1 week ago

Thanks, Vysakh and Sev! Good points. For the first question, do you think it would be helpful if we post a message in the channel to let everyone know that we're planning to switch to the GCP Report API? As for the second question, maybe I can check with Andriy since Patrick is on vacation. Does that sound good?

severinbeauvais commented 1 week ago

For the first question, I think specific people have to be notified directly. Do you know who owns Report API?

For the second question, yes, sure.

AimeeGao commented 1 week ago

> For the first question, I think specific people have to be notified directly. Do you know who owns Report API?
>
> For the second question, yes, sure.

I’m not entirely sure who the owner of the Report API is. However, I know that the code comes from this repo: https://github.com/bcgov/bcros-common/tree/main/report-api. Maybe we could check with someone familiar with this repository to confirm ownership?

AimeeGao commented 1 week ago

The last question is whether we should increase the resource allocation for the DEV/TEST environments. Andriy mentioned that DEV/TEST may not need many resources and that more memory costs more, so do we see a need to increase the memory? Or should we keep the current 512 MiB configuration for now? (This question is from Andriy.)

severinbeauvais commented 1 week ago

These questions need to be answered by the project owners and whoever pays for the services. I thought that was Patrick, but you could also escalate through your PO.

@seeker25 Thoughts?

AimeeGao commented 1 week ago

> For the first question, I think specific people have to be notified directly. Do you know who owns Report API?
>
> For the second question, yes, sure.

I got some feedback from Andriy regarding the OCP Report API. He mentioned that there is still traffic going through it, so shutting it down early would have an impact.

severinbeauvais commented 1 week ago

OK, let's leave this with Andriy for now. And please tag @pwei1018 as needed.

Also, let's update the keys so that Dev and Test use the GCP instance of Report API.

Everything else is above my pay grade, so cc: @davemck513 @OlgaPotiagalova

seeker25 commented 1 week ago

@severinbeauvais It shouldn't be an issue to raise the resources. I was talking to Patrick about this before he left. We pay way more for SQL and storage than for Cloud Run services, at least for auth.

severinbeauvais commented 1 week ago

Thanks, Travis.

But, soon, we'll want to use Report API in GCP. Is it stable? Should we just change the 1Pass keys? Or should we first try to see who would be affected by changing the keys and then ensure they're ready for the change?

seeker25 commented 1 week ago

I think it is fairly stable, we're using it for receipt generation. Probably not hard to switch back if necessary? Patrick is back off vacation on the 18th/19th? Could always just wait for then?

severinbeauvais commented 1 week ago

Thanks again, Travis. Yes, just a couple of keys to change + redeploy.

@AimeeGao , it sounds like it would be OK to change it for Dev and Test right now and then park this ticket (or create a duplicate) for changing Prod later.

AimeeGao commented 1 week ago

> Thanks again, Travis. Yes, just a couple of keys to change + redeploy.
>
> @AimeeGao , it sounds like it would be OK to change it for Dev and Test right now and then park this ticket (or create a duplicate) for changing Prod later.

Thanks, Sev and Travis. I also got feedback from Dave (@davemck513 ), which aligns well with your suggestions.

So, we'll proceed as follows:

  1. Change the Report API OCP Secret for the legal api in Dev and Test first.
  2. Change the 1pass. Sev, could you help me update the 1Password config, as we discussed this morning?
  3. If everything goes smoothly, we'll create a new ticket for the prod change.

severinbeauvais commented 1 week ago

> 2. Change the 1pass. Sev, could you help me update the 1Password config, as we discussed this morning?

It's changed for Dev only. Is there any way you can test this before I change Test?

image

PS - What's the new URL for Test?

AimeeGao commented 1 week ago

Thanks for the update. I've also updated the OCP Secret in Dev. I'm currently testing the changes by calling the API to verify if there are any issues with the Dev changes.

As for the Test URL, I'm still in the process of confirming it. I'll update you as soon as I have that information.

AimeeGao commented 1 week ago

QA Notes:

We’ve confirmed that the latest GCP configuration is in place for both Dev and Test environments, and no other changes were needed. After scaling up resources, I ran tests in both environments:

  1. 20 requests in TEST, 10 requests in DEV.
  2. Checked the logs and verified the email attachments in MailHog (DEV) and email (TEST).

Everything looks good: no 503 errors came up during testing, and the attachments were present.
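
A minimal sketch of how the attachment check could be automated, assuming the raw email can be exported from the mailbox as bytes (how to export it from MailHog or a real inbox is not covered here). `pdf_attachment_names` and `build_sample` are hypothetical helpers, and the sample message values are made up for illustration.

```python
from email import message_from_bytes, policy
from email.message import EmailMessage


def pdf_attachment_names(raw_message: bytes) -> list[str]:
    """Return the filenames of all PDF attachments in a raw email message."""
    message = message_from_bytes(raw_message, policy=policy.default)
    return [
        part.get_filename() or "(unnamed)"
        for part in message.iter_attachments()
        if part.get_content_type() == "application/pdf"
    ]


def build_sample(with_pdf: bool) -> bytes:
    """Build a throwaway message for demonstration; all values are made up."""
    message = EmailMessage()
    message["Subject"] = "Sample incorporation email"
    message["From"] = "sender@example.com"
    message["To"] = "recipient@example.com"
    message.set_content("Your certificate is attached.")
    if with_pdf:
        message.add_attachment(b"%PDF-1.4 sample", maintype="application",
                               subtype="pdf", filename="certificate.pdf")
    return message.as_bytes()
```

An empty list from `pdf_attachment_names` corresponds to the failure reported in this ticket: the email was sent, but no PDF was attached.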

severinbeauvais commented 5 days ago

Does anything need to be done for Prod right now?

Does anything need to be done for Prod later? (When?)

AimeeGao commented 4 days ago

Thanks, Sev, for pointing this out 👍

Our current Prod configuration is still using the old OCP setup. Here's what we’ll need to do:

  1. Update Legal API's Prod OCP Secret.
  2. Check if the 1Password configuration for Prod is up-to-date. If not, we’ll need to update it to use the new GCP setup.

It might be a good idea to make these changes after we've confirmed everything is working smoothly in Dev/Test, and then monitor the performance in Prod for a while.

I’ve created a ticket to track these updates. As for the timing, @vysakh-menon-aot, when you have a moment, could you help confirm the timing for when we should do the Prod changes? Thanks.

vishnup0422 commented 4 days ago

Test env:

Created a new BEN business with Filing ID: 395939 BEN: BC1152926

Ran 20 requests against GET certificate endpoint

image.png

Made 30 requests against Incorporation Application endpoint.

image.png

Made 30 requests to fetch Receipt.

image.png

Notice of Articles

image.png

Dev env:

Business: BC0883763 FilingId: 152549

Made 30 requests against Incorporation Application endpoint.

image.png

Certificate

image.png

Receipt:

image.png

Notice of Articles

image.png