bcgov / NRPTI

Natural Resources Public Transparency Initiative
Apache License 2.0

EMLI & EPD importers failing due to timeout #1246

Closed sggerard closed 3 months ago

sggerard commented 4 months ago

Describe the Bug: NRIS-EMLI and NRIS-EPD cron imports appear to be failing intermittently in production due to timeout errors: "General Error Error: timeout of 2000ms exceeded". The cause of the error is unknown; investigation is required.

Expected Behaviour: Importer completes successfully.

Actual Behaviour: Fails due to unknown error.

Implications: Records from EMLI and EPD are not being updated in NRPTI.

acatchpole commented 4 months ago

The server that hosted the NRIS-EPD service (the cocacola server) had been down since 25-05-24. As of 30-05-24, the server seems to be back up and the import is running, though a status of 'Failed' is still appearing and needs investigating. This could be related to the import taking a very long time: each individual record seems to take 7-8 seconds to process.
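To put the 7-8 seconds per record in perspective, here is a rough estimate of how long a strictly sequential import would run. This is only a sketch: `estimateImportHours` is a hypothetical helper, and the count of 1000 records is an illustrative assumption, not a figure taken from the importer logs.

```typescript
// Rough wall-clock estimate for an import that processes records one at a time.
function estimateImportHours(recordCount: number, secondsPerRecord: number): number {
  return (recordCount * secondsPerRecord) / 3600;
}

// 1000 is an illustrative record count (assumption), with the observed
// 7-8 seconds per record as the lower and upper bounds.
const lowHours = estimateImportHours(1000, 7);
const highHours = estimateImportHours(1000, 8);
console.log(`${lowHours.toFixed(1)} to ${highHours.toFixed(1)} hours`);
```

At that rate every thousand records adds roughly two hours, so a run covering several thousand records could plausibly outlast whatever limits apply to the cron job.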

acatchpole commented 4 months ago

For the EMLI import, during the period that the import was stopping with a timeout error ("General Error Error: timeout of 2000ms exceeded"), a manual query to the API returned the following message:

Image

As of 29-06-24, the error had changed to "General Error Error: read ECONNRESET". Given that the errors around this import seemed to be mirrored by the EPD import, it is likely that this service was also running on Cocacola.
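Both messages seen so far (the 2000ms timeout and the connection reset) look like transient network failures, so one option to consider is retrying them with a backoff instead of failing the whole run. The following is a minimal sketch, not the importer's actual code: `isTransientError`, `backoffDelayMs` and `withRetry` are hypothetical names, and the message patterns are assumed from the two errors quoted in this issue.

```typescript
// Treats the two error messages reported in this issue as transient.
// The patterns are assumptions based on those messages only.
function isTransientError(err: unknown): boolean {
  const message = err instanceof Error ? err.message : String(err);
  return /timeout of \d+ms exceeded/.test(message) || message.includes("ECONNRESET");
}

// Exponential backoff: baseMs, 2 * baseMs, 4 * baseMs, ... capped at maxMs.
function backoffDelayMs(attempt: number, baseMs: number = 500, maxMs: number = 8000): number {
  return Math.min(baseMs * Math.pow(2, attempt), maxMs);
}

function sleep(ms: number): Promise<void> {
  return new Promise<void>((resolve) => setTimeout(resolve, ms));
}

// Calls `request` up to maxAttempts times, retrying only transient failures.
// Any other error, or the last transient one, is rethrown to the caller.
async function withRetry<T>(
  request: () => Promise<T>,
  maxAttempts: number = 3,
  baseMs: number = 500
): Promise<T> {
  let lastError: unknown = new Error("withRetry: no attempts were made");
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    try {
      return await request();
    } catch (err) {
      lastError = err;
      if (!isTransientError(err) || attempt === maxAttempts - 1) {
        break;
      }
      await sleep(backoffDelayMs(attempt, baseMs));
    }
  }
  throw lastError;
}
```

A retry like this only smooths over short blips. It would not have helped while the server was fully down, and with records already taking 7-8 seconds each, a longer request timeout may matter more than the retry count.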

acatchpole commented 4 months ago

As of 30-06-24, both importers seem to be running, but both report a status of Failed. While the EPD importer still reports thousands of records imported, the EMLI importer reports only 2. Further investigation is needed to see why the Failed status is appearing.

acatchpole commented 4 months ago

Although some of the import functionality has returned, this page does not yet show the cocacola server issue as resolved. No further effort should go into resolving these issues until all issues with the NRISWS server have been resolved.

acatchpole commented 4 months ago

Updates to the outage page show that some of the affected apps have been restored, including NRIS Web Service (this was likely true yesterday, and I just missed it). As of now, the EPD importer is running and still processing records very slowly. I think it is appropriate to wait until all issues with the cocacola server are resolved before investigating further.

Keegnan commented 4 months ago

The integration needs to be looked into further before fixing. The move from the Cocacola server was the result of an unplanned failure, and the new infrastructure is causing issues for us. We need to look into its limitations and how to resolve the issue with this server.

Keegnan commented 4 months ago

Will need to reconfigure or rewrite this ticket.