bcgov / NRPTI

Natural Resources Public Transparency Initiative
Apache License 2.0

EMLI & EPD importers failing due to timeout #1246

Open: sggerard opened this issue 1 week ago

sggerard commented 1 week ago

Describe the Bug
NRIS-EMLI & NRIS-EPD cron imports appear to be failing intermittently in production due to timeout errors: "General Error Error: timeout of 2000ms exceeded". The cause of the error is unknown; investigation is required.

Expected Behaviour
The importer completes successfully.

Actual Behaviour
The importer fails; the cause of the error is unknown.

Implications
Records from EMLI and EPD are not being updated in NRPTI.
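For context on the error text: "timeout of 2000ms exceeded" matches the wording the axios HTTP client uses when a request exceeds its configured `timeout`. That suggests, but does not confirm, that the importer's client stops waiting after 2 seconds, rather than the upstream service returning an error. The TypeScript sketch below reproduces that behaviour with a hypothetical `withTimeout` helper and a simulated `slowCall`; neither is taken from the NRPTI code.

```typescript
// Hypothetical helper: rejects if `work` does not settle within `ms` milliseconds.
// The message mimics the "timeout of 2000ms exceeded" text seen in the logs;
// the real importer's HTTP client and settings are an assumption.
function withTimeout<T>(work: Promise<T>, ms: number): Promise<T> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<never>((_, reject) => {
    timer = setTimeout(() => reject(new Error(`timeout of ${ms}ms exceeded`)), ms);
  });
  // Whichever promise settles first wins; always clear the timer afterwards.
  return Promise.race([work, timeout]).finally(() => clearTimeout(timer));
}

// Simulates an upstream call that takes `delayMs` milliseconds to respond.
function slowCall(delayMs: number): Promise<string> {
  return new Promise((resolve) => setTimeout(() => resolve("ok"), delayMs));
}
```

With these definitions, `withTimeout(slowCall(3000), 2000)` rejects with `timeout of 2000ms exceeded` even though the simulated call would have succeeded after 3 seconds, so a slow but otherwise healthy upstream can surface as this error.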

acatchpole commented 1 week ago

The server that hosts the NRIS-EPD service (the cocacola server) had been down since 25-05-24. As of 30-05-24, the server appears to be back up and the import is running, though a status of 'Failed' still appears and needs investigating. This may be related to how long the import takes: each individual record appears to take 7-8 seconds to process.
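To put 7-8 seconds per record in perspective: if records are processed one at a time, run time grows linearly with the record count. The sketch below assumes sequential processing and uses a hypothetical batch of 1,000 records; the actual number of records per run is not stated in this thread.

```typescript
// Rough run-time estimate for a sequential import (assumption: one record at a time).
function estimateHours(recordCount: number, secondsPerRecord: number): number {
  return (recordCount * secondsPerRecord) / 3600;
}

// Hypothetical batch size; the observed rate was 7-8 seconds per record.
const low = estimateHours(1000, 7);
const high = estimateHours(1000, 8);
console.log(`${low.toFixed(1)}-${high.toFixed(1)} hours per 1,000 records`);
```

This prints `1.9-2.2 hours per 1,000 records`, so a run covering thousands of records could plausibly take many hours at the observed rate.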

acatchpole commented 1 week ago

For the EMLI import, while the import was stopping with a timeout error ("General Error Error: timeout of 2000ms exceeded"), a manual query to the API returned the following message:

[Screenshot of the API response]

As of 29-06-24, the error had changed to "General Error Error: read ECONNRESET". Because the errors from this import mirrored those from the EPD import, this service was likely also running on the cocacola server.
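Both reported errors are transport-level failures rather than problems with individual records: a timeout means the client stopped waiting for a response, and `read ECONNRESET` is the Node.js error raised when the remote end forcibly closes the connection. That pattern is consistent with, though not proof of, both services sharing the same unavailable host. The hypothetical `classifyImportError` helper below sketches how the two could be told apart in logs; the error object shape (`code`, `message`) is an assumption based on Node.js conventions, not taken from the NRPTI importers.

```typescript
// Hypothetical diagnostic helper for the two errors reported in this thread.
type ImportFailure = "client-timeout" | "connection-reset" | "other";

// Assumed error shape, following Node.js conventions.
interface ErrorLike {
  code?: string;
  message: string;
}

function classifyImportError(err: ErrorLike): ImportFailure {
  // axios-style timeout message: the client gave up waiting for a response.
  if (/timeout of \d+ms exceeded/.test(err.message)) return "client-timeout";
  // Node.js socket error: the peer (or something in between) reset the connection.
  if (err.code === "ECONNRESET" || err.message.includes("ECONNRESET")) return "connection-reset";
  return "other";
}

const samples: ErrorLike[] = [
  { message: "timeout of 2000ms exceeded" },
  { code: "ECONNRESET", message: "read ECONNRESET" },
];
for (const s of samples) console.log(s.message, "->", classifyImportError(s));
```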

acatchpole commented 1 week ago

As of 30-06-24, both importers appear to be running, but both report a status of Failed. While the EPD importer still reports thousands of records imported, the EMLI importer reports only 2. Further investigation is needed to determine why the Failed status is appearing.

acatchpole commented 1 week ago

Although some of the import functionality has returned, this page does not yet show the cocacola server issue as resolved. No further effort should go into resolving these issues until all issues with the NRISWS server have been resolved.

acatchpole commented 6 days ago

Updates to the outage page show that some of the affected apps have been restored, including NRIS Web Service (this was likely already true yesterday, and I just missed it). The EPD importer is currently running but is still processing records very slowly. I think it is still appropriate to wait for all issues with the cocacola server to be resolved before investigating further.