data-dot-all / dataall

A modern data marketplace that makes collaboration among diverse users (like business, analysts and engineers) easier, increasing efficiency and agility in data projects on AWS.
https://data-dot-all.github.io/dataall/
Apache License 2.0

ECS task fails and crashes when RDS queries return error #1300

Open TejasRGitHub opened 6 months ago

TejasRGitHub commented 6 months ago

Is your idea related to a problem? Please describe.

The ECS verifier task is coded to fetch all active shares, loop through them, and apply the share verifier to each one. Suppose a failure occurs, as shown in the screenshots below:

image image

In this case, the share verifier crashes and exits without completing verification for the remaining shares.

Describe the solution you'd like

Make the task more robust by adding top-level try/except blocks that handle exceptions raised by the RDS queries. Identify which exceptions, if any, should still crash and stop the share verifier; all other exceptions should be logged so that the share verifier service keeps running.
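To illustrate the idea, here is a rough sketch of a loop that separates fatal exceptions from ones that are only logged. It is not the actual data.all code: `verify_all_shares`, `verify_share` and `FATAL_EXCEPTIONS` are hypothetical names, and the choice of `MemoryError` and `ConnectionError` as fatal types is only an assumption for the example.

```python
import logging

logger = logging.getLogger(__name__)

# Hypothetical: exception types that should still stop the whole run.
# Which types really belong here is the open question raised above.
FATAL_EXCEPTIONS = (MemoryError, ConnectionError)


def verify_all_shares(shares, verify_share):
    """Apply verify_share to every share and return the shares that failed.

    `verify_share` is a placeholder for whatever callable verifies one share.
    """
    failed = []
    for share in shares:
        try:
            verify_share(share)
        except FATAL_EXCEPTIONS:
            # Nothing useful can be done for the remaining shares: log and stop.
            logger.exception('Fatal error while verifying share %s', share)
            raise
        except Exception:
            # Log with traceback, remember the failure, continue with the next share.
            logger.exception('Verification failed for share %s', share)
            failed.append(share)
    return failed
```

The fatal clause has to come before the generic `except Exception`, otherwise the generic handler would swallow the fatal types too.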


SofiaSazonova commented 6 months ago

Hi @TejasRGitHub! I believe this issue is the same as #1266. Thanks for the update! It's currently on our to-do list.

TejasRGitHub commented 6 months ago

Hi @SofiaSazonova, thanks for pointing to this issue. Although that issue is specifically about the share manager, I will add a comment and reference this issue. Please let me know, or feel free to close this issue in favour of https://github.com/data-dot-all/dataall/issues/1266

noah-paige commented 4 months ago

I think this could be a quick implementation that adds robustness to the share verifier ECS task by wrapping each item processed in the loop in a try/except block.
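A minimal sketch of what that per-item wrapping could look like, assuming the task wants to keep going but still report failures through its exit code. `run_verifier`, `share_uris` and `verify_one` are hypothetical names for illustration, not the existing task's API.

```python
import logging

logger = logging.getLogger(__name__)


def run_verifier(share_uris, verify_one):
    """Verify each share in isolation and return a process exit code.

    `share_uris` is a list of share identifiers and `verify_one` a callable
    that verifies a single share; both are placeholders.
    """
    failures = {}
    for uri in share_uris:
        try:
            verify_one(uri)
        except Exception as exc:
            # One bad share must not prevent the others from being verified.
            logger.exception('Share %s could not be verified', uri)
            failures[uri] = repr(exc)
    if failures:
        logger.error(
            '%d of %d shares failed verification: %s',
            len(failures), len(share_uris), sorted(failures),
        )
        return 1  # non-zero so the failed run is still visible
    return 0
```

The entry point could then end with `sys.exit(run_verifier(...))`, so a run with failed shares still finishes the loop but does not look like a clean success.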

I am going to nominate this issue as a candidate for v2.7, separate from #1266, which I think details additional proposed changes.