Closed: @amberrignell closed this 1 year ago
@amberrignell , here's a Tailwind component for this, where the employee rows are work orders: https://tailwindui.com/components/application-ui/lists/tables#component-390db22a862687fe88b313692989ffaf
Note that this requires some JS.
Useful naming content:
Possible approach:
- Use `inserted_at` to determine the oldest of each.
- Call `retry_attempt_run(attempt_run, user)`, passing it an attempt run determined by 3 above.

Why do we need the following tables?
For this assignment, should we change the invocation reason? (I'm still not sure what an invocation reason is; I've just inferred from the name.)
Hey @sigu , thanks for the questions!
To give a quick answer,
We'll see you in standup tomorrow so Stu can give you more details then, hope this helps in the meantime!
Hey @amberrignell @taylordowns2000 , should the "rerun from start" button get disabled / hidden when you don't have permission? Or should we allow users to click around and get the notification error message?
hey @midigofrank , who doesn't have permission? (Have you created a policy for this action? Sorry, I haven't seen anything related to permissions in the issue.)
Hey @taylordowns2000 , I noticed there's a check for `can_rerun_job` when rerunning a single job. I thought maybe we could use the same check for the bulk rerun:
```elixir
def handle_event(
      "rerun",
      %{"attempt_id" => attempt_id, "run_id" => run_id},
      socket
    ) do
  if socket.assigns.can_rerun_job do
    attempt_id
    |> AttemptService.get_for_rerun(run_id)
    |> WorkOrderService.retry_attempt_run(socket.assigns.current_user)

    {:noreply, socket}
  else
    {:noreply,
     socket
     |> put_flash(:error, "You are not authorized to perform this action.")}
  end
end
```
Cool. For now please allow the error flash to appear as we don't have a design that explains why a user isn't able to run it. We can consider adding this later.
Alright. Perfect
Hey @taylordowns2000 , how would you suggest we handle failures? Let's say a single attempt fails; should we abort all the other attempts?
hi @midigofrank , great q 😄
No, the scope of this feature ends at `Oban.insert`. In other words, this feature is to "queue up" a bunch of attempts (the same thing that clicking "rerun" really fast on a bunch of attempts does), but once they're on the Oban queue they're in a totally different domain.
So... sorry, to be extra clear: if there is an error before you successfully enqueue those attempts with `Oban.insert` (what might this be???) you could display a flash message explaining that something broke and asking the user to try again, but I don't want to do that, since this would certainly be an unexpected error and I'd rather see it bubble up on Sentry, then be able to debug the code. Does that make sense?
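To illustrate where that boundary sits, here's a minimal sketch. The `Pipeline` worker module and the `attempt_run` variable are assumptions for illustration, not confirmed API:

```elixir
# Sketch only: the feature's responsibility ends once Oban.insert/1 succeeds.
# `Pipeline` is assumed to be an Oban.Worker; `attempt_run` is assumed bound.
case Oban.insert(Pipeline.new(%{attempt_run_id: attempt_run.id})) do
  {:ok, _job} ->
    # Enqueued. Whether the attempt later succeeds or fails is Oban's domain.
    :ok

  {:error, changeset} ->
    # Unexpected enqueue failure: let it crash so it bubbles up to Sentry.
    raise "could not enqueue attempt: #{inspect(changeset)}"
end
```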
Hey @taylordowns2000 . That makes sense. Currently, I've wrapped all the actions (including the `Oban.insert`) for all the selected WorkOrders to be retried in a single transaction. Assuming we have Work Orders A, B, and C: should an error occur while still retrying A, then B and C won't get processed at all. Should that be the case?
Sounds like we might need a quick call to nail this down—let's chat tomorrow! In the meantime:
When you say "while still retrying" do you mean "while you're attempting to add them to Oban" or "while they are running"? Assuming the first, I think what you actually want is Oban's `Ecto.Multi` support (`Oban.insert_all/3`):
```elixir
reducer = fn {changesets, index}, multi ->
  Oban.insert_all(multi, "run_batch_#{index}", changesets)
end

runs
|> Enum.chunk_every(5_000)
|> Enum.with_index()
|> Enum.reduce(Ecto.Multi.new(), reducer)
|> Repo.transaction()

conn
|> put_status(:ok)
|> text(Enum.count(runs))
```
What we can chat about tomorrow is that if the user asks to enqueue 43,000 attempts, it would be nice to either have them all get enqueued or have none of them get enqueued. (We don't know and don't care if the new attempts succeed or fail as they are processed over the next several days... our job is DONE, and we report back to the end user "All good... 43,000 enqueued!" as soon as they hit the queue.)
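A sketch of that all-or-nothing behaviour: because every `Oban.insert_all/3` batch is added to a single `Ecto.Multi`, one `Repo.transaction/1` call commits or rolls back the whole enqueue. The `multi` and `count` variables and the flash wording are assumptions:

```elixir
# Sketch only: the transaction makes the enqueue atomic --
# either all attempts hit the queue, or none do.
case Repo.transaction(multi) do
  {:ok, _batches} ->
    put_flash(socket, :info, "All good... #{count} enqueued!")

  {:error, _name, _value, _changes_so_far} ->
    # The whole transaction rolled back, so nothing was enqueued.
    put_flash(socket, :error, "Nothing was enqueued. Please try again.")
end
```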
@midigofrank , next steps here please:
- main
- `Oban.insert_all`, so we're hitting the DB slightly less for a big bulk re-processing request

Hey @taylordowns2000 , might you have an idea why the Mimic library isn't getting loaded while testing? I keep getting the error below when I run `mix test`:

```
** (UndefinedFunctionError) function Mimic.copy/1 is undefined (module Mimic is not available)
```
I haven't seen that problem pop up for anyone yet. Please push your most recent changes to your branch (assuming you've rebased, also check that box above!) and I'll fire it up.
Tests run OK for me as of this commit:

```
commit 04cdafe004f5aec7b19e7973166235640493f3b5 (HEAD -> 659-bulk-rerun-work-orders, origin/659-bulk-rerun-work-orders)
Author: Frank Midigo <midigofrank@gmail.com>
Date:   Wed Jun 14 12:58:21 2023 +0300
```
sure thing. Just pushed
Eish, sorry @midigofrank , it looks OK on my machine and on CircleCI. I'm not sure I'll be able to help.
What OS are you using? Have you seen this issue on other Elixir projects? Maybe try `asdf install` again just to make sure? @sigu , any ideas on this one?
Hey @taylordowns2000 , I'm on an M1 Mac (macOS). Let me try it on a different OS.
Hrmmm I'm on an M1 also.
@taylordowns2000 , I've been able to run the tests by setting `MIX_ENV=test`. I thought `mix test` does that for us 🤔
Yeah it does. Maybe you've got MIX_ENV set to something else by default in your path?
Aha, you're right. I made a copy of the `.env.example` and `MIX_ENV` has been set to `dev`. Let me get rid of it.
Hey @taylordowns2000 , all tests are passing except for one I've added for the LiveView page. This only happens when I have the `Oban.insert(Pipeline Job)` code as part of the `Multi` which does everything else:

```
could not checkout the connection owned by #PID<0.1124.0>
```
EDIT: Full stack trace
```
** (EXIT from #PID<0.1124.0>) an exception was raised:
    ** (RuntimeError) ChildProcess task exited without a value:
{%DBConnection.ConnectionError{message: "could not checkout the connection owned by #PID<0.1124.0>. When using the sandbox, connections are shared, so this may imply another process is using a connection. Reason: connection not available and request was dropped from queue after 510ms. You can configure how long requests wait in the queue using :queue_target and :queue_interval. See DBConnection.start_link/2 for more information", severity: :error, reason: :queue_timeout}, [{Ecto.Adapters.SQL, :raise_sql_call_error, 1, [file: 'lib/ecto/adapters/sql.ex', line: 913, error_info: %{module: Exception}]}, {Ecto.Repo.Schema, :apply, 4, [file: 'lib/ecto/repo/schema.ex', line: 756]}, {Ecto.Repo.Schema, :"-do_update/4-fun-3-", 15, [file: 'lib/ecto/repo/schema.ex', line: 459]}, {Lightning.Invocation, :update_run, 2, [file: 'lib/lightning/invocation.ex', line: 272]}, {Lightning.Pipeline.Runner.Handler, :"-start/2-fun-1-", 3, [file: 'lib/lightning/runtime/handler.ex', line: 74]}, {Task.Supervised, :invoke_mfa, 2, [file: 'lib/task/supervised.ex', line: 89]}, {Task.Supervised, :reply, 4, [file: 'lib/task/supervised.ex', line: 34]}, {:proc_lib, :init_p_do_apply, 3, [file: 'proc_lib.erl', line: 240]}]}
    (lightning 0.6.3) lib/lightning/runtime/handler.ex:122: Lightning.Pipeline.Runner.Handler.wait/1
    (lightning 0.6.3) lib/lightning/pipeline.ex:29: Lightning.Pipeline.process/1
```
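For what it's worth, this sandbox error usually means a process spawned by the code under test (here, the `Task` started inside `Lightning.Pipeline.Runner.Handler`) tried to check out its own DB connection. A common fix is shared sandbox mode in the test setup. A sketch, with the repo name taken from the stack trace:

```elixir
setup do
  :ok = Ecto.Adapters.SQL.Sandbox.checkout(Lightning.Repo)
  # {:shared, self()} lets processes spawned during the test reuse the
  # test process's connection instead of checking out their own.
  Ecto.Adapters.SQL.Sandbox.mode(Lightning.Repo, {:shared, self()})
  :ok
end
```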
Hey @taylordowns2000 , I found it easier to just move the implementation to "bulk mode" and leave the `Oban.insert` operation out of the main transaction. All tests are now passing.
great! please convert this to a full PR (it's a "draft" right now) and request a review from me and @stuartc ! we'll take a look tomorrow AM. do you know if @sigu is joining our final call tomorrow AM?
also, in prep for that call, have a look at the failed checks related to code coverage. thanks @midigofrank and speak soon
Got it. Given his current timezone, I'm not so sure. I'll confirm with him and let you know
User story
As a user that had a workflow which was running successfully, who then realised the logic isn't what I wanted, I would like to be able to bulk reprocess workflows, so that each workorder can be rerun with my updated logic.
Details
- Allow users to select work orders from the history page
- Allow users to reprocess the work orders they have selected
If a user has selected some work orders, then changes the filters, their selection is lost.
Use the following modal component
A. Top-level select
Message: You've selected the number_of_workorders_on_page work orders from page page_number of results for work orders matching the following query: search_params. There are a total of number_of_workorders_matching_query work orders matching this query.
Example: You've selected the 12 work orders from page 1 of results for work orders matching the following query: received between x and y date, for workflow A, with a status of 'Failed'. There are a total of 32 work orders matching this query.
Buttons: Cancel, Rerun all work orders on this page from start, Rerun all work orders matching the query.
B. Individual work orders selected
Message: You've selected number_of_selected_work_orders work orders to rerun from the start (first job). This will create a new attempt for each of these within the same work order.
Example: You've selected 12 work orders to reprocess from the start (first job). This will create a new attempt for each of these within the same work order.
Buttons: Cancel, Rerun from start

Implementation notes
Please read through the platform-app implementation first.
- Write a function that can take a list of work order IDs and create a new attempt for each one, in the order in which they have been passed to the function.
- Write a function that can take a query, and create a new attempt for each work order returned by the query, in the order in which they are returned.
The `rerun_run_from_start` function's parameters would be:
- source: a list of IDs, or query parameters
- user: used to add to the invocation reason
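The notes above could be sketched roughly as follows. The module name, helper names, and pattern-matched clauses are all assumptions to show the shape of the flow, not the final API:

```elixir
defmodule Lightning.WorkOrderService do
  # Sketch only. Create a new attempt for each work order ID, in order.
  def rerun_run_from_start(workorder_ids, user) when is_list(workorder_ids) do
    workorder_ids
    |> Enum.map(&build_attempt(&1, user))
    |> enqueue_all()
  end

  # Same, but for every work order returned by a query, in query order.
  def rerun_run_from_start(%{} = search_params, user) do
    search_params
    |> workorder_ids_matching()
    |> rerun_run_from_start(user)
  end

  # build_attempt/2, enqueue_all/1, and workorder_ids_matching/1 are
  # hypothetical helpers, named here only to illustrate the flow.
end
```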
Release notes
User acceptance criteria