akash-akya / ex_cmd

ExCmd is an Elixir library to run external programs and to communicate with back pressure
MIT License
65 stars 3 forks source link

ExCmd (most probably) supressing errors and causes everything to hang #25

Closed Cinderella-Man closed 1 year ago

Cinderella-Man commented 1 year ago

Hello :wave:

I debugging an issue when my program hangs without raising/throwing any errors. I got it down to as small example as possible to ease reproducing.

So here is a simple function that uses Stream inside the try do .. rescue .. end clause:

  def start() do
      try do
        Stream.map(1..3, fn x -> x end)
        |> Stream.map(fn _x ->
          DateTime.from_unix!(12345)
        end)
        |> Stream.run()
      rescue
        err -> Logger.error(Exception.format(:error, err, __STACKTRACE__))
      end

      IO.inspect("finished")
  end

As expected, "finish" is printed to the console(I'm running this in a new project using mix run ./lib/buggy.ex)

Now when I tweak the value passed to the DateTime.from_unix!/1 to a string (it will cause an error):

  def start() do
      try do
        Stream.map(1..3, fn x -> x end)
        |> Stream.map(fn _x ->
          DateTime.from_unix!("12345")
        end)
        |> Stream.run()
      rescue
        err -> Logger.error(Exception.format(:error, err, __STACKTRACE__))
      end

      IO.inspect("finished")
  end

I'm getting the error caught(by the try) and logged to the console as well as "finished".

Now, I want to use the ExCmd in place of the hardcoded range:

  def start() do
    try do
      ExCmd.stream!(["echo", "1\n2\n3\n"])
      |> Stream.map(fn _x ->
        DateTime.from_unix!("1223")
      end)
      |> Stream.run()
    rescue
      err -> Logger.error(Exception.format(:error, err, __STACKTRACE__))
    end

    IO.inspect("finished")
  end

This is where things get weird - I'm not getting anything logged. The code hangs without catching the error, and it's not printing "finished". The only output I see is:

$ mix run ./lib/buggy.ex
Compiling 1 file (.ex)

12:23:47.680 [debug] Command started. os pid: 53759

Just to make sure - this will work fine if there's no error thrown/raised:

  def start() do
    try do
      ExCmd.stream!(["echo", "1\n2\n3\n"])
      |> Stream.map(fn _x ->
        DateTime.from_unix!(1223)
      end)
      |> Stream.run()
    rescue
      err -> Logger.error(Exception.format(:error, err, __STACKTRACE__))
    end

    IO.inspect("finished")
  end

"finished" is getting printed.


Can you reproduce this behaviour? Is this expected? Is my analysis correct?

I would really appreciate any feedback, as I've been scratching my head for a couple of days now :wink:

akash-akya commented 1 year ago

Hi @Cinderella-Man, thanks for raising the issue.

This might be a bug, I have to take a close look. But if you know the maximum time the program might take to exit, you can just set exit_timeout (ex: exit_timeout: 1_000 for 1s), default is infinity.

Cinderella-Man commented 1 year ago

Hello @akash-akya

Thanks for replying and for your suggestion - sadly, I'm dealing with long-running processes (5-10 minutes), which can die at any moment without any errors(because of the bug that I raised), so the exit_timeout probably won't do the trick

Please keep me in the loop :pray:

akash-akya commented 1 year ago

I see, understood, i'll push a fix soon. It would be helpful if you share the exact program you are running. Because the issue seems to be related to how the program flush its output.

akash-akya commented 1 year ago

I re read your recent comment. the exit_timeout is for how long we should wait for command to exit after we write all input or when we halt due to exception, not how long your overall pipeline going to take. So you can still use it

Cinderella-Man commented 1 year ago

I see, understood, i'll push a fix soon. It would be helpful if you share the exact program you are running. Because the issue seems to be related to how the program flush its output.

Here are the steps that we are using - aka how to reproduce:

$ mix new buggy

add ExCmd as a dependency.

Now update the /lib/buggy.ex to the following:

defmodule Buggy do
  require Logger

  def start() do
    try do
      parse_url("https://data.ris.ripe.net/rrc00/2023.05/bview.20230531.0000.gz")
    rescue
      err -> Logger.error(Exception.format(:error, err, __STACKTRACE__))
    end

    IO.inspect("finished")
  end

  def parse_url(url) do
    ExCmd.stream!(["bgpkit-parser", url])
    |> Stream.map(&decode_lines/1)
    |> Stream.map(&store_records/1)
    |> Stream.run()
  end

  def decode_lines(lines) do
    lines
    |> String.split("\n", trim: true)
    |> Enum.map(&decode_line/1)
  end

  def decode_line(line) do
    [
      type,
      msg_timestamp,
      peer_ip,
      peer_asn,
      prefix,
      as_path,
      origin,
      next_hop,
      local_pref,
      med,
      communities,
      atomic,
      aggr_asn,
      aggr_ip
    ] = String.split(line, "|")

    %{
      type: type,
      msg_timestamp: msg_timestamp,
      peer_ip: peer_ip,
      peer_asn: peer_asn,
      prefix: prefix,
      as_path: as_path,
      origin: origin,
      next_hop: next_hop,
      local_pref: local_pref,
      med: med,
      communities: communities,
      atomic: atomic,
      aggr_asn: aggr_asn,
      aggr_ip: aggr_ip
    }
  end

  def store_records(rows) do
    rows
    |> Enum.map(&IO.inspect/1)
  end
end

Buggy.start()

From the main project directory, this can be run using the following:

$ mix run ./lib/buggy.ex

There will be tons of output as I skipped the "storing records" step to avoid bringing the whole ecto here.

In a nutshell, we are doing the following:

Updating the line:

      msg_timestamp: msg_timestamp,

to:

      msg_timestamp: DateTime.from_unix!(msg_timestamp),

will cause the code to hang(as the msg_timestamp is a string - it will cause an error)

akash-akya commented 1 year ago

@Cinderella-Man, I made a push which should fix this issue, is it possible to test this branch?

  defp deps do
    [
      {:ex_cmd, github: "akash-akya/ex_cmd", branch: "dev"}
    ]
  end
Cinderella-Man commented 1 year ago

I tried to run the code from that branch (I appreciate the snippet to load the branch from repo as a dependency :wink: )

It appears to be working great - try-catch is now catching the error :sunglasses: Awesome work!

akash-akya commented 1 year ago

Cool, thanks for quick test! ๐Ÿ‘๐Ÿผ
I'll merge and merge and a cut a release in few days along with few other changes.

akash-akya commented 1 year ago

@Cinderella-Man released v0.10.0 with the fix and few other changes ๐ŸŽ‰