erlang / otp

Erlang/OTP
http://erlang.org
Apache License 2.0
11.33k stars 2.94k forks source link

`ftp:recv_bin/2` sometimes returns `ok` #8454

Closed Awlexus closed 1 month ago

Awlexus commented 5 months ago

Describe the bug As the title says, sometimes ftp:recv_bin/2 returns ok for me instead of {ok, Content}

To Reproduce This is how I connect to the ftp server and send the commands. I'm using elixir, because I'm not used to programming in erlang, sorry for the inconvenience.

  # In the module I use for managing FTP connections
  def connect() do
    config = Application.get_env(:my_app, __MODULE__)
    host = config |> Keyword.get(:host) |> to_charlist()
    user = config |> Keyword.get(:user) |> to_charlist()
    password = config |> Keyword.get(:password) |> to_charlist()
    port = config |> Keyword.get(:port)

    with {:ok, pid} <- :ftp.open(host, port: port),
         :ok <- :ftp.user(pid, user, password) do
      {:ok, pid}
    end
  end
iex(1)> {:ok, conn} = Ftp.connect()
{:ok, #PID<0.532.0>}
iex(2)> :ftp.recv_bin conn, 'webcams/sorted/kro/cam5/latest.webp'
{:ok, <<...>>}
iex(3)> :ftp.recv_bin conn, 'webcams/sorted/kro/cam5/latest.webp'
:ok

Expected behavior ftp:recv_bin/2 should always return a status tuple like {ok, Content} or {:error, Reason} as documented

Affected versions Erlang/OTP 26.2.3 with Elixir 1.16.2

Prompt from starting IEx

Erlang/OTP 26 [erts-14.2.3] [source] [64-bit] [smp:6:6] [ds:6:6:10] [async-threads:1]
Interactive Elixir (1.16.2) - press Ctrl+C to exit (type h() ENTER for help)
IngelaAndin commented 5 months ago

Looks like a bug, we will have a look after OTP-27 release. As it is a bug we can of course schedule a fix to eventually also be released in 26 and maybe 25 piggybacked on some other fixes.

IngelaAndin commented 4 months ago

@Awlexus Can you reproduce the problem? I think I have a possible solution, but it is only by dry running the code in my head to try reproduce the symptoms you reported. So of you could check if it solves your problem that would be great.

IngelaAndin commented 3 months ago

@Awlexus ping

Awlexus commented 3 months ago

@IngelaAndin, Sorry for the delayed response. I'm unfortunatelly still able to reproduce this issue with OTP 27 and Elixir 1.17.1

Erlang/OTP 27 [erts-15.0] [source] [64-bit] [smp:12:12] [ds:12:12:10] [async-threads:1] [jit:ns]

Interactive Elixir (1.17.1) - press Ctrl+C to exit (type h() ENTER for help)
iex(1)> # Pasting a small test module
iex(2)> Application.ensure_all_started(:ftp)
{:ok, [:ftp]}
iex(3)> {:ok, conn} = FTP.connect()
{:ok, #PID<0.118.0>}
iex(4)> :ftp.recv_bin conn, 'webcams/sorted/kro/cam5/latest.webp'
{:ok, <<...>>}
iex(5)> :ftp.recv_bin conn, 'webcams/sorted/kro/cam5/latest.webp'
:ok

Update: My bad, I didn't notice that the PR is still open. I'll try to compile your PR

Awlexus commented 3 months ago

This time I compiled it from scratch and tried to ran it in the Eshell

$ ~/p/g/otp (ingela/ftp/GH-8454/OTP-19119)> bin/erl    
Erlang/OTP 27 [erts-15.0] [source] [64-bit] [smp:12:12] [ds:12:12:10] [async-threads:1] [jit:ns]

Eshell V15.0 (press Ctrl+G to abort, type help(). for help)
1> Host = "redacted".
2> User = "redacted".
3> Password = "redacted".
4> Port = redacted.
5> {ok, _} = application:ensure_all_started(ftp).
6> {ok, Pid} = ftp:open(Host, [{port, Port}]).
7> ok = ftp:user(Pid, User, Password).
8> ftp:recv_bin(Pid, "webcams/sorted/kro/cam5/latest.webp").
{ok,<<82,73,70,70,128,145,2,0,87,69,66,80,86,80,56,32,
      116,145,2,0,240,92,14,157,1,42,152,...>>}
9> ftp:recv_bin(Pid, "webcams/sorted/kro/cam5/latest.webp").
=INFO REPORT==== 20-Jun-2024::00:43:14.866124 ===
ftp : <0.95.0> : Unexpected message: {tcp_closed,#Port<0.7>}
State: {state,{tcp,#Port<0.5>},
              undefined,undefined,false,"/hdd/programming/github/otp",
              ftp_server_default,false,passive,60000,<<>>,
              {<<>>,[],start},
              "226 Operation successful\r\n",<0.85.0>,
              {<0.85.0>,#Ref<0.2383637922.1713373186.155942>},
              {recv_bin,<<large binary, likely same as in line 8>>},
              inet,[],[],[],ignore,infinity,false,false,false}
IngelaAndin commented 3 months ago

@Awlexus thanks for the input, so we got a new behavior. I will try to analyze it further. I do wish we had gen_statem when implementing ftp client it would have made state handling easier. Probably this close message is ok and we just need postpone the handling of it a little.

IngelaAndin commented 2 months ago

@Awlexus I added a new commit to my PR can you try it out?

Awlexus commented 2 months ago

@IngelaAndin Sorry for the late reply and thank you for your work. I ran it a script that downloads a file a couple of times and it never returned just ok :confetti_ball:

I am unsure whether this has to do with the connection to the FTP server or not, but it's worth mentioning that the script did not always finish, sometimes it would get stuck indefinitely while downloading the file. I will test this again from another network to be sure

IngelaAndin commented 2 months ago

Ok, keep me posted. I am on vacation for a couple of weeks now. In worst case there is some new race, it is a little tricky to keep track of all the messages one the two tcp-connections that ftp protocol sets up that have dependancies on each other but can arrive in arbitrary order. Especially without a good state machine behavior. But gen_statem was not invented when the ftp client was written.

IngelaAndin commented 2 months ago

@Awlexus back at work, have you had time to try? If it is not working I push a potential new commit that I think theoretical could affect the intricate state machine handling, that you could try out.

Awlexus commented 2 months ago

Hi, sorry for the late reply. I tried it with 2 network environments, but it would still freeze the Eshell after ~2-4 tries

IngelaAndin commented 2 months ago

@Awlexus ok please make sure to get sha beb2d37329ad647265411b8b589062c51c6926ca, I managed to push a bad commit first.

Awlexus commented 2 months ago

Hi, I got the commit, but it didn't seem to resolve the issue. I ran make clean and built it again to be sure.

IngelaAndin commented 2 months ago

@Awlexus maybe my "informed guess luck" is running out. Could you produce an erlang crash-dump when it hangs? erlang:halt("ftp_freeze").

Awlexus commented 2 months ago

I created a crash dump. I would prefer sending it to you privately, because it contains some credentials. Is the email in your commit alright?

IngelaAndin commented 1 month ago

Yes, it is.

IngelaAndin commented 1 month ago

@Awlexus ok looking at the crashdump I got a new suspicion on what might be going wrong. I have updated the PR please try again.

Awlexus commented 1 month ago

I believe this fixed it. I tried 30+ times without encountering the freeze