Closed Awlexus closed 3 months ago
Looks like a bug, we will have a look after OTP-27 release. As it is a bug we can of course schedule a fix to eventually also be released in 26 and maybe 25 piggybacked on some other fixes.
@Awlexus Can you reproduce the problem? I think I have a possible solution, but it is only by dry running the code in my head to try reproduce the symptoms you reported. So of you could check if it solves your problem that would be great.
@Awlexus ping
@IngelaAndin, Sorry for the delayed response. I'm unfortunatelly still able to reproduce this issue with OTP 27 and Elixir 1.17.1
Erlang/OTP 27 [erts-15.0] [source] [64-bit] [smp:12:12] [ds:12:12:10] [async-threads:1] [jit:ns]
Interactive Elixir (1.17.1) - press Ctrl+C to exit (type h() ENTER for help)
iex(1)> # Pasting a small test module
iex(2)> Application.ensure_all_started(:ftp)
{:ok, [:ftp]}
iex(3)> {:ok, conn} = FTP.connect()
{:ok, #PID<0.118.0>}
iex(4)> :ftp.recv_bin conn, 'webcams/sorted/kro/cam5/latest.webp'
{:ok, <<...>>}
iex(5)> :ftp.recv_bin conn, 'webcams/sorted/kro/cam5/latest.webp'
:ok
Update: My bad, I didn't notice that the PR is still open. I'll try to compile your PR
This time I compiled it from scratch and tried to ran it in the Eshell
$ ~/p/g/otp (ingela/ftp/GH-8454/OTP-19119)> bin/erl
Erlang/OTP 27 [erts-15.0] [source] [64-bit] [smp:12:12] [ds:12:12:10] [async-threads:1] [jit:ns]
Eshell V15.0 (press Ctrl+G to abort, type help(). for help)
1> Host = "redacted".
2> User = "redacted".
3> Password = "redacted".
4> Port = redacted.
5> {ok, _} = application:ensure_all_started(ftp).
6> {ok, Pid} = ftp:open(Host, [{port, Port}]).
7> ok = ftp:user(Pid, User, Password).
8> ftp:recv_bin(Pid, "webcams/sorted/kro/cam5/latest.webp").
{ok,<<82,73,70,70,128,145,2,0,87,69,66,80,86,80,56,32,
116,145,2,0,240,92,14,157,1,42,152,...>>}
9> ftp:recv_bin(Pid, "webcams/sorted/kro/cam5/latest.webp").
=INFO REPORT==== 20-Jun-2024::00:43:14.866124 ===
ftp : <0.95.0> : Unexpected message: {tcp_closed,#Port<0.7>}
State: {state,{tcp,#Port<0.5>},
undefined,undefined,false,"/hdd/programming/github/otp",
ftp_server_default,false,passive,60000,<<>>,
{<<>>,[],start},
"226 Operation successful\r\n",<0.85.0>,
{<0.85.0>,#Ref<0.2383637922.1713373186.155942>},
{recv_bin,<<large binary, likely same as in line 8>>},
inet,[],[],[],ignore,infinity,false,false,false}
@Awlexus thanks for the input, so we got a new behavior. I will try to analyze it further. I do wish we had gen_statem when implementing ftp client it would have made state handling easier. Probably this close message is ok and we just need postpone the handling of it a little.
@Awlexus I added a new commit to my PR can you try it out?
@IngelaAndin Sorry for the late reply and thank you for your work. I ran it a script that downloads a file a couple of times and it never returned just ok
:confetti_ball:
I am unsure whether this has to do with the connection to the FTP server or not, but it's worth mentioning that the script did not always finish, sometimes it would get stuck indefinitely while downloading the file. I will test this again from another network to be sure
Ok, keep me posted. I am on vacation for a couple of weeks now. In worst case there is some new race, it is a little tricky to keep track of all the messages one the two tcp-connections that ftp protocol sets up that have dependancies on each other but can arrive in arbitrary order. Especially without a good state machine behavior. But gen_statem was not invented when the ftp client was written.
@Awlexus back at work, have you had time to try? If it is not working I push a potential new commit that I think theoretical could affect the intricate state machine handling, that you could try out.
Hi, sorry for the late reply. I tried it with 2 network environments, but it would still freeze the Eshell after ~2-4 tries
@Awlexus ok please make sure to get sha beb2d37329ad647265411b8b589062c51c6926ca, I managed to push a bad commit first.
Hi, I got the commit, but it didn't seem to resolve the issue. I ran make clean
and built it again to be sure.
@Awlexus maybe my "informed guess luck" is running out. Could you produce an erlang crash-dump when it hangs? erlang:halt("ftp_freeze").
I created a crash dump. I would prefer sending it to you privately, because it contains some credentials. Is the email in your commit alright?
Yes, it is.
@Awlexus ok looking at the crashdump
I got a new suspicion on what might be going wrong. I have updated the PR please try again.
I believe this fixed it. I tried 30+ times without encountering the freeze
Describe the bug As the title says, sometimes
ftp:recv_bin/2
returnsok
for me instead of{ok, Content}
To Reproduce This is how I connect to the ftp server and send the commands. I'm using elixir, because I'm not used to programming in erlang, sorry for the inconvenience.
Expected behavior
ftp:recv_bin/2
should always return a status tuple like{ok, Content}
or{:error, Reason}
as documentedAffected versions Erlang/OTP 26.2.3 with Elixir 1.16.2
Prompt from starting IEx