benoitc / erlang-idna

Erlang IDNA lib
MIT License
43 stars 29 forks source link

random :bad_label error with ulabel error undef #26

Closed dalhorinek closed 5 years ago

dalhorinek commented 5 years ago

I'm using hackney library (wrapped in HTTPoison) for fetching pages in my Elixir project.

Sometimes after some code change this error happens, this interesting part is when I stop it and/or run it again, it works. Or mix deps.compile helps

** (stop) {:bad_label, {:alabel, 'The label "kolkovna"  is not a valid A-label: ulabel error=undef'}}
    (idna) ../erlang-idna/src/idna.erl:278: :idna.alabel/1
    (idna) ../erlang-idna/src/idna.erl:145: :idna.encode_1/2
    (hackney) ../hackney/src/hackney_url.erl:100: :hackney_url.normalize/2
    (hackney) ../hackney/src/hackney.erl:306: :hackney.request/5
    lib/httpoison/base.ex:746: HTTPoison.Base.request/6
    (crawler) lib/crawler/fetcher.ex:40: Crawler.Fetcher.fetch/2
    (crawler) lib/crawler/fetcher.ex:24: anonymous fn/4 in Crawler.Fetcher.handle_events/3
    (elixir) lib/enum.ex:1925: Enum."-reduce/3-lists^foldl/2-0-"/3
Last message: {:"$gen_consumer", {#PID<0.504.0>, #Reference<0.1499611603.472645636.228450>}, ["http://kolkovna.cz"]}
State: {#PID<0.503.0>, []}

I didn't know how to debug it, so I cloned the erlang-idna repository and used it as local depenency and added some io prints.

(in idna.erl:271)

alabel(Label0) ->
  io:put_chars(standard_error, "alabel(" ++ Label0 ++ ")\n"),
  Label = case lists:all(fun(C) -> idna_ucs:is_ascii(C) end, Label0) of
  ...

(in idna.erl:299)

ulabel(Label0) ->
  io:put_chars(standard_error, "ulabel(" ++ Label0 ++ ")\n"),
  Label = case lists:all(fun(C) -> idna_ucs:is_ascii(C) end, Label0) of

(in hackney_url.erl:92)

                       Host1 = unicode:characters_to_list(
                                 urldecode(unicode:characters_to_binary(Host0))
                                ),
                       io:put_chars(standard_error, "idna:to_ascii(" ++ Host1 ++ ")\n"),

And this is the output when the error occurred.

idna:to_ascii(kolkovna.cz)
alabel(kolkovna)
ulabel(kolkovna)

The problem is caused by the ulabel function and it throws the error which is undef, I don't know why. I'd like to debug it more, but I'm not sure what to look for in the ulabel function The original request was http://kolkovna.cz

benoitc commented 5 years ago

hrm can you give me the full link you're trying?

I tried the following encoding:

1> idna:to_ascii("kolkovna.cz").
"kolkovna.cz"
2> 

without any error :/

dalhorinek commented 5 years ago

It usually works, only sometimes this error is thrown ... Mainly after some code change and recompilation. I added some more logging messages and it happens somewhere in the check_label.

I tried to create some minimal project to reproduce it and it happened again, however I'm not able to reproduce it, because when I run a script again after this error, it disappears.

Last error was like this

idna:to_ascii(www.smartwalls.cz)
alabel(www)ulabel(www)
lowercase(Label0 = www))
check_label(www)
** (exit) {:bad_label, {:alabel, 'The label "www"  is not a valid A-label: ulabel error=undef'}}
    (idna) .../erlang-idna/src/idna.erl:278: :idna.alabel/1
    (idna) .../erlang-idna/src/idna.erl:145: :idna.encode_1/2
    (hackney) .../hackney/src/hackney_url.erl:100: :hackney_url.normalize/2
    (hackney) .../hackney/src/hackney.erl:306: :hackney.request/5
    (httpoison) lib/httpoison/base.ex:746: HTTPoison.Base.request/6
    (title_fetch) lib/title_fetch.ex:9: TitleFetch.fetch/1

After that I added some more debug prints and waiting for the error to occur again ... I think that possible issue can be in the check_nfs

                  catch
                    _:Error ->
                      ErrorMsg = error_msg("The label ~p  is not a valid A-label: ulabel error=~p", [Label0, Error]),
                      erlang:exit({bad_label, {alabel, ErrorMsg}})
                  end,

It seems that there is some throw error. I found that exit reason undef is this

undef | The function cannot be found when evaluating a function call.

It happens sometimes after code update that sometimes hackney module is not available, also sometimes it throws that hackney.process function is not defined, so there is something going wrong with compilation. So this could be probably some elixir error.

benoitc commented 5 years ago

this is odd. if you can can you provide me a compilation log?

dalhorinek commented 5 years ago

Hello, I've tried to search where or how I can get it, but I haven't found how to force mix to provide any debug compilation messages, at least I tried to remove the try catch block and I got this error

mix run ./scripts/run.exs http://kolkovna.cz
===> Compiling idna
===> Compiling hackney
Compiling 1 file (.ex)
warning: variable "links" is unused
  lib/title_fetch.ex:43

warning: variable "response" is unused
  lib/title_fetch.ex:36

http://kolkovna.cz
idna:to_ascii(kolkovna.cz)
alabel(kolkovna)ulabel(kolkovna)
lowercase(Label0 = kolkovna))
check_label(kolkovna)
check_nfc(kolkovna)
check_hyphenkolkovna)
check_initial_combiner(kolkovna)** (UndefinedFunctionError) function :idna_data.lookup/1 is undefined (module :idna_data is not available)
    (idna) :idna_data.lookup(107)
    (idna) .../idna.erl:184: :idna.check_initial_combiner/1
    (idna) .../idna.erl:256: :idna.check_label/4
    (idna) .../idna.erl:315: :idna.ulabel/1
    (idna) .../idna.erl:279: :idna.alabel/1
    (idna) ../idna.erl:145: :idna.encode_1/2
    (hackney) /Users/dal/Work/Analyticoo/hackney/src/hackney_url.erl:100: :hackney_url.normalize/2
    (hackney) ../hackney/src/hackney.erl:306: :hackney.request/5

I'm not sure if it's related, but seems similar to https://github.com/benoitc/hackney/issues/528 As that happens sometimes as well and randomly (at least to me).

However I'm able to reproduce this erorr now always on my machine. When I delete some part of a text in IO.puts, this happens. I have created small project to test, maybe I can share it ... But this seems to be mix issue, because when I run the script again (with --no-compile), then the problem doesn't occur

benoitc commented 5 years ago

@dalhorinek i would be interrested by this minimal project. One thing I also would like to test if it's only happening with certifi, hackney or aldo when only using idna

dalhorinek commented 5 years ago

Here is small sample I created where I'm able to reproduce it...

https://github.com/dalhorinek/title_fetch

However I'm able to reproduce it always only when idna and hackney are used locally in deps.

  defp deps do
    [
      {:idna, "~> 6.0.0"},
      {:hackney, "~> 1.14.3"},
      {:httpoison, "~> 1.4"},
      {:floki, "~> 0.20.0"}
    ]
  end

When I have it like this, it happens only sometimes ... Usually after some heavy code change and multiple recompiles.

mtarnovan commented 5 years ago

Maybe related:

This used to work with 5.1:

iex(6)> :idna.to_ascii("foo@bar.com")  
** (exit) {:bad_label, {:alabel, 'The label "foo@bar"  is not a valid A-label: ulabel error={bad_label,\n                                                           {context,\n                                                            "Codepoint 64 not allowed (\'DISALLOWED\') at posion 3 in \\"foo@bar\\""}}'}}

was this an intentional change?

whysoserious commented 5 years ago

This also fails in v6.0.0:

iiex(9)> :idna.to_ascii("domain_with_underscore")
** (exit) {:bad_label, {:alabel, 'The label "domain_with_underscore"  is not a valid A-label: ulabel error={bad_label,\n                                                                          {context,\n                                                                           "Codepoint 95 not allowed (\'DISALLOWED\') at posion 6 in \\"domain_with_underscore\\""}}'}}
    /home/xd/Dev/rezolve/rezolve_master/deps/idna/src/idna.erl:277: :idna.alabel/1
    /home/xd/Dev/rezolve/rezolve_master/deps/idna/src/idna.erl:145: :idna.encode_1/2
benoitc commented 5 years ago

@mtarnovan afaik "foo@bar.com is not a valid domain name. You should skip the @ part.

@whysoserious see my answer in benoitc/hackney#557

mtarnovan commented 5 years ago

@benoitc You are right, of course. My question was if this was an intentional change in behaviour from 5.1, as with that version this does not fail.

benoitc commented 5 years ago

@mtarnovan i mean the change is intentional, it was working because it was first checking if all of the part was in ascii and if yes didn't try to encode it (also no rule enforcing was done).

@whysoserious you error is however different. Aparrently curl is accepting such hostname. I am checking why.

benoitc commented 5 years ago

Thanks for the feedback all. Closing the issue as WON'T FIX as it's the expected behaviour with the latest version.

In hackney such url is handled in https://github.com/benoitc/hackney/commit/8bb07a048bdeb8a6ab319765dd27cfd08c0fab1e

which bypass IDNA encoding. Hope it will helps. If the issue still bug you, please create a new one, or comment here.