benoitc / couchbeam

Apache CouchDB client in Erlang

put_attachment crashes with filename containing vowels #150

Open cwmichi opened 8 years ago

cwmichi commented 8 years ago

Hello,

When I try to upload a file with an umlaut or similar non-ASCII character (ü, ä, ö, ß, etc.) in the filename, the Erlang module crashes.

See:

exit {ucs,{bad_utf8_character_code}}
STACK [{xmerl_ucs,from_utf8,1,[{file,"xmerl_ucs.erl"},{line,186}]},
       {couchbeam_util,encode_att_name,1,[{file,"couchbeam_util.erl"},{line,26}]},
       {couchbeam,put_attachment,5,[{file,"couchbeam.erl"},{line,878}]}

fenollp commented 8 years ago

What's the filename? Can you run io:format("~w\n", [It]). and post the output?

cwmichi commented 8 years ago

Here are two examples for you, directly from the Erlang console:

1> It = "helloß.jpg".
"helloß.jpg"
2> io:format("~w\n", [It]).
[104,101,108,108,111,223,46,106,112,103]
ok
3> It2 = "hello_üö.jpg".
"hello_üö.jpg"
4> io:format("~w\n", [It2]).
[104,101,108,108,111,95,252,246,46,106,112,103]
ok

fenollp commented 8 years ago

This string looks fine to me.

1> xmerl_ucs:to_utf8("helloß.jpg").
[104,101,108,108,111,195,159,46,106,112,103]

Is this really the input that is passed to couchbeam_util:encode_att_name/1?

exit {ucs,{bad_utf8_character_code}}

It looks like a non-UTF-8-encoded binary was passed to couchbeam_util:encode_att_name/1, and that made xmerl_ucs barf. Per http://erlang.org/doc/apps/stdlib/unicode_usage.html#id62552 I suggest using unicode:characters_to_binary.
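To illustrate the suggestion, here is a minimal sketch of converting a filename held as a list of code points into a UTF-8 binary before it reaches put_attachment. The module name att_name_sketch and the function to_utf8_binary/1 are hypothetical and not part of couchbeam; only unicode:characters_to_binary/1 is a real stdlib call.

```erlang
-module(att_name_sketch).
-export([to_utf8_binary/1]).

%% Hypothetical helper, not part of couchbeam.
%% Takes a filename as a list of Unicode code points, for example
%% [104,101,108,108,111,223,46,106,112,103] ("hello" ++ [223] ++ ".jpg"),
%% and returns it as a UTF-8 encoded binary.
to_utf8_binary(Name) when is_list(Name) ->
    case unicode:characters_to_binary(Name) of
        Bin when is_binary(Bin) ->
            Bin;
        %% characters_to_binary/1 reports invalid or truncated input
        %% as a tuple instead of raising; turn that into badarg here.
        {error, _Converted, _Rest} ->
            erlang:error(badarg, [Name]);
        {incomplete, _Converted, _Rest} ->
            erlang:error(badarg, [Name])
    end.
```

With the first example from this thread, att_name_sketch:to_utf8_binary([104,101,108,108,111,223,46,106,112,103]) should give <<104,101,108,108,111,195,159,46,106,112,103>>, the same byte sequence that xmerl_ucs:to_utf8/1 produced above.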

cwmichi commented 8 years ago

Yes, that is what I passed to couchbeam_util:encode_att_name/1...

It does work when done the following way, though. I am just looking into unicode:characters_to_binary.

It2 = <<"hello_üö.jpg"/utf8>>.
xmerl_ucs:from_utf8(It2).

Thank you very much for your assistance and tips :-) I will try it with my debugger. I think you can close this issue. ;)
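For anyone hitting the same crash, the difference between the two inputs can be checked with a small sketch like the one below. The module utf8_check_sketch and its function decodes_as_utf8/1 are hypothetical names made up for illustration; the only library call is xmerl_ucs:from_utf8/1, which exits with {ucs,{bad_utf8_character_code}} as seen in the stack trace above.

```erlang
-module(utf8_check_sketch).
-export([decodes_as_utf8/1]).

%% Hypothetical helper, not part of couchbeam or xmerl.
%% Returns true when xmerl_ucs:from_utf8/1 accepts Input (a binary or
%% a list of bytes), and false when it exits with the error reported
%% in this issue.
decodes_as_utf8(Input) ->
    try xmerl_ucs:from_utf8(Input) of
        _CodePoints -> true
    catch
        exit:{ucs, {bad_utf8_character_code}} -> false
    end.
```

The code point list [104,101,108,108,111,223,46,106,112,103] is rejected, because 223 on its own is not a valid UTF-8 byte sequence, while the UTF-8 binary <<104,101,108,108,111,195,159,46,106,112,103>> is accepted.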

fenollp commented 8 years ago

Not sure about closing. Should this case be handled in couchbeam or by users? cc @benoitc @lazedo