addr-rs / addr

Parse domain names reliably and quickly in Rust
MIT License

Strict internationalized domain names (IDN) validation #13

Open marmeladema opened 2 years ago

marmeladema commented 2 years ago

Hello!

First allow me to thank you for your work :+1: That crate has been really useful and very simple to use!

I am not exactly sure if it's actually a goal of the crate but I figured I might ask. Should internationalized domain names be properly validated? I was looking at test cases from https://github.com/json-schema-org/JSON-Schema-Test-Suite/blob/master/tests/draft7/optional/format/idn-hostname.json and it seems that some domain names are accepted whereas they should probably be rejected.

A few examples:

$ dig 〮실례.테스트
dig: '〮실례.테스트' is not a legal IDNA2008 name (string contains a forbidden leading combining character), use +noidnin
$ dig 실〮례.테스트
dig: '실〮례.테스트' is not a legal IDNA2008 name (string contains a disallowed character), use +noidnin
$ dig xn--X
dig: 'xn--X' is not a legal IDNA2008 name (string contains invalid punycode data), use +noidnin
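To make the first two cases concrete, here is a minimal, allocation-free sketch of one such check. It only covers the code points that, as far as I can tell, RFC 5892 section 2.6 lists as DISALLOWED exceptions (U+302E, the Hangul single dot tone mark used above, is one of them). The table and the `first_disallowed` helper are my own illustration, not part of addr or any other crate, and this is nowhere near a complete IDNA2008 validator.

```rust
/// Code points that RFC 5892 section 2.6 appears to list as DISALLOWED
/// exceptions. Assumption: copied by hand, verify against the RFC before use.
const DISALLOWED_EXCEPTIONS: [char; 10] = [
    '\u{0640}', '\u{07FA}', '\u{302E}', '\u{302F}', '\u{3031}',
    '\u{3032}', '\u{3033}', '\u{3034}', '\u{3035}', '\u{303B}',
];

/// Hypothetical helper: returns the zero-based label index and the first
/// offending character, or `None` if no label contains one of the
/// exceptions above. It never allocates.
fn first_disallowed(domain: &str) -> Option<(usize, char)> {
    domain.split('.').enumerate().find_map(|(idx, label)| {
        label
            .chars()
            .find(|c| DISALLOWED_EXCEPTIONS.contains(c))
            .map(|c| (idx, c))
    })
}
```

With this, both `〮실례.테스트` and `실〮례.테스트` are flagged in their first label, while `실례.테스트` passes this particular check.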

What do you think? Could the crate be enhanced to provide such domain validation? If not, do you recommend some alternatives?

Thank you for taking the time to read this.

rushmorem commented 2 years ago

Hello :)

> First allow me to thank you for your work

It's my pleasure :slightly_smiling_face:

> That crate has been really useful and very simple to use!

I'm glad to hear that. Thank you for the feedback!

> I am not exactly sure if it's actually a goal of the crate but I figured I might ask. Should internationalized domain names be properly validated?

Yes, absolutely!

> ...it seems that some domain names are accepted whereas they should probably be rejected.

I thought that using this crate in conjunction with the idna crate would be able to cover all the cases. Turns out I was wrong. Thanks for bringing this to my attention. I have added those tests to this crate's integration tests and added this issue to the README.

marmeladema commented 2 years ago

Yes, unfortunately the idna crate does not seem to be fully compliant either. I hesitated to open an issue there too, but it doesn't seem to be maintained much nowadays. Moreover, I looked at the implementation of idna, and it's really tailored for converting an input string into either the ASCII or the Unicode form of the domain, not for parsing and validation. Ideally I'd want a heap-allocation-free validator that is fully compliant with IDNA, but I haven't been able to find one.
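As a rough idea of what allocation-free validation could look like for the `xn--X` case, here is a sketch that walks the RFC 3492 Punycode decoding procedure but only tracks the output length instead of building the decoded string. The names `decoded_len` and `is_well_formed_punycode` are hypothetical, and this covers the Punycode step only: a compliant IDNA2008 validator would still have to check the decoded code points, contextual rules, bidi rules and so on.

```rust
// Parameter values from RFC 3492, section 5.
const BASE: u32 = 36;
const T_MIN: u32 = 1;
const T_MAX: u32 = 26;
const SKEW: u32 = 38;
const DAMP: u32 = 700;
const INITIAL_BIAS: u32 = 72;
const INITIAL_N: u32 = 128;

/// Maps a basic code point to its Punycode digit value (a-z = 0..25, 0-9 = 26..35).
fn digit_value(b: u8) -> Option<u32> {
    match b {
        b'a'..=b'z' => Some((b - b'a') as u32),
        b'A'..=b'Z' => Some((b - b'A') as u32),
        b'0'..=b'9' => Some((b - b'0') as u32 + 26),
        _ => None,
    }
}

/// Bias adaptation from RFC 3492, section 6.1.
fn adapt(mut delta: u32, num_points: u32, first_time: bool) -> u32 {
    delta /= if first_time { DAMP } else { 2 };
    delta += delta / num_points;
    let mut k = 0;
    while delta > ((BASE - T_MIN) * T_MAX) / 2 {
        delta /= BASE - T_MIN;
        k += BASE;
    }
    k + ((BASE - T_MIN + 1) * delta) / (delta + SKEW)
}

/// Hypothetical helper: runs the RFC 3492 decoder over `encoded` (the part
/// after "xn--") without storing the output. Returns the number of decoded
/// code points, or `None` if the input is not decodable.
fn decoded_len(encoded: &str) -> Option<u32> {
    if !encoded.is_ascii() {
        return None; // only basic (ASCII) code points may appear
    }
    let bytes = encoded.as_bytes();
    // Everything before the last '-' is copied literally; a '-' at index 0
    // is not treated as a delimiter, matching the reference decoder.
    let (mut out_len, mut pos) = match bytes.iter().rposition(|&b| b == b'-') {
        Some(d) if d > 0 => (d as u32, d + 1),
        _ => (0, 0),
    };
    let (mut n, mut i, mut bias) = (INITIAL_N, 0u32, INITIAL_BIAS);
    while pos < bytes.len() {
        let old_i = i;
        let mut w = 1u32;
        let mut k = BASE;
        loop {
            let b = *bytes.get(pos)?; // input ended in the middle of an integer
            pos += 1;
            let digit = digit_value(b)?;
            i = i.checked_add(digit.checked_mul(w)?)?;
            let t = if k <= bias {
                T_MIN
            } else if k >= bias + T_MAX {
                T_MAX
            } else {
                k - bias
            };
            if digit < t {
                break;
            }
            w = w.checked_mul(BASE - t)?;
            k += BASE;
        }
        out_len += 1;
        bias = adapt(i - old_i, out_len, old_i == 0);
        n = n.checked_add(i / out_len)?;
        i %= out_len;
        char::from_u32(n)?; // reject surrogates and values above U+10FFFF
        i += 1;
    }
    Some(out_len)
}

fn is_well_formed_punycode(encoded: &str) -> bool {
    decoded_len(encoded).is_some()
}
```

For example, `is_well_formed_punycode("X")` is false because the input ends in the middle of a variable-length integer, whereas `"bcher-kva"` (the encoded part of `xn--bcher-kva`, i.e. `bücher`) is accepted.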

L020Isry8fuLjSL7r0Gmxw commented 2 years ago

Have you seen the stringprep crate? It claims to implement parsing and validation of IDNs as defined by RFC 3491.

marmeladema commented 2 years ago

I tried it, but it fails on the second test case in the file I mentioned, so it doesn't seem to be compliant either.