Charcoal-SE / metasmoke

Web dashboard for SmokeDetector.
https://metasmoke.erwaysoftware.com
Creative Commons Zero v1.0 Universal
43 stars 34 forks

Whois field cannot accommodate Unicode, causes traceback #255

Closed by tripleee 6 years ago

tripleee commented 7 years ago

I tried to post information for clicktrans.es but I got a traceback.

Mysql2::Error: Incorrect string value: '\xC5\x82 Brz...' for column 'whois' at row 1: UPDATE `spam_domains` SET `whois` = 'REGISTRANT DATA\r\nDomain name clicktrans.es\r\nstate Activated\r\nIdentifier 5EBD2A-ESNIC-F5\r\nRegistrant Michał Brzeziński IQSC Solutions\r\nRegister Date 18-09-2012\r\nExpiration Date 18-09-2020\r\nRegistrar OVH HISPANO\r\nADMINISTRATIVE CONTACT PERSON\r\n\r\nIdentifier 5EBD5B-ESNIC-F5\r\nName Michał Brzeziński\r\nEmail michal.brzezinski@clicktrans.pl\r\n \r\nTECHNICAL CONTACT PERSON\r\n\r\nIdentifier 5EBD5B-ESNIC-F5\r\nName Michał Brzeziński\r\nEmail michal.brzezinski@clicktrans.pl\r\n \r\n \r\nDNS SERVERS\r\nServer Name IP\r\nns102.ovh.net \r\ndns102.ovh.net', `updated_at` = '2017-09-18 12:03:53' WHERE `spam_domains`.`id` = 14055

The copy/paste was rather untidy because the whois service at www.dominios.es only gives you tabular HTML output (and that only after struggling with multiple CAPTCHAs), but I'm guessing the accented Polish characters (sic) are the problem here.
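For reference, the `\xC5\x82` at the start of the rejected value is the UTF-8 encoding of `ł`, which fits the guess above. A minimal Python sketch (not metasmoke code; the helper name `non_ascii_report` is made up for illustration) that lists the non-ASCII characters in a string together with their UTF-8 bytes:

```python
def non_ascii_report(text):
    """Map each non-ASCII character in text to its UTF-8 bytes."""
    report = {}
    for ch in text:
        if ord(ch) > 127:
            report[ch] = ch.encode("utf-8")
    return report


registrant = "Michał Brzeziński IQSC Solutions"
for ch, raw in non_ascii_report(registrant).items():
    # Show the bytes in the same \xNN style the MySQL error uses.
    escaped = "".join(f"\\x{b:02X}" for b in raw)
    print(f"{ch!r} U+{ord(ch):04X} -> {escaped} ({len(raw)} bytes)")
```

Both `ł` and `ń` take two bytes in UTF-8, so the error suggests the `whois` column was using a character set that cannot represent them at all. Which character set that was is not stated in the thread, so treat this as an inference.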

tripleee commented 6 years ago

https://metasmoke.erwaysoftware.com/domains/15380 caused tracebacks too until I replaced the Unicode string with XXXXX.

Admin Street: thran khiaban anghlab khiaban bhar jnvbi brj bhar tbghh 8 adari vahd ۷۰۷

(Similarly for registrant and tech contact.)

ArtOfCode- commented 6 years ago

This just needs the DB tables to be updated to use the utf8mb4 character set and collation.
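A rough sketch of what such a conversion could look like, written as a small Python helper that only builds the MySQL statement. The helper name, the `utf8mb4_unicode_ci` collation, and the use of `CONVERT TO` are assumptions for illustration; the thread does not show what was actually run.

```python
import re


def convert_table_sql(table, charset="utf8mb4", collation="utf8mb4_unicode_ci"):
    """Build an ALTER TABLE statement that converts a table's text columns."""
    # Accept only plain identifiers so the name is safe inside backticks.
    for name in (table, charset, collation):
        if not re.fullmatch(r"[A-Za-z0-9_]+", name):
            raise ValueError(f"unexpected identifier: {name!r}")
    return (
        f"ALTER TABLE `{table}` "
        f"CONVERT TO CHARACTER SET {charset} COLLATE {collation};"
    )


print(convert_table_sql("spam_domains"))
```

Note that MySQL's plain `utf8` stores at most three bytes per character, which already covers the Polish, Persian and Cyrillic characters mentioned in this issue; `utf8mb4` additionally covers four-byte characters such as emoji. The client connection generally needs a matching encoding as well.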

tripleee commented 6 years ago

Also https://metasmoke.erwaysoftware.com/domains/15644 and https://metasmoke.erwaysoftware.com/domains/15645 would benefit greatly from being human-readable (педпроект.рф and педакадемия.рф, respectively).
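Assuming those pages show the domains in their ASCII-compatible (Punycode, `xn--`) form, which the comment implies but does not state, the readable form can be recovered with an IDNA decode. A minimal sketch using Python's built-in `idna` codec (IDNA 2003); the function names are hypothetical:

```python
def to_ascii(domain):
    """Encode a Unicode domain name to its ASCII-compatible (xn--) form."""
    return domain.encode("idna").decode("ascii")


def to_unicode(domain):
    """Decode an ASCII-compatible domain name back to Unicode."""
    return domain.encode("ascii").decode("idna")


for name in ("педпроект.рф", "педакадемия.рф"):
    ascii_form = to_ascii(name)
    # Round-trip to confirm the decode gives back the readable name.
    print(ascii_form, "->", to_unicode(ascii_form))
```

Storing the decoded name still requires a column character set that can hold Cyrillic, so this ties back to the charset fix discussed above.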

ArtOfCode- commented 6 years ago

Should be fixed: all database tables now use at least utf8, and some use utf8mb4.