UnicodeDecodeError on query_dmarc_record

magicjohnson commented 8 months ago

query_dmarc_record worked on 4.8.5, but after updating it raises an error because a root TXT record contains a byte that cannot be decoded as UTF-8 (shown by dig as the decimal escape \148, i.e. \x94).

~$ dig mi.se TXT

; <<>> DiG 9.10.6 <<>> mi.se TXT
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 34619
;; flags: qr rd ra; QUERY: 1, ANSWER: 3, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 512
;; QUESTION SECTION:
;mi.se.             IN  TXT

;; ANSWER SECTION:
mi.se.          3600    IN  TXT "\148spf2.0/mfrom,pra" "a" "mx" "include:spf.gansend.com" "~all\148"
mi.se.          3600    IN  TXT "_globalsign-domain-verification=9u-j0gfHfc2Fos3TdNO2myTNylXpUxkI_i6syXPjtM"
mi.se.          3600    IN  TXT "v=spf1 a mx ip4:194.237.137.100/32 include:_spf.koneo.net include:spf.gansend.com include:mailgun.org a:spamfilter.mi.se ip4:46.246.39.10 ~all"

;; Query time: 147 msec
;; SERVER: 8.8.8.8#53(8.8.8.8)
;; WHEN: Wed Jan 10 13:47:18 +06 2024
;; MSG SIZE  rcvd: 341

In [2]: query_dmarc_record('mi.se')
---------------------------------------------------------------------------
UnicodeDecodeError                        Traceback (most recent call last)
Cell In[2], line 1
----> 1 query_dmarc_record('mi.se')

File /usr/local/lib/python3.12/site-packages/checkdmarc/dmarc.py:498, in query_dmarc_record(domain, nameservers, resolver, timeout, ignore_unrelated_records)
    493 record = _query_dmarc_record(
    494     domain, nameservers=nameservers,
    495     resolver=resolver, timeout=timeout,
    496     ignore_unrelated_records=ignore_unrelated_records)
    497 try:
--> 498     root_records = query_dns(domain, "TXT",
    499                              nameservers=nameservers, resolver=resolver,
    500                              timeout=timeout)
    501     for root_record in root_records:
    502         if root_record.startswith("v=DMARC1"):

File /usr/local/lib/python3.12/site-packages/checkdmarc/utils.py:118, in query_dns(domain, record_type, nameservers, resolver, timeout, cache)
    112     resource_records = list(map(
    113         lambda r: r.strings,
    114         resolver.resolve(domain, record_type, lifetime=timeout)))
    115     _resource_record = [
    116         resource_record[0][:0].join(resource_record)
    117         for resource_record in resource_records if resource_record]
--> 118     records = [r.decode() for r in _resource_record]
    119 else:
    120     records = list(map(
    121         lambda r: r.to_text().replace('"', '').rstrip("."),
    122         resolver.resolve(domain, record_type, lifetime=timeout)))

UnicodeDecodeError: 'utf-8' codec can't decode byte 0x94 in position 0: invalid start byte
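
The failure can be reproduced without any DNS lookup. The sketch below is illustrative only: it hard-codes the first TXT answer from the dig output above as a tuple of bytes (assuming dig's \148 stands for the single byte 0x94) and applies the same join-then-decode steps shown in the traceback.

```python
# First TXT answer for mi.se from the dig output, as a tuple of
# character-strings. Assumption: dig's "\148" is the raw byte 0x94.
strings = (
    b"\x94spf2.0/mfrom,pra",
    b"a",
    b"mx",
    b"include:spf.gansend.com",
    b"~all\x94",
)

# Same join as utils.py line 116: an empty bytes object as separator.
joined = strings[0][:0].join(strings)

# Same decode as utils.py line 118: strict UTF-8 by default.
try:
    joined.decode()
    error = None
except UnicodeDecodeError as exc:
    error = exc

print(error)
```

The byte 0x94 is a UTF-8 continuation byte, so it can never start a character, which is why the decoder rejects it at position 0.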

jonlil commented 8 months ago

I would like to point out that the error happens when the library checks the root domain for invalid DMARC records, for example a TXT record on the root domain that starts with v=DMARC1. I think it would be good to be able to bypass this root-domain check.

magicjohnson commented 8 months ago

I think the fix does not work: the undecoded bytes object is passed as-is to root_record, which causes a further exception.

File "python3.12/site-packages/checkdmarc/dmarc.py", line 1013, in get_dmarc_record
    query = query_dmarc_record(domain, nameservers=nameservers,
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "python3.12/site-packages/checkdmarc/dmarc.py", line 502, in query_dmarc_record
    if root_record.startswith("v=DMARC1"):
       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
TypeError: startswith first arg must be bytes or a tuple of bytes, not str
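
This second error is independent of DNS as well. As a minimal sketch (the variable names here are illustrative, not taken from checkdmarc), calling startswith with a str prefix on a bytes object is enough to trigger it:

```python
# A record that was left as bytes because decoding failed.
root_record = b"\x94spf2.0/mfrom,pra"

# bytes.startswith only accepts bytes (or a tuple of bytes) as prefix.
try:
    root_record.startswith("v=DMARC1")
    message = None
except TypeError as exc:
    message = str(exc)

print(message)

# With a bytes prefix the call is valid and simply returns False here.
print(root_record.startswith(b"v=DMARC1"))
```

So any record that stays undecoded has to be either filtered out or compared against a bytes prefix before the v=DMARC1 check runs.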

HugoAreias commented 7 months ago

Hi,

I'm still getting this issue when submitting, for instance, the domain edys.com. I'm using Python 3.9.18 and version 5.3.1 of this package.

  answer = dmarc.query_dmarc_record(fqdn, nameservers=ns, timeout=2.0)
  File "/app/checkdmarc/dmarc.py", line 502, in query_dmarc_record
    if root_record.startswith("v=DMARC1"):
TypeError: startswith first arg must be bytes or a tuple of bytes, not str

From what I can understand, when decoding fails inside the existing try block, the except clause only does pass, so the undecoded bytes object is still added to the final list of records. Should it be continue instead, or should the append be moved inside the try block?
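
A rough sketch of the second option, with the append moved inside the try block so that a record is only kept once it has decoded successfully. The function name decode_records is hypothetical and this is not the library's actual code; it only illustrates the control flow being discussed.

```python
def decode_records(raw_records):
    """Decode TXT records as UTF-8, skipping any that fail to decode.

    Hypothetical helper for illustration; not part of checkdmarc.
    """
    records = []
    for raw in raw_records:
        try:
            # The append only runs if decode() succeeds.
            records.append(raw.decode("utf-8"))
        except UnicodeDecodeError:
            # Skip the undecodable record instead of keeping the bytes.
            continue
    return records


raw = [
    b"\x94spf2.0/mfrom,pra a mx ~all\x94",  # invalid UTF-8
    b"v=spf1 a mx ~all",                    # valid
]
print(decode_records(raw))
```

Either variant guarantees that the returned list contains only str objects, so a later root_record.startswith("v=DMARC1") call no longer sees bytes.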

domainaware / checkdmarc

UnicodeDecodeError on query_dmarc_record #124