Casualtek / Cyberwatch

Building a consolidated RSS feed for articles about cyberattacks

[BUG] Google News URL decode not working anymore #5

Open moehmeni opened 1 month ago

moehmeni commented 1 month ago

I use the code below to decode Google News URLs into the original article URLs, and it now fails with an error:

import base64
import re
from typing import Optional

# Ref:
_ENCODED_URL_RE = re.compile(
_DECODED_URL_RE = re.compile(rb'^\x08\x13".+?(?P<primary_url>http[^\xd2]+)\xd2\x01')

def decode_google_news_url(url: str) -> Optional[str]:
    match = _ENCODED_URL_RE.match(url)
    encoded_text = match.groupdict()["encoded_url"]  # type: ignore
    encoded_text += "==="  # Fix incorrect padding. Ref:
    decoded_text = base64.urlsafe_b64decode(encoded_text)

    match = _DECODED_URL_RE.match(decoded_text)
    primary_url = match.groupdict()["primary_url"]  # type: ignore
    primary_url = primary_url.decode()
    return primary_url

# Test the function
url = ""
result = decode_google_news_url(url)
print("Result:", result)

    primary_url = match.groupdict()["primary_url"]  # type: ignore
AttributeError: 'NoneType' object has no attribute 'groupdict'

I think Google changed the URL format recently, because this was working just yesterday.
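
For context, the AttributeError is raised because `re.match` returns `None` when a pattern does not match, and the code then calls `.groupdict()` on that `None`. Below is a minimal sketch of a defensive variant that returns `None` instead of crashing. It does not restore decoding for whatever the new format is. The function name `decode_google_news_url_safe` is hypothetical, and the `_ENCODED_URL_RE` pattern is an assumption, since the original pattern is truncated in the snippet above.

```python
import base64
import re
from typing import Optional

# Assumption: the encoded payload is the path segment after ".../articles/".
# The original pattern is cut off in the post, so this one is illustrative only.
_ENCODED_URL_RE = re.compile(
    r"^https://news\.google\.com/(?:rss/)?articles/(?P<encoded_url>[^?]+)"
)
# Same pattern as in the post: the old payload layout that embedded the URL.
_DECODED_URL_RE = re.compile(rb'^\x08\x13".+?(?P<primary_url>http[^\xd2]+)\xd2\x01')


def decode_google_news_url_safe(url: str) -> Optional[str]:
    """Return the embedded article URL, or None if any decoding step fails."""
    match = _ENCODED_URL_RE.match(url)
    if match is None:
        return None  # not a URL shape this sketch recognises

    # Extra "=" characters work around missing base64 padding.
    encoded_text = match.group("encoded_url") + "==="
    try:
        decoded_text = base64.urlsafe_b64decode(encoded_text)
    except ValueError:  # binascii.Error is a subclass of ValueError
        return None

    inner = _DECODED_URL_RE.match(decoded_text)
    if inner is None:
        # The payload no longer follows the old layout; this is the case
        # that raised AttributeError in the original snippet.
        return None
    return inner.group("primary_url").decode("utf-8", errors="replace")
```

With this variant, a caller can check for `None` and skip or log the affected feed entry rather than aborting the whole run.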

Casualtek commented 1 month ago

Thanks. I noticed it as well. I'm working on a fix. Any suggestions will be welcome though ;)