buckket / twtxt

Decentralised, minimalist microblogging service for hackers.
http://twtxt.readthedocs.org/en/stable/
MIT License

Follower url works in browser, does not in twtxt #102

Closed kdave closed 8 years ago

kdave commented 8 years ago

HTTP:

twtxt -v follow mdom http://www.domgoergen.com/twtxt.txt
...
twtxt.twhttp DEBUG    400, message='deflate'

HTTPS:

...
twtxt.twhttp DEBUG    hostname 'www.domgoergen.com' doesn't match either of '*.kasserver.com', 'kasserver.com'

works in the browser.

kdave commented 8 years ago

The url above gives 301 and redirects to http://mdom.github.io/twtxt.txt, which works.

buckket commented 8 years ago

The HTTPS URL doesn't work because of an invalid SSL certificate. This is intended behaviour.
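
The mismatch in the debug output can be illustrated with a simplified sketch of wildcard hostname matching. This is not the logic of Python's ssl module, only an approximation that treats a lone `*` in the leftmost label as a wildcard; the function name `hostname_matches` is made up for this example.

```python
def hostname_matches(hostname, pattern):
    """Simplified check: does hostname match a certificate name pattern?

    Only a lone '*' as the leftmost label is treated as a wildcard, and it
    stands for exactly one label. Real TLS libraries apply stricter rules.
    """
    host_labels = hostname.lower().split(".")
    pattern_labels = pattern.lower().split(".")
    if len(host_labels) != len(pattern_labels):
        return False
    if pattern_labels[0] == "*":
        # Wildcard covers the first label; the rest must match exactly.
        return host_labels[1:] == pattern_labels[1:]
    return host_labels == pattern_labels


# Names taken from the debug output in this issue.
cert_names = ["*.kasserver.com", "kasserver.com"]
host = "www.domgoergen.com"
print(any(hostname_matches(host, name) for name in cert_names))  # False
```

Neither certificate name covers www.domgoergen.com, so the check fails even though the server itself answers.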

timofurrer commented 8 years ago

Is there a need to support something like a no-certificate-check option, as wget, git etc. do? It would be set per followed URL.

Personally, I would leave it as it is - that's what certificates are for! However, maybe someone has a very good reason?! Otherwise I suggest to close this issue.

buckket commented 8 years ago

We could catch the exception and print a user friendly error message, but I’m definitely not a fan of skipping the certificate check.

kdave commented 8 years ago

I'm not asking for skipping the checks, a descriptive error message is fine, in both cases.

buckket commented 8 years ago

Okay. Will work on this.

Lymkwi commented 8 years ago

From the understanding I got out of reading the source code, I coded a 'hack' in twhttp.py to catch the ssl.CertificateError and print a message that twtxt users could understand. Here are the elements I modified:

import asyncio
import logging

import aiohttp
import ssl
import click

from twtxt.cache import Cache
from twtxt.helper import generate_user_agent
from twtxt.parser import parse_tweets

logger = logging.getLogger(__name__)

@asyncio.coroutine
def retrieve_status(client, source):
    status = None
    try:
        response = yield from client.head(source.url)
        status = response.status
        yield from response.release()
    except ssl.CertificateError:
        click.echo(("✗ {0} : the website's SSL certificate is untrusted. Try using HTTP, " +
            "or contact the site's administrator to report the issue").format(click.style("SSL Error", bold = True, fg = "red")))
    except Exception as e:
        logger.debug(e)
    finally:
        return source, status

However, I have two concerns over this modification:

So what do you think?

(also, I just recently discovered twtxt and since it looks like a young project, if I could make my python skills useful to anyone that would be great.)

EDIT: This piece of code should handle 5 of the basic HTTP redirection codes (301, 302, 303, 307, 308), which seems to be one of the issues here: https://github.com/LeMagnesium/twtxt/commit/bf91cfcaa13825424edd8fd2774a8e887357d61c .
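
For readers who don't want to open the commit, here is a minimal sketch of what handling those five codes can look like. It is not the code from the linked commit; `redirect_target` and the plain-dict `headers` argument are assumptions made for this example, and only the standard library is used.

```python
from urllib.parse import urljoin

# The five redirect codes mentioned above.
REDIRECT_CODES = {301, 302, 303, 307, 308}


def redirect_target(url, status, headers):
    """Return the URL to retry, or None if the response is not a redirect.

    `headers` is assumed to be a plain dict with a 'Location' key.
    """
    if status not in REDIRECT_CODES:
        return None
    location = headers.get("Location")
    if not location:
        return None
    # Location may be relative, so resolve it against the requested URL.
    return urljoin(url, location)


# URLs taken from earlier comments in this issue.
print(redirect_target(
    "http://www.domgoergen.com/twtxt.txt",
    301,
    {"Location": "http://mdom.github.io/twtxt.txt"},
))
```

A caller would request the returned URL instead of the original one, ideally with a cap on the number of hops so a redirect loop cannot run forever.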

buckket commented 8 years ago

Thanks for your input!

I added the warning to retrieve_status, but I'm still only logging the error when using retrieve_file, as I don’t want to pollute the output of the timeline command. Managing/checking one's following list should be done with the following/follow/unfollow commands, so that’s where the warning should appear. When calling timeline I just wanna see them tweets, not caring which feeds failed.

And instead of import ssl it’s sufficient to do from ssl import CertificateError.
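
Roughly, the split described above could be sketched like this. It is a simplified illustration and not the code that went into twtxt: `report_fetch_error` and its `interactive` flag are hypothetical names, and the message is returned rather than printed with click so the example stays standard-library only.

```python
import logging
from ssl import CertificateError

logger = logging.getLogger(__name__)


def report_fetch_error(exc, interactive):
    """Return a user-facing warning, or None when the error is only logged.

    `interactive` is a hypothetical flag: True for follow/following-style
    commands, False for timeline.
    """
    if interactive and isinstance(exc, CertificateError):
        return "SSL Error: the certificate of this feed could not be verified."
    # Everything else stays in the debug log and out of the normal output.
    logger.debug(exc)
    return None


err = CertificateError("hostname mismatch")
print(report_fetch_error(err, interactive=True))
print(report_fetch_error(err, interactive=False))
```

The first call yields the warning text and the second yields None, which matches the idea that timeline stays quiet about broken feeds.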