buckket / twtxt

Decentralised, minimalist microblogging service for hackers.
http://twtxt.readthedocs.org/en/stable/
MIT License

Include user’s twtxt nick and URL in their user-agent string #63

Closed buckket closed 8 years ago

buckket commented 8 years ago

As proposed here: https://github.com/buckket/twtxt/issues/3#issuecomment-182738604. This allows searching the web server’s access log for one’s followers.

Format: twtxt/1.2.3 (+https://example.com/twtxt.txt; @somebody)

Any other suggestions for the format we should use?
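
For illustration, the proposed format could be assembled roughly like this. This is only a sketch: `build_user_agent` and its parameters are hypothetical names, not part of the twtxt client, and the identity comment is left out unless both the feed URL and the nick are supplied.

```python
def build_user_agent(version, feed_url=None, nick=None):
    """Return a User-Agent value in the format proposed in this issue.

    Hypothetical helper: the identity comment is only appended when both
    feed_url and nick are given, so the anonymous form is the default.
    """
    product = "twtxt/{}".format(version)
    if feed_url and nick:
        return "{} (+{}; @{})".format(product, feed_url, nick)
    return product


# Values taken from the example format above.
print(build_user_agent("1.2.3", "https://example.com/twtxt.txt", "somebody"))
print(build_user_agent("1.2.3"))
```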

DracoBlue commented 8 years ago

:+1:

jomo commented 8 years ago

It's great for discovery, but on the other hand it's sort of like a tracking cookie, allowing you to track when a specific user checks for new tweets, how often and at which time, from where, etc...

Maybe this could be done on the first request only, so you can grep your logs for new followers, but can't stalk them?

Either way, this should be a configurable option IMO.

buckket commented 8 years ago

Yes, valid doubts indeed. Of course this would be configurable and disabled by default.

Including it only once when newly following someone would help, but then again you can’t really be sure if this person is still "following" you weeks later.

DracoBlue commented 8 years ago

The quickstart could ask you if you want to use that feature or not.

kseistrup commented 8 years ago

Personally I would prefer that the inclusion be configurable and sent with every request. The quickstart is a good place to make people aware of the feature, and it should default to “don't disclose my identity”.

In addition to never, once and always, a possibility could be to include twtxt's URL and @username if random.random() is below a certain threshold (that could be configurable and default to, say, 0.5).
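
The four modes mentioned above (never, once, always, and a random threshold) could be sketched as a single decision function. The function name, the mode strings, and the `first_request` flag are assumptions made for this example; nothing here is taken from the twtxt code base.

```python
import random


def should_disclose(mode, first_request=False, threshold=0.5, rng=random.random):
    """Decide whether this request should carry the identifying User-Agent.

    Hypothetical policy helper:
      never  -> identity is never sent (the suggested default)
      once   -> only on the first request to a newly followed feed
      always -> on every request
      random -> when a random draw in [0, 1) falls below the threshold
    """
    if mode == "never":
        return False
    if mode == "always":
        return True
    if mode == "once":
        return first_request
    if mode == "random":
        return rng() < threshold
    raise ValueError("unknown disclosure mode: {!r}".format(mode))


# With the default threshold of 0.5, roughly half of all requests disclose.
print(should_disclose("random"))
```

Passing `rng` as a parameter is only there to make the random branch testable with a fixed value.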

kseistrup commented 8 years ago

I rather like that format. :wink:

erlehmann commented 8 years ago

RFC 7231, Section 5.5.3 “User-Agent” states:

The User-Agent field-value consists of one or more product identifiers, each followed by zero or more comments (Section 3.2 of [RFC7230]), which together identify the user agent software and its significant subproducts.

and

A sender SHOULD limit generated product identifiers to what is necessary to identify the product; a sender MUST NOT generate advertising or other nonessential information within the product identifier.

and

A user agent SHOULD NOT generate a User-Agent field containing needlessly fine-grained detail and SHOULD limit the addition of subproducts by third parties. Overly long and detailed User-Agent field values increase request latency and the risk of a user being identified against their wishes ("fingerprinting").

I seriously doubt that information about the user (and not the user agent) is in scope for the “User-Agent” header field. Note that there is a “From” header field defined in RFC 7231 which “contains an Internet email address for a human user who controls the requesting user agent”: https://tools.ietf.org/html/rfc7231#section-5.5.1

timofurrer commented 8 years ago

@erlehmann I agree with you - it's kinda out of scope. The From header field should contain an email address which is in our case not what we want. However, it would be nice to send the User-Agent anyway containing something like twtxt/<VERSION>.
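
A version-only User-Agent like that might be attached to a request as follows. This sketch uses the standard library's `urllib.request` purely for illustration, which is an assumption and not necessarily the HTTP library the client uses; `make_request` is a hypothetical name and no network access happens here.

```python
import urllib.request


def make_request(url, version):
    """Build (but do not send) a request that identifies only the software."""
    headers = {"User-Agent": "twtxt/{}".format(version)}
    return urllib.request.Request(url, headers=headers)


req = make_request("https://example.com/twtxt.txt", "1.2.3")
# urllib stores header names capitalized as "User-agent".
print(req.get_header("User-agent"))
```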

reednj commented 8 years ago

Eh, I think it's OK to put the URL and username in the user-agent if the user requests it, even if it is discouraged by the RFC (but not actually disallowed, I note).

For twtxt.reednj.com I am going to use the format proposed by @buckket with an extra field at the end to show it's not the official client. Something like this:

twtxt/1.0 (+http://twtxt.reednj.com/twtxt/directory.twtxt.txt; @directory) twtxt-directory/1.1

This follows the user agent spec, which allows for multiple product tokens.
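
To show how such a value breaks down into product tokens and comments, here is a simplified parser sketch. It is an approximation of the RFC 7231 grammar, not a full implementation: nested comments and quoted pairs are not handled, and `parse_user_agent` is a hypothetical name.

```python
import re

# "token" characters as defined for HTTP header fields (RFC 7230 tchar).
TOKEN = r"[!#$%&'*+\-.^_`|~0-9A-Za-z]+"
# One part is either product[/version] or a non-nested (comment).
PART = re.compile(
    r"\s*(?:(?P<product>{t}(?:/{t})?)|\((?P<comment>[^()]*)\))".format(t=TOKEN)
)


def parse_user_agent(value):
    """Split a User-Agent value into ("product", ...) and ("comment", ...) parts."""
    value = value.strip()
    parts = []
    pos = 0
    while pos < len(value):
        match = PART.match(value, pos)
        if match is None:
            raise ValueError("cannot parse User-Agent at offset {}".format(pos))
        if match.group("product") is not None:
            parts.append(("product", match.group("product")))
        else:
            parts.append(("comment", match.group("comment")))
        pos = match.end()
    return parts


ua = ("twtxt/1.0 (+http://twtxt.reednj.com/twtxt/directory.twtxt.txt; "
      "@directory) twtxt-directory/1.1")
for kind, text in parse_user_agent(ua):
    print(kind, text)
```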

erlehmann commented 8 years ago

@reednj this means your twtxt-directory-client is using the official client code just for downloading?

reednj commented 8 years ago

@erlehmann It doesn't use the official client at all...

buckket commented 8 years ago

@erlehmann twtxt is not only the official client but also the format specification, so I see no problem using it, even if @reednj does not use parts of the official client.

Personally I really like this feature; it allows one to get information about one's followers without too much effort. Plus it is totally optional and requires no modification on the server side, as the User-Agent is logged automatically most of the time.

If no one has any serious doubts about this I will start implementing this feature as proposed here.
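
On the feed owner's side, finding followers would then come down to scanning the access log for the proposed pattern. The sketch below assumes the User-Agent appears verbatim somewhere in each log line (as in the common "combined" log format); the sample lines are made up for the example, and `find_followers` is a hypothetical name.

```python
import re

# Matches the proposed "twtxt/<version> (+<url>; @<nick>)" form.
FOLLOWER = re.compile(
    r"twtxt/(?P<version>[\w.]+) \(\+(?P<url>[^;\s]+); @(?P<nick>[^)\s]+)\)"
)


def find_followers(log_lines):
    """Return a dict mapping feed URL to nick for every identifying request."""
    followers = {}
    for line in log_lines:
        match = FOLLOWER.search(line)
        if match:
            followers[match.group("url")] = match.group("nick")
    return followers


# Made-up sample lines, only to exercise the pattern.
sample = [
    '203.0.113.7 - - [11/Feb/2016:10:21:00 +0100] "GET /twtxt.txt HTTP/1.1" '
    '200 512 "-" "twtxt/1.2.3 (+https://example.com/twtxt.txt; @somebody)"',
    '203.0.113.9 - - [11/Feb/2016:10:22:00 +0100] "GET /twtxt.txt HTTP/1.1" '
    '200 512 "-" "curl/7.47.0"',
]
print(find_followers(sample))
```

Keying the result by URL rather than by nick is a design choice for the sketch: nicks are not unique, while a feed URL identifies one feed.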