Better header checks - Githubissues

Respect / Rest

Thin controller for RESTful applications

http://respect.github.io/Rest

Other

605 stars 102 forks source link

Better header checks #28

Open augustohp opened 12 years ago

augustohp commented 12 years ago

Our tests are messy, declaring header functions everywhere, as of PHP 5.4 we can use http_response_code function to check headers and xdebug_get_headers together to check for HTTP header changes on our tests.

Implementing our own header abstraction would pose work without much benefit - other than having better tests.

[ ] Move version constraint to PHP 5.4 on composer.json (we could go higher).
[ ] Implement assertHttpStatusCode assertion.
[ ] Implement assertHeaderContains assertion.
Original post

Issues #18 and #25 showed we need to manipulate the HTTP header sometimes, despite the correction of the issues we should implement some way to handle header manipulation in a way that fits inside the Respect/Rest/Router and feels right.

"How we are going to do that?" is the question that this issue should fix, hopefully with a nice and sweet push =D

nickl- commented 12 years ago

+1 for HttpHeader class which should include magic to easily echo the header.

There are several more instances where status codes will have to be changed depending on the request ie; If-Modified-Since, If-Unmodified-Since, If-Match, If-None-Match and If-Range header but handling these is for another issue, lets first refactor all these hardcoded header('HTTP/1.1 401'); hacks and centralize their implementation.

The general status codes are specified in RFC2616 section 6.1.1. Status code and Reason phrase. Take note that the specification refers to the client not being required to examine or display the reason phrase, it also adds that the specified reasons are only recommendations and these can be changed but no where can I find that it states the reason phrase to be optional. Additional codes have been specified in RFCs 2295, 2518, 2324, 2774, 2817, 3229, 4918, 4842 and another pending mentioned in draft-reschke-http-status-308

The php manual indicates that the superglobal $_SERVER might contain 'SERVER_PROTOCOL' - Name and revision of the information protocol via which the page was requested; i.e. 'HTTP/1.0'; The specification RFC2616 suggests that all protocols be recognized, HTTP/0.9, 1.0, or 1.1 and to respond appropriately with a message in the same major version used by the client as well as several other restrictions for when handling client requests from lower revisions so it makes sense that we have a way to easily access the revision as a number.

There already exists a function in PHP, albeit not in any current version yet, for setting and retrieving the http_response_code and besides the obvious advantage to easily scale, since we are attempting not to change the way php does things (how much of that is still true?) it may be advisable to follow the suggestion made by craig at craigfrancis dot co dot uk. I would like to be able to retrieve the message as well as the code so we would probably make the collection accessible outside the function. But since nothing stops us from setting the header implicitly maybe this approach would just be messy with very little advantage, lucky for me it is for you to decide =)

Some current implementations for reference:

recess provides an abstract class ResponseCodes.

The messages are not accessible, which I don't like and creating constants for each code is just weird because everyone knows what 404 means so what is the benefit of referring to HTTP_NOT_FOUND, it's not as if they are going to change either.

darklaunch blog suggests a simple function HTTPStatus()

This will not work because the message and protocol should be flexible and the return is just weird referring to an error, what's up with that.

slim incorpated this in their response class Slim_Http_Response

This probably makes sense when you separate request response as in java typology but that's weird for php because all you are doing is making a response which is what I love about Respect/Rest

yii has a protected method getHeader in the CErrorHandler class.

Can you spot anything weird with that? How about bypassing the whole shebang by hardcoding header("HTTP/1.0 500 Internal Server Error"); in the same class, the documentation states that this class handles uncaught exceptions, I wonder how... wait, weird!

zf2 zend incorporates it in the Response class ala java

I always get the feeling that these zend guys swallow the php manual for breakfast and that is all that comes out all day, why so complicated!!! They made a Header class which is an Iterator I'll have you and if that is not enough the separate headers are each classes implementing the HeaderInterface, fancy. Necessary, well I like that it is abstracted but didn't they go a little too far. They also went and made constants for each code, no real benefit since they are called STATUS_CODE_123 and the only reason for that, I think, is so they can do this:

<?php
    /**
     * Set HTTP status code and (optionally) message
     *
     * @param numeric $code
     * @return Response
     */
    public function setStatusCode($code)
    {
        $const = get_called_class() . '::STATUS_CODE_' . $code;
        if (!is_numeric($code) || !defined($const)) {
            $code = is_scalar($code) ? $code : gettype($code);
            throw new Exception\InvalidArgumentException(sprintf(
                'Invalid status code provided: "%s"',
                $code
            ));
        }
        $this->statusCode = (int) $code;
        return $this;
    }

Fancy, totally. Necessary, I don't see why they couldn't simply use array_key_exists() that would be too, well simple I guess. Besides there is no reason to validate the codes since the spec allows the codes to be extendable, you can send what you like, why all htis fancy to validate, weird.

I do like the accessors for isValid isOk isError but more about that shortly. The first complete list of codes we've seen $recommendedReasonPhrases are protected but they provide accessor method getReasonPhrase()

symfony/HttpFoundation another Response class implementation.

They have the magical __toString I'd love to see... skip the content not required. The static public $statusTexts makes the collection accessible, nice, but they don't have an accessor for $statusText so you will need to check the list with the status code, weird. Why so complicating KISS already, let me see if I get this right: The headers are stored in a ResponseHeaderBag which extends on the request's HeaderBag and implements IteratorAggregate, much better, you can simply return an ArrayItterator($headers) when it's required and the Countable also adds a nice touch. The different headers can simply be collected in an Array by header key which references a collection of values to cater for the 1 to many requirement.

Lots of accessors, nice but some of the abstracted functionality now ends up all over the place instead of keeping it in one location. Some of the functionality like date, expires and their prepare methods are located in Response for example, ResponseHeaderBag handles cookies and overwrites some of HeaderBag's cache-control... I'm confused moving along.

These are the good the bad and the ugly but all have some weirdness to them. What I'd like to have:

chained setters
magic display header
request should be parsed on construction and the default values populated.
php/server errors should be caught and responded on appropriately
accessors for all the headers right down to isTeaPot, setTeaPot as well as accessors for the combined headers ie Content-Type, Accept-Language, User-Agent etc or at least have them separated.
handle HTTP dates and return or set with unix time, dates would be a bonus and for convenience only
it is a requirement that we understand the class of status code so isValid, isError, isRedirect etc
easy access to the status codes and the messages
methods based on task/action ie. unAuthorized, notAllowed, notFound, permanentRedirect, temporaryRedirect, redirectWithList, allowCacheable, expireCache, etc. changeContentType should add the Vary for example so we have a single place to solve the problems but nothing stopping you from adding what you like.
iterator aggregate, countable is nice but merely a convenience
Nothing weird or complicating but fancy is fine as long as it remains practical

Did I leave anything out? =)

nickl- commented 12 years ago

Are you done yet? heheh =)

No seriously, have you started, where are you with this? I will need to implement the basics around status codes at the very least if I am to use Respect at all. This week still by the looks of things. If you tell me what you you want and how you think it should be implemented then hopefully I can save you some time and effort.

Sooner is better.

nickl- commented 12 years ago

Forgot to add: thank you for an awesome library you guys rock!!!

Keep up the good work!

nickl- commented 12 years ago

It looks like recess was actually onto something, well almost at least they weren't totally wrong.

From apache httpd.h

/**
 * @defgroup HTTP_Status HTTP Status Codes
 * @{
 */
/**
 * The size of the static array in http_protocol.c for storing
 * all of the potential response status-lines (a sparse table).
 * A future version should dynamically generate the apr_table_t at startup.
 */
#define RESPONSE_CODES 57

#define HTTP_CONTINUE                      100
#define HTTP_SWITCHING_PROTOCOLS           101
#define HTTP_PROCESSING                    102
#define HTTP_OK                            200
#define HTTP_CREATED                       201
#define HTTP_ACCEPTED                      202
#define HTTP_NON_AUTHORITATIVE             203
#define HTTP_NO_CONTENT                    204
#define HTTP_RESET_CONTENT                 205
#define HTTP_PARTIAL_CONTENT               206
#define HTTP_MULTI_STATUS                  207
#define HTTP_MULTIPLE_CHOICES              300
#define HTTP_MOVED_PERMANENTLY             301
#define HTTP_MOVED_TEMPORARILY             302
#define HTTP_SEE_OTHER                     303
#define HTTP_NOT_MODIFIED                  304
#define HTTP_USE_PROXY                     305
#define HTTP_TEMPORARY_REDIRECT            307
#define HTTP_BAD_REQUEST                   400
#define HTTP_UNAUTHORIZED                  401
#define HTTP_PAYMENT_REQUIRED              402
#define HTTP_FORBIDDEN                     403
#define HTTP_NOT_FOUND                     404
#define HTTP_METHOD_NOT_ALLOWED            405
#define HTTP_NOT_ACCEPTABLE                406
#define HTTP_PROXY_AUTHENTICATION_REQUIRED 407
#define HTTP_REQUEST_TIME_OUT              408
#define HTTP_CONFLICT                      409
#define HTTP_GONE                          410
#define HTTP_LENGTH_REQUIRED               411
#define HTTP_PRECONDITION_FAILED           412
#define HTTP_REQUEST_ENTITY_TOO_LARGE      413
#define HTTP_REQUEST_URI_TOO_LARGE         414
#define HTTP_UNSUPPORTED_MEDIA_TYPE        415
#define HTTP_RANGE_NOT_SATISFIABLE         416
#define HTTP_EXPECTATION_FAILED            417
#define HTTP_UNPROCESSABLE_ENTITY          422
#define HTTP_LOCKED                        423
#define HTTP_FAILED_DEPENDENCY             424
#define HTTP_UPGRADE_REQUIRED              426
#define HTTP_INTERNAL_SERVER_ERROR         500
#define HTTP_NOT_IMPLEMENTED               501
#define HTTP_BAD_GATEWAY                   502
#define HTTP_SERVICE_UNAVAILABLE           503
#define HTTP_GATEWAY_TIME_OUT              504
#define HTTP_VERSION_NOT_SUPPORTED         505
#define HTTP_VARIANT_ALSO_VARIES           506
#define HTTP_INSUFFICIENT_STORAGE          507
#define HTTP_NOT_EXTENDED                  510

/** is the status code informational */
#define ap_is_HTTP_INFO(x)         (((x) >= 100)&&((x) < 200))
/** is the status code OK ?*/
#define ap_is_HTTP_SUCCESS(x)      (((x) >= 200)&&((x) < 300))
/** is the status code a redirect */
#define ap_is_HTTP_REDIRECT(x)     (((x) >= 300)&&((x) < 400))
/** is the status code a error (client or server) */
#define ap_is_HTTP_ERROR(x)        (((x) >= 400)&&((x) < 600))
/** is the status code a client error  */
#define ap_is_HTTP_CLIENT_ERROR(x) (((x) >= 400)&&((x) < 500))
/** is the status code a server error  */
#define ap_is_HTTP_SERVER_ERROR(x) (((x) >= 500)&&((x) < 600))
/** is the status code a (potentially) valid response code?  */
#define ap_is_HTTP_VALID_RESPONSE(x) (((x) >= 100)&&((x) < 600))

see askapache

alganet commented 12 years ago

Hi @nickl- ! Nice reasearch you did =) I'm gonna take a look and comment here again as soon as possible. @augustohp is also looking at it.

I like the http_response_code() function. It's native, straightforward and decoupled. Response classes and custom functions could lead to strong coupling between the router and the route implementation.

RFC2616 also says that response codes are normative, but the messages are optional and can be customized. This means that you can respond "HTTP/1.1 404 Document Missing" or "HTTP/1.1 404 Video Not Found" if you want. It would be nice to give the users that kind of control somehow. Perhaps we should even submit a patch to PHP for a http_response_message() function =) I would like that.

The header_register_callback is also very awesome. We could allow users to send header('HTTP/1.1 404') and adjust the protocol version, fill up the message and so on by rewiring things on the callback.

alganet commented 12 years ago

Well, we do have a problem. Making raw headers accessible without compromising the API is hard. One of the main features of Respect\Rest is its pure PHP approach inside controllers. The desired scenario is a developer being able to create a RESTful app without explicitly setting and getting headers.

For example, both Slim and Silex requires you to always use the Silex Application instance, something like this (from their docs):

$app->get('/hello/{name}', function ($name) use ($app) {
    return 'Hello '.$app->escape($name);
});

Slim also uses a lot of static methods, like Slim::getInstance().

Respect\Rest on the other hand handles the HTTP flow using routines. So, instead of return $app->whatever we split controllers into smaller chunks:

$r3->get('/mailboxes/*/messages/*', function($mailbox, $messageId) {
    // do fancy imap work
})->accept(array(
    'text/html' => new MailHtmlRenderer()
));

Many features you've described, like cache, user agents and content negotiation are handled by Routines. It's safe to say that Respect\Rest is a little bit more high level on the HTTP workflow than other libraries 'cause the developer creates relations between the controller and standard HTTP routines encapsulated by routine classes. There could be routines for Vary, pre-conditions, ETags and even more hardcore high-level RESTful protocols like ATOM.

We just need to figure out how to separate the HTTP flow from the application-specific code (database handling, transforming data, etc), like we did on the Accept routine.

Internally, we could wrap header manipulation in a more elaborate object. This would certainly benefit us, since most HTTP flows include more than one header and sometimes different headers would lie on different classes.

For the developer using Respect\Rest, the best we could do is to let him use header(), headers_sent() and standard PHP functions if he likes, then ensure that custom manipulation won't break internal manipulation. Preferably, the user will do this on a By or Through routine and not in the main closure/method.

By now, all of this is really foggy on my mind. I believe we should play with some API drafts and then build requirements for the header manipulation based on this. For example, a draft for the CORS manipulation could be something like this:

$r3->any('/images/*/metadata', function($fileName) {
    //hardcore image metadata stuff
})->cors('*'); //Allows cross requests from any origin.

What do you guys think? @augustohp have you seen this issue?

nickl- commented 12 years ago

Hard indeed I've been fiddling with the status codes a bit but not finding happiness yet.Trying to bend my head around turning header manipulation into a routine won't budge either. The problem being that it is part and purpose to the different routines and it only makes sense that it should happen there.

For example the Accept routine should have the Content-Type set to the negotiated mime and it makes logical sense that the same routine should indicate to cache through Vary: Accept that this variance of the content is based on the Accept header field and may be treated as authoritative for subsequent requests identifiable through this field. For these same arguments I would see the Accept routine handling it's own 406 Not Acceptable implementation, it needs to happen here.

Busy rolling a patch to this regard for us to discuss further. The question it raises: Does it make sense to group this functionality at a separate location? This does not however relate to what should be done regarding PHP's inadequacies in the REST domain which IMO should be addressed by Respect\Rest. These include the status codes, date handling, content length (possibly), cache-control/pragma, etc. @alganet Maybe you are right that if we approach this problem from a different angle by implementing more complex routines like CORS and JSONP, duplication will become evident and maybe the approach we should take might just reveal itself.

Looking at other frameworks will help to identify where PHP falls short but unfortunately (for lack of a better word) we are in uncharted territory, as no one has approached the problem quite like Respect\Rest and it would best be served by solutions that fit the design. Can we think of anything else that might fit the same or similar arguments? Any other problem/duplication/requirement that spans across Routines and which cannot be addressed in solidarity.

Will chew on what you said a bit more and get back when the fog has settled. =)

nickl- commented 12 years ago

Another question which might also help us find a proper solution: What are PHP's shortcoming related to header manipulation?

No support for status codes
No way to manipulate headers after calling header()
No native support for HTTP dates
No considerations for cache
No built in redirect capabilities (3xx)
No content encoding detection and normalization
No de/compression automation
No native user agent and capability detection (what are the capabilities of interest to us?)

These are just off the top of my head and should be validated through further discussion. I deliberately exclude cookies as I think they are an abomination and has no place in REST.

Anything else you can think of which I have missed or items you feel we should not address from this list?

nickl- commented 12 years ago

Wikipedia List of HTTP status codes mentions several extra codes in use today.

The extra codes worth mentioning that are not implemented by Symfony's HttpFoundation:

420 Enhance Your Calm (Twitter) which is a duplication of 429 Too Many Requests already specified in RFC 6585
443 uses an invalid certificate. This might have reference to port 443 used for HTTPS but can't find out where exactly it stems from. Marked the wikipedia entry for citation needed and will investigate this further.
444 No Response - Nginx way of saying "talk to the hand" server not interested
449 Retry With - Microsoft bending standards again this seems more of a 3xx type response, certain actions required before reattempting request
450 Blocked by Windows Parental Controls - Microsoft nuff said
499 Client Closed Request - Nginx used for log purposes when it is not possible to return a header to client
509 Bandwidth Limit Exceeded - used by Apache bw/limited extension and apparently widely adopted, this should get an IETF document as support if that be the case. More investigation required
598 Network read timeout error - MS proxy to clients before the proxy
599 Network connect timeout error - MS proxy to clients before the proxy

That brings the list up to the following codes, all inclusive:

100 Continue
101 Switching Protocols
102 Processing (WebDAV; RFC 2518)
200 OK
201 Created
202 Accepted
203 Non-Authoritative Information (since HTTP/1.1)
204 No Content
205 Reset Content
206 Partial Content
207 Multi-Status (WebDAV; RFC 4918)
208 Already Reported (WebDAV; RFC 5842)
226 IM Used (RFC 3229)
300 Multiple Choices
301 Moved Permanently
302 Found
303 See Other (since HTTP/1.1)
304 Not Modified
305 Use Proxy (since HTTP/1.1)
306 Switch Proxy (deprecated)
307 Temporary Redirect (since HTTP/1.1)
308 Permanent Redirect (experimental Internet-Draft)[10]
400 Bad Request
401 Unauthorized
402 Payment Required
403 Forbidden
404 Not Found
405 Method Not Allowed
406 Not Acceptable
407 Proxy Authentication Required
408 Request Timeout
409 Conflict
410 Gone
411 Length Required
412 Precondition Failed
413 Request Entity Too Large
414 Request-URI Too Long
415 Unsupported Media Type
416 Requested Range Not Satisfiable
417 Expectation Failed
418 I'm a teapot (RFC 2324)
420 Enhance Your Calm (Twitter) -> 429
422 Unprocessable Entity (WebDAV; RFC 4918)
423 Locked (WebDAV; RFC 4918)
424 Failed Dependency (WebDAV; RFC 4918) / Method Failure (WebDAV)[13]
425 Unordered Collection (Internet draft)
426 Upgrade Required (RFC 2817)
428 Precondition Required (RFC 6585)
429 Too Many Requests (RFC 6585)
431 Request Header Fields Too Large (RFC 6585)
443 uses an invalid certificate.
444 No Response (Nginx)
449 Retry With (Microsoft)
450 Blocked by Windows Parental Controls (Microsoft)
499 Client Closed Request (Nginx)
500 Internal Server Error
501 Not Implemented
502 Bad Gateway
503 Service Unavailable
504 Gateway Timeout
505 HTTP Version Not Supported
506 Variant Also Negotiates (RFC 2295)
507 Insufficient Storage (WebDAV; RFC 4918)
508 Loop Detected (WebDAV; RFC 5842)
509 Bandwidth Limit Exceeded (Apache bw/limited extension)
510 Not Extended (RFC 2774)
511 Network Authentication Required (RFC 6585)
598 Network read timeout error (MS proxy)
599 Network connect timeout error (MS proxy)

nickl- commented 12 years ago

Another example:

Reverence created a HTTP\Response which is slightly ambiguous as it only aims to implement the Header functionality. It uses a similar approach to Zend for validating the status codes, but status codes _do not_ have to be validated.

RFC2616 6.1.1:

[..] treat any unrecognized response as being equivalent to the x00 status code of that class [..]

It also returns the $text which was added to the header which IMO lacks foresight as it was never used and would have been better served by chaining.

alganet commented 12 years ago

Regarding the shortcomings, current state:

No support for status codes

I believe the support is good enough. We do have a couple of use cases that PHP covers:

Setting a custom status code: header('HTTP/1.1 410');
Setting a custom status message: header('HTTP/1.1 400 Field email is mandatory.');
Setting different messages for the same status codes
Setting custom HTTP headers and any HTTP status codes.

No way to manipulate headers after calling header()

There is limited manipulation. You can empty headers, but can't remove them. In PHP 5.4 is possible to remove them completely. Emptying is done by: header('X-Foo:', true);

No native support for HTTP dates

The constant Datetime::ISO8601 contains the format for HTTP dates. Both the DateTime class and the date() function supports this format specification.

No considerations for cache

None, at all. Just allows you to set and read headers.

No built in redirect capabilities (3xx)

header('Location: is possible! Kinda low level.

No content encoding detection and normalization

PHP supports chunked encoding and can let the webserver do this. There are available mods for Apache and other webservers that will perform better than PHP code.

No de/compression automation

Also available in webservers. Possible with ob_handlers and gzip automatically.

No native user agent and capability detection (what are the capabilities of interest to us?)

Respect\Rest already has a userAgent( routine! Support is based on regular expressions on top of raw user agent strings.

I believe we can work with that for now, implement only the minimal to reuse PHP and its ecosystem as much as possible. We may open bugs and propose changes on the PHP itself to have better support, along C patches to the core.

nickl- commented 12 years ago

Regarding the shortcomings, and reply to Alexandre:

I believe the support is good enough. We do have a couple of use cases that PHP covers:

Setting a custom status code: header('HTTP/1.1 410'); Setting a custom status message: header('HTTP/1.1 400 Field email is mandatory.'); Setting different messages for the same status codes Setting custom HTTP headers and any HTTP status codes.

This is like saying markdown has support for morse code see: . . . . _ _ _ _ . _ . _ . . _ _ . . . there I can send morse code =) Not that I know what I sent if at all legible, nor can I have it translated or see what I sent so far. I can't even try and do anything about this message if I wanted to. The fact that we can send status headers with the header command is not sufficient reason to call it support. Can we make do with what we have? Yes sure I have to agree with that, but I still think we can do better. Would we be changing the way PHP does things? Come on that's nothing more than a glorified echo, which we will still use, don't have a choice, just maybe a little more intelligently.

No way to manipulate headers after calling header() There is limited manipulation. You can empty headers, but can't remove them. In PHP 5.4 is possible to remove them completely. Emptying is done by: header('X-Foo:', true);

I like what you are saying, even if it smells like the biggest hack since probably the cookie, I don't care. This requires more investigation, where can we get information?

No native support for HTTP dates The constant Datetime::ISO8601 contains the format for HTTP dates. Both the DateTime class and the date() function supports this format specification.

Again we have two different views here, I have no doubts that php is capable what I want is getDate and setDate.

No considerations for cache None, at all. Just allows you to set and read headers.

Maybe I expect way too much but if we can't aim high then what is the point, If I could have my way I would actually see the cache headers populated appropriately on PUT, POST automagically and have a convenient invalidate that will Cache-Control, Pragma, YadaYada as it needs to. All outgoing messages for GET stamped approved and authoritative without me thinking about it. Humour me while I continue dreaming, if I may, I also want this "cache aware system" to provide me with an unique checksum/hash calculated from the response with which it is capable to identify subsequent requests to whom I can serve my own cache copies to. It's still long before XMAS so don't be surprised if my letter to Santa gets longer.

No built in redirect capabilities (3xx) header('Location: is possible! Kinda low level.

This is what I am talking about, why must I have to set Location as well if I wanted a 301, why should I even know what is involved? All I want to do is tell the world that this url changed, that's it. Nothing else, is that so difficult? While that rookie stays productive I might want to know that this is a 301 response so that I can log it appropriately, is this allowed? What if I want to change it mid way to a 302? What about multiple options? What if I actually have the route to process or proxy this instead?

No content encoding detection and normalization PHP supports chunked encoding and can let the webserver do this. There are available mods for Apache and other webservers that will perform better than PHP code.

I think I meant character encoding but nice answer =) I want all data already decoded by the time I process it and also don't want to care about what gets delivered as it should not be my concern.

No de/compression automation Also available in webservers. Possible with ob_handlers and gzip automatically.

You are right and maybe there is a separation of concerns, which I'd see identified here. As a matter of fact I have actually started looking at ways to implement REST with mod_rewrite and mod_header alone by mocking static json, html and xml files just to see how far we can get. The question we should ask though is why do we want to move the responsibility/functionality down the line? Does this mean I can't write a proxy, cache service, gateway, CDN, the list goes on because I don't have the headers at my disposal. Why not I ask? and I hope it's not because it's easier.

No native user agent and capability detection (what are the capabilities of interest to us?) Respect\Rest already has a userAgent( routine! Support is based on regular expressions on top of raw user agent strings.

This is what I am getting at, we don't have to strip the headers for verbs or the Accepts why is this a special case? Don't get me wrong I haven't looked at the implementation and this is not critique on the work already done but these expressions are actually available from get_browser() and the browscap.ini and if this was enough then we didn't need detectmobilebrowsers.com or the new wurfl surely? Rather than spending the time on stripping the user agent these concerns should've been transfered somewhere else from the start and instead the effort directed at display, device, storage, video, etc instead of duplicating efforts on a solution that is not reusable.

I believe we can work with that for now, implement only the minimal to reuse PHP and its ecosystem as much as possible. We may open bugs and propose changes on the PHP itself to have better support, along C patches to the core.

I agree wholeheartedly with the current design and motto, don't intercept the response, don't force me to use an application or even a controller for that matter I love it but this does not mean we shouldn't utilize the resources appropriately and have better design and implementation focus. Although your motives are noble and I will be the first to sit up all night to get those patches in, you know realistically speaking we are still going to be stuck with 5.3 for at least the next 3 years if not longer. I don't know about you but I'd like to have a solution a little before then. =)

@alganet your conclusion and arguments are all valid and thank you for your time in so skillfully summarizing them, I am impressed. I am merely arguing here to broaden the picture and it is not an attempt to disagree with you. We all want the same thing, a solution that is simple to extend and easy to use that doesn't try and change the way PHP works.

To conclude... oops I ran out of space : /

nickl- commented 12 years ago

To conclude here is my implementation of the Accept Routine, I don't like it, I don't like it a bit and therefor I haven't committed it.


<?php

namespace Respect\Rest\Routines;

use Respect\Rest\Request;

/** Handles mime type content negotiation */
class Accept extends AbstractAccept
{
    const ACCEPT_HEADER = 'HTTP_ACCEPT';

    protected function compareItens($requested, $provided)
    {
        if ($requested === $provided || $requested === '*/*') {
                header("Vary: Accept");
                header("Content-Type: $provided");
                return true;
        }

        list($requestedA, $requestedB) = explode('/', $requested);
        list($providedA, ) = explode('/', $provided);

        if ($providedA === $requestedA && $requestedB === '*'){
                header("Vary: Accept");
                header("Content-Type: $provided");
                return true;
        }

        return false;
    }

    public function when(Request $request, $params)
    {
        $valid = parent::when($request, $params);

        if (!$valid)
                header('HTTP/1.1 406 Not Acceptable');

        return $valid;
    }

}

We conceivably have to do similar headers for each and every Routine we already have and still going to make and surely there must be a DRY way of going about.

nickl- commented 12 years ago

@alganet I found the header_remove() =)

Very eager to investigate with high hopes and expectations that this will be the answer to many of our questions, instead it left me wanting, even more determined to replace it after all the dodgy gotchas and undocumented features (trying hard not to say bugs) I found with the current implementation.

The functions provider for header manipulation:

header() - is used to send a raw HTTP header.
header_remove() - Removes an HTTP header previously set using header().
- header_remove('HeaderKey') - removes the first entry in the header_list and all entries from the actual header.
- header_remove() - removes all headers from the header_list but this only affects some of the actual headers.
headers_list() - will return a list of headers to be sent to the browser / client. Although the honest truth is not exactly as specified.
apache_response_headers() - Fetch all HTTP response headers. This does not reflect any changes from the header functions.
headers_sent() - Checks if or where headers have been sent. Two optional parameters passed by reference will reflect the filename and linenum where the header flush occurred.
http_response_code() - not yet implemented. Get or Set the HTTP response code.
header_register_callback() - not yet implemented. Registers a function that will be called when PHP starts sending output.

Research results and findings

Without any further a do, here are the results from my research, I will leave it up to you to comment:

<?php
header('Warning: 199 ex.com "Careful now"'); // add a header
header('Warning: 199 ex.com "You have been warned."', false); // add duplicate do not replace. default = replace
header('HTTP/1.1 406 Not Acceptable'); // change status code to 406 not in header list
header('Content-Type: text/html', false, 300); // new header added and changes the status code
header('Content-Type: application/xml', true, 201); // replaces header and changes the status code

header_remove('Warning'); // remove 1 header identified by 'Warning' from the header list on a FIFO basis
                          // in our example Warning: 199 ex.com "Careful now" will be removed
                          // however in the actual header that will be sent the value looks like this:
                          // Warning:199 ex.com "Careful now", 199 ex.com "You have been warned."
                          // And after calling header_remove both were removed from the actual header
                          // even though one still remains in the header_list???
header_remove('Content-Type'); // removes Content-Type from header list since only one exists
                               // actual header response status remains 201 and
                               // Content-Type remains application/xml.???

header('Content-Type: text/html', false, 300); // new header added and changes the status code
header('Content-Type: application/json', false, 201); // add another header and changes the status code

// We now have 2 Content-Type headers in the header_list but only the last one gets sent.???

header_remove('Content-Type'); // text/html removed from header_list no actual change in headers sent
header_remove('Content-Type'); // application/json  removed from header_list no actual change in headers sent

// There is no reliable relation to what exists in the header_list and that which eventually gets sent. 
// This is not acceptable.

header_remove(); // remove all headers does not affect status code or Content-Type

// I did not test any of the other default headers but thus far it has not been reassuring.

// Some other gotchas and "features" to take note of: 
// The following will all change the status code to 200
// Without new headers appearing in the header_list()
header('HTTP/1.1 200');
header('HTTP/1.1');
header('HTTP/');
header('HTTP/Anything:you:like:without:a:space');
header('HTTP/:1.1;!@#$%:^&*():?+/=-_;"\':'); // colons
header('HTTP/',false);
header('HTTP/',true);
header('HTTP/','Whatever you want without a colon');
header('HTTP/', false, 300);
header('HTTP/', true, 400);
header('HTTP/1.1:', true, 401);

// The following will change the status code to 500 Internal Server error
// without any PHP exceptions or errors
// Without new headers appearing in the header_list()
header('HTTP/1.1 :');
header('HTTP/1.1 : 300');
header('HTTP/1.1 : 300', true, 400);
header('HTTP/1.1 1');
header('HTTP/1.1 10');
header('HTTP/1.1 001');
header(1,1,1);
header(1,1,10);
header(1,1,001);

// The following will all change the status to the supplied status code.
// In these examples the status code changes to 201
// Without new headers appearing in the header_list()
header('HTTP/1.1: 201');
header('HTTP/1.1::::: 201');
header('HTTP/:1.1;!@#$%:^&*():?+/=-_;"\': 201');
header('HTTP/Anything.you.like.without.a.space 201');
header('Anything without a colon',0,201);
header(1,1,201);
header(0,0,201);
header(0,null,201);
header(0,'Anything yau want',201);
header(true,0,201);
header(' ',0,201);

// The following will do absolutely nothing
header(201);
header(1);
header(1,1);
header(null,true,201);
header('',true,201);
header(false,0,201);
header(null,0,201);
header('Anything without a colon');
header('!@#$%^&*(){}[]/=\?+|-_;`~');
header(0,0,0);

nickl- commented 12 years ago

Very nice to spec .js implementation: headers.js

Doesn't cover status codes though but handles these requirements:

Content-Type (per RFC 2616)
Accept (per RFC 2616)
Accept-Language (per RFC 2616)
Accept-Charset (per RFC 2616)
Accept-Encoding (per RFC 2616)
Cookie (per http://tools.ietf.org/html/draft-ietf-httpstate-cookie-18)
Set-Cookie (per http://tools.ietf.org/html/draft-ietf-httpstate-cookie-18)
Cache-Control (per RFC 2616)
Link (per RFC 5988)

alganet commented 12 years ago

@nickl- you have a serious talent for research. I mean it! =D

After reading your tests, I'm even thinking in submitting some bugs to PHP itself. Maybe patches (if my C skills doesn't let me down, they frequently do).

By the way, I agree with you in many of the PHP flaws you pointed. Even with ways to hack over them, it's sad to deal with semi-broken things.

I've been thinking on the interface and implementation for centralizing most of the environment manipulation Respect\Rest does. This includes header manipulation, POST, GET and SERVER variables.

I do like the native PHP interfaces. $_GET, $_POST, $_SERVER, getenv, putenv, header(), etc. What they need is better HTTP workflow handling. We should keep this interface as similar as possible as native PHP to reduce the already high curve of learning Respect has:

<?php
use Respect\Rest\Env as _;

_::SERVER('HTTP_ACCEPT_LANGUAGE'); //$_SERVER['HTTP_ACCEPT_LANGUAGE'];
_::POST('name');
_::with()->POST['name']; //alternative syntax for safe grepping (also ends with '])
_::header();
_::header_register_callback();

The main difference is that these functions don't work exactly like raw PHP, they know how to replace, merge, add and reason about headers manipulated during runtime. Main limitation is that we may need output buffering if we can't hack around something using our implementation.

This could even "instant restify" websites already using native PHP functions by running a simple grep on the project folder, or monkey-patching application namespaces (declaring the monkey patched function inside the application namespace replaces its usage). Though I wouldn't distribute code to do this specific part under Respect. Too hacky, maybe a gist out for curiosity.

Using something like this we reduce dramatically the repeating hacking inside classes while keeping a very friendly interface that now does what is suposed to do.

What do you guys think? @augustohp @nickl-

nickl- commented 12 years ago

At the same time serves as an example to The PHP Group of how it is suppose to work/we expect it to work, based on the actual user input.

+1 I like the _ (underscore) idea that meme is already strong in the .js scene but not so much in PHP, people will already grasp the concept by name alone.

We will need to buffer but this has almost become std procedure these days and if you want to manipulate that we can also add the ::ob* functions, without breaking Respect/Env. Need to chew on this some more... awesome stuff!

Not too sure I like Env for the name though :/

alganet commented 12 years ago

Me and @augustohp always argue about names, he often wins =P I'm not good at it! Any suggestions? Maybe Sandbox or Box. Could be even a separate component, I see how this could benefit unit testing as well, for example.

nickl- commented 12 years ago

lets not allow something like a names hinder our progress, I find things name themselves short for long. Besides it's not set in stone and we can easily change a name.

What I think we should focus on is defining a strategy and start mapping a path with the functionality we want to clone. I have been thinking that the HTTPheader functionality was bound to end up in it's own module by the looks of things, maybe this project should even go a step further and keep the clones separate(with dependencies naturally) but similar to php and that you may choose which ports you want, like cygwin, for example. What you think?

I had a look at monkey patching PHP but I don't see how we will do that, not with things as they are in 5.3, unless I am mistaken and you have some other trix up your sleeve. I would imagine that we would attempt to hide the original functionality by creating the patches within a namespace. The problem is that these functions are not in a namespace. They are not even in the global namespace where they claim to be hence why you can still go on as usual not explicitly referencing /phpinfo() for example it works as phpinfo() whether you are in a namespace nested namespace whatever. So how will we be able to ``use Respect\Clonecamp\phpinfo as phpinfo; and get away referencing that instead when you call phpinfo();. I just don't see it happening.

Currently we know we want to do the headers (networking) but this will require ob, as we've identified already, so here are the ob functions.

Output Buffering Control

flush — Flush the output buffer
ob_clean — Clean (erase) the output buffer
ob_end_clean — Clean (erase) the output buffer and turn off output buffering
ob_end_flush — Flush (send) the output buffer and turn off output buffering
ob_flush — Flush (send) the output buffer
ob_get_clean — Get current buffer contents and delete current output buffer
ob_get_contents — Return the contents of the output buffer
ob_get_flush — Flush the output buffer, return it as a string and turn off output buffering
ob_get_length — Return the length of the output buffer
ob_get_level — Return the nesting level of the output buffering mechanism
ob_get_status — Get status of output buffers
ob_gzhandler — ob_start callback function to gzip output buffer
ob_implicit_flush — Turn implicit flush on/off
ob_list_handlers — List all output handlers in use
ob_start — Turn on output buffering
output_add_rewrite_var — Add URL rewriter values
output_reset_rewrite_vars — Reset URL rewriter values

This is the complete Networking

Network Functions

checkdnsrr — Check DNS records corresponding to a given Internet host name or IP address
closelog — Close connection to system logger
define_syslog_variables — Initializes all syslog related variables
dns_check_record — Alias of checkdnsrr
dns_get_mx — Alias of getmxrr
dns_get_record — Fetch DNS Resource Records associated with a hostname
fsockopen — Open Internet or Unix domain socket connection
gethostbyaddr — Get the Internet host name corresponding to a given IP address
gethostbyname — Get the IPv4 address corresponding to a given Internet host name
gethostbynamel — Get a list of IPv4 addresses corresponding to a given Internet host name
gethostname — Gets the host name
getmxrr — Get MX records corresponding to a given Internet host name
getprotobyname — Get protocol number associated with protocol name
getprotobynumber — Get protocol name associated with protocol number
getservbyname — Get port number associated with an Internet service and protocol
getservbyport — Get Internet service which corresponds to port and protocol
header_register_callback — Call a header function
header_remove — Remove previously set headers
header — Send a raw HTTP header
headers_list — Returns a list of response headers sent (or ready to send)
headers_sent — Checks if or where headers have been sent
http_response_code — Get or Set the HTTP response code
inet_ntop — Converts a packed internet address to a human readable representation
inet_pton — Converts a human readable IP address to its packed in_addr representation
ip2long — Converts a string containing an (IPv4) Internet Protocol dotted address into a proper address
long2ip — Converts an (IPv4) Internet network address into a string in Internet standard dotted format
openlog — Open connection to system logger
pfsockopen — Open persistent Internet or Unix domain socket connection
setcookie — Send a cookie
setrawcookie — Send a cookie without urlencoding the cookie value
socket_get_status — Alias of stream_get_meta_data
socket_set_blocking — Alias of stream_set_blocking
socket_set_timeout — Alias of stream_set_timeout
syslog — Generate a system log message

I love how you mention $_SERVER it's amazing how we manage with that?!?! It is totally unreliable and I would love to see us ensure that a) the information has been validated and b) is consistent from installation to installation. Hell even get some sort of continuity between CLI and CGI would be great. Include all the extensions, include all ENV even do the CONSTANT mappings right. Some of these things can be solved in PHP but we are more than likely going to need some help from other friends Process Control Extensions, Other Basic Extensions even Other Services

This is no small undertaking, good thing we start setting the goals and plan with the bigger picture in mind.

nickl- commented 12 years ago

Hmmm is runkit maybe what we're after? evil grin

nickl- commented 12 years ago

So these seem to be the options:

Advanced PHP debugger Patchwork ext/test_helpers and the new runkit The last 3 are all here on github =)

alganet commented 12 years ago

Monkey patch needs to be only a choice. It's too hacky and doesn't work everywhere (shared hostings and some cloud providers can't install some extensions).

Providing our wrapper under _::something seems to be better for internal use. External developers may choose between classical PHP and our wrapper, and since we could use output buffering, we could even control the classical PHP calls from the Box itself. Seems like a simpler first step and possibly an easy one. What do you think?

nickl- commented 12 years ago

I agree I'm not big on hacking the interpreter but since we are going this route I think it is wise for us to know all our options.

So we should probably start with these, the ones we need:

header_register_callback — Call a header function header_remove — Remove previously set headers header — Send a raw HTTP header headers_list — Returns a list of response headers sent (or ready to send) headers_sent — Checks if or where headers have been sent http_response_code — Get or Set the HTTP response code

O yes and... and... and.. apache_response_headers ( void ) getallheaders ( void ) apache_request_headers() - Fetch all HTTP request headers get_headers() - Fetches all the headers sent by the server in response to a HTTP request $http_response_header stream_get_meta_data() - Retrieves header/meta data from streams/file pointers nsapi_request_headers — Fetch all HTTP request headers nsapi_response_headers — Fetch all HTTP response headers

It's like Pandorah's box!

I just tested and this is sufficient to send headers before response....

<?php
                \ob_start(function ($buffer, $flag) {
                                // flag = 5
                        \header('Output-Buffer: Handled');
                        return $buffer;
                });

as long as you don't flush =( so alas we will have to do the Output buffers we don't have a choice.

I was hoping we could jump straight to these $_SERVER $_REQUEST $_GET $_POST $_ENV phpinfo()

We have to catch it and respond to checks. ob_start — Turn on output buffering ob_get_status — Get status of output buffers

To prevent all of these: flush — Flush the output buffer ob_clean — Clean (erase) the output buffer ob_end_clean — Clean (erase) the output buffer and turn off output buffering ob_end_flush — Flush (send) the output buffer and turn off output buffering ob_flush — Flush (send) the output buffer ob_get_clean — Get current buffer contents and delete current output buffer ob_get_flush — Flush the output buffer, return it as a string and turn off output buffering ob_implicit_flush — Turn implicit flush on/off

Since the handler won't work we probably need these: ob_get_contents — Return the contents of the output buffer ob_get_length — Return the length of the output buffer ob_get_level — Return the nesting level of the output buffering mechanism

Who said anything about handlers? ob_list_handlers — List all output handlers in use ob_gzhandler() - ob_start callback function to gzip output buffer ob_iconv_handler() - Convert character encoding as output buffer handler mb_output_handler() - Callback function converts character encoding in output buffer ob_tidyhandler() - ob_start callback function to repair the buffer "default handler"

These should not interfere too much with us but we will need to know about them. output_add_rewrite_var — Add URL rewriter values output_reset_rewrite_vars — Reset URL rewriter values

nickl- commented 12 years ago

I did a few mock tests and this is what I came up with keeping the

Respect the way PHP does things.
Make it as simple as possible to integrate into existing code.
While keeping things understandable while you go on doing things the way you used to.
The use of "_" is just cool and we want to see that.

verdict

Functions are a bitch and we probably will have to change their implementations to using _\function_name() instead. Unless you have a better idea.
Variable get overwritten so we need to name them different. I vote for prefixing variables with _S the S looks like a dollar sign and reading the implementation makes sense again whereas $__REQUEST is very ambiguous and the extra _ is not easily spotted or might be mistaken as a typo instead $_S_REQUEST clearly shows that something else is happening but you will still recognize it as $_REQUEST. Besides __ is for PHP internal use and we should Respect that.
Overwriting objects work like a charm, namespace alone allows for transparent overwriting.
I think calling the namespace has the best results and gives us the WOW factor. I tried a class and changing the "use Respect\Environment as " but they all look funny or don't work as expected.
- This allows for easily calling functions with _\function_name(); without anything else to do.
- Simply add use _\PHP_Class; to use the new implementation transparently. You actually had use \PHP_Class; there already =)
- Variables all seem to be global regardless of namespace and using $_S_POST without including the namespace works as expected.

Open to suggestions.

functions

These have the biggest gotcha as the name-spacing works different for them. You can't seem to make an alias for a function by using "use".

<?php
namespace _;

 // Can't do this: PHP Fatal error:  Call to undefined function _\\PHP_print_r()
// use \print_r as PHP_print_r;

function print_r($expression, $return=false) {
  $r = '<pre>'.PHP_EOL;
  $r .= \print_r($expression,true);
  $r .= '</pre>'.PHP_EOL;

  if ($return)
      return $r;

  echo $r;
}

<?php
use _\print_r as print_r; // does nothing calling print_r uses PHP's \print_r

echo _\print_r($_SERVER); // this calls the correct function

variables

This is what the variable changes would look like

<?php
namespace _;

// using $_SERVER = array(); just wipes the SUPERGLOBAL =(
$_S_SERVER = \array_merge(array('_SOMETHING_SPECIAL' => 'You got that right'), $_SERVER);

<?php
echo _\print_r($_S_SERVER); // prints _S version using _\print_r =)

objecs

And magically change object implementations by purely modifying the "use" declaration.

<?php
namespace _;

class DOMDocument extends \DOMDocument {

  public function saveHTML($node = null) {
    $r = '<pre>'.PHP_EOL;
    $r .= \htmlentities(parent::saveHTML($node));
    $r .= '</pre>'.PHP_EOL;

    return $r;
  }
}

<?php
use _\DOMDocument; // works with or without "as DOMDocument"

$d = new DOMDocument();
$d->loadHTML('<html><body>Check this out!</body></html>');
echo $d->saveHTML();

nickl- commented 12 years ago

For those wondering where we are going with all this, it looks something like:

We do:

<?php
namespace _;

class DOMDocument extends \DOMDocument {
  public function __toString() {
    return $this->saveHTML();
  }
}

So you can finally do:

<?php
use _\DOMDocument; 

$dom = new DOMDocument();
$dom->loadHTML('some_or_other.html');

echo $dom; // What are we waiting for? =)

All while being in complete harmony with PHP and the rest of your applications.

alganet commented 12 years ago

I've found xdebug_get_headers(), should help us on tests. Now we don't need to monkey patch the header function in our test suites.

nickl- commented 12 years ago

It reports less than headers_list via mod_php:

xdebug_get_headers()
====================
Array
(
    [0] => Access-Control-Allow-Origin: *
    [1] => Access-Control-Allow-Headers: X-Requested-With
)
headers_list()
==============
Array
(
    [0] => X-Powered-By: PHP/5.3.14
    [1] => Access-Control-Allow-Origin: *
    [2] => Access-Control-Allow-Headers: X-Requested-With
)

But quite a bit more in CLI

xdebug_get_headers()
====================
Array
(
    [0] => Access-Control-Allow-Origin: *
    [1] => Access-Control-Allow-Headers: X-Requested-With
)
headers_list()
==============
Array
(
)

So what about preventing headers already sent though? Is ob_start enough for php_unit-iness?

alganet commented 12 years ago

xdebug_get_headers() also doesn't record status-line headers =/

PHPUnit has a @outputBuffering annotation that we can use. I believe it uses ob_start internally, but outside the test scope, which is good. I've used another PHPUnit tool, $this->expectOutputString on the isAutoDispatched tests and everything went fine.

After unit testing Respect\Rest, I believe we should centralize headers on the Request class. Both the Router and Routines can access it already so it should be easy to refactor things up.

nickl- commented 12 years ago

This is another perfect example why it makes sense to abstract the header manipulation into a utility class of our own, instead of calling header() directly.

We have gone full circle on this if you ask me:

@augustohp spotted the redundancy and duplication of effort suggesting we centralize the functionality instead.
We looked at other frameworks and they all abstracted header manipulation in one way or another.
We took an in-depth look at php's current as well as future plans and also realized that even what we have currently is flawed in so many ways.
we toyed with the idea of monkey-patching php which we realized fits outside the scope of Respect/Rest and quite arguably outside Respect as a whole if you weigh it against our mission statement, coming to think about it.
@alganet took on the task of ridding our tests of the nasties it contained in the past and made awesome discoveries to improve their presentation but alas, PHP's inability to retrieve the status code will not facilitate complete a thorough testability, without reintroducing the nasties, unless;
If we centralize the header manipulation into an utility class it will be mock-able hence total and complete transparency throughout our tests.

endel commented 10 years ago

+1 :)

augustohp commented 8 years ago

I've updated the issue description with a possible solution.

Respect / Rest

Better header checks #28

Original post

Another example:

RFC2616 6.1.1:

The functions provider for header manipulation:

Research results and findings

Output Buffering Control

Network Functions

verdict

functions

variables

objecs