Open evcordeiro opened 13 years ago
The previous bug that was causing the OAuth error with Tumblr was that the content of the post contained some HTML tags and characters that Tumblr did not recognize. So in index.php I added: $information['content'] = preg_replace("/[^a-zA-Z0-9\s]/", "", strip_tags(stripHTML($sitemap->entry[$count]->content))); to strip out what Tumblr didn't accept. It should just need to be tweaked to fix this bug.
I noticed that. What exactly was the error Tumblr was giving us? I looked at the Tumblr API and it seems that 'body' should be able to accept most characters, and it can take HTML. What about playing with the 'format' variable?
Regardless, I think it would be better to move the regex and strip_tags stuff into the plugin code. That way each plugin can do what it needs to with the tags, for instance parsing tags to include pictures in Facebook and Tumblr posts.
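A rough sketch of what moving this into the plugins could look like. All three function names are hypothetical and not from the repository; the idea is only that a plain-text target strips tags and decodes entities, an HTML-capable target passes the content through, and image URLs can be pulled out for picture posts.

```php
<?php
// Hypothetical helpers for illustration; none of these names exist in the project.

/** Plain-text targets: drop the tags, then decode entities so quotes survive. */
function formatForPlainText(string $html): string
{
    $text = html_entity_decode(strip_tags($html), ENT_QUOTES, 'UTF-8');
    // Collapse runs of whitespace left behind by removed tags.
    return trim(preg_replace('/\s+/u', ' ', $text));
}

/** Targets that accept HTML: hand the content over untouched. */
function formatForHtml(string $html): string
{
    return $html;
}

/** Collect <img> URLs so a plugin can attach pictures to a post. */
function extractImageUrls(string $html): array
{
    if (trim($html) === '') {
        return [];
    }
    $previous = libxml_use_internal_errors(true); // feed markup is rarely clean
    $doc = new DOMDocument();
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $urls = [];
    foreach ($doc->getElementsByTagName('img') as $img) {
        $src = $img->getAttribute('src');
        if ($src !== '') {
            $urls[] = $src;
        }
    }
    return $urls;
}

// Made-up sample content.
$content = '<p>David is the kid&#8217;s English name.</p><img src="http://example.com/a.jpg">';
$plain  = formatForPlainText($content);
$images = extractImageUrls($content);
```

Each plugin would then pick the helper that matches what its API accepts, instead of index.php deciding for all of them.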
To bring this thread up to date: this issue starts in function parseFeed() in index.php.
the parse begins here: $xmlstr = file_get_contents($query['urlid']);
$sitemap = simplexml_load_string($xmlstr);
The parsed feed then makes its way to function postToAPI() in each plugin (/plugins/sno_*.php).
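Roughly, the flow described above looks like this. It is only a sketch: the inline feed stands in for file_get_contents($query['urlid']), the feed content is made up, and postToAPI() here is a dummy that returns what it was given rather than the real plugin function.

```php
<?php
// Dummy stand-in for a plugin's postToAPI(); the real one talks to the remote API.
function postToAPI(array $information): array
{
    return $information;
}

// Made-up feed used in place of file_get_contents($query['urlid']).
$xmlstr = <<<XML
<feed>
  <entry>
    <title>First post</title>
    <content type="html">&lt;p&gt;It&amp;#8217;s a test&lt;/p&gt;</content>
  </entry>
</feed>
XML;

$sitemap = simplexml_load_string($xmlstr);
if ($sitemap === false) {
    throw new RuntimeException('Could not parse feed XML');
}

$posted = [];
foreach ($sitemap->entry as $entry) {
    // Pass the content along unmodified; tags and entities are still intact here.
    $information = [
        'title'   => (string) $entry->title,
        'content' => (string) $entry->content,
    ];
    $posted[] = postToAPI($information);
}
```

Note that after the XML layer is unescaped, the content string still contains HTML tags and the numeric entity &#8217;, which is what the plugins end up receiving.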
There is a lot of experimental mucking about in the Tumblr plugin, but here's a bit of it:
//echo "unmod content:<br>";
//echo "<pre>" . $information['content'] . "</pre>";
//echo "<br><br>striptags htmlentititydecode<br><br>";
//$cont = htmlentities($information['content'], ENT_QUOTES | ENT_IGNORE );
//echo $cont;
//echo (strip_tags(html_entity_decode($cont, ENT_QUOTES)));
/*
echo (strip_tags(html_entity_decode($information['content'], ENT_NOQUOTES, 'ISO-8859-1')));
echo (strip_tags(html_entity_decode($information['content'], ENT_QUOTES, 'ISO-8859-15')));
echo (strip_tags(html_entity_decode($information['content'], ENT_COMPAT, 'UTF-8')));
*/
My thoughts:
Starting from the top is best: each plugin should be passed an -unmodified- feed. By unmodified I mean converted to an arbitrary standard format, but with no data (such as tags) removed. Non-standard stuff (especially quotes) needs to be handled at this level.
There is a lot of stuff we can use. I looked briefly at PHP's xml_parser_create() and the xml_set_* functions; they might be a good place to start. preg_replace to delete non-standard characters should ideally not be used at all at this level, and if it turns out we do need it, we should file a bug report.
I noticed an issue specifically on Tumblr posts: somewhere in the parse we are converting non-alphabetic characters, for example:
avid is the kid8217s English name He laughs every time I try to pronounce his real name but he can8217t say mine either And besides he8217s the one killing me off on a regular basis At first it was teacher die After weeks of hard work though he8217s grasped that teacher dies The 8216s8217 David remember the s Recen
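For what it's worth, the 8217 and 8216 in that sample look like what is left of numeric entities such as &#8217; and &#8216; once the preg_replace in index.php deletes the &, # and ; characters. A minimal sketch, assuming the feed content carries those entities (the sample string is made up, and the stripHTML() call is left out because it is not shown in this thread):

```php
<?php
// Made-up sample; assumes the feed encodes the apostrophe as a numeric entity.
$raw = '<p>the kid&#8217;s English name</p>';

// The current index.php approach: everything except letters, digits and
// whitespace is deleted, so "&#8217;" collapses to "8217".
$stripped = preg_replace("/[^a-zA-Z0-9\s]/", "", strip_tags($raw));

// One alternative: decode entities instead of deleting their punctuation.
$decoded = html_entity_decode(strip_tags($raw), ENT_QUOTES, 'UTF-8');

echo $stripped, "\n"; // the kid8217s English name
echo $decoded, "\n";  // the kid’s English name
```

If that is what is happening, decoding before (or instead of) the regex should keep the quotes intact.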