giabby / rssingest

Automatically exported from code.google.com/p/rssingest

Cleaning ingest #6

Open GoogleCodeExporter opened 8 years ago

GoogleCodeExporter commented 8 years ago
This script looks like it could be really useful for editing an RSS feed generated by Eventlist for Joomla:

http://methodist-central-hall.org.uk/index.php?option=com_eventlist&view=eventlist&format=feed&type=atom

I've set the code up like this (obviously the username/password/key are not the actual ones):

<?php

//require_once("dbCon/dbcon.php");

$db_hostname="localhost";
$db_username="thisismyusername";
$db_password="thisismypassword";

$private_access_key="thisismyprivatekey";

// Check a few bits and pieces

if(isset($_GET['feed_url']))
{
    $feed_url = $_GET['feed_url'];
}
else
{
    die("Need to pass the (consistent) 'feed url'");
}

if(isset($_GET['access_key']))
{

    if($_GET['access_key']==$private_access_key)
    {
        echo "Access key correct, proceeding...<br/><br/>";
    }
    else
    {
        die("wrong access key");
    }
}
else
{
    die("Need to pass the 'access_key' URL parameter");
}

try
{
    /*  query the database */
    // $db = getCon();

    $db = mysql_connect($db_hostname,$db_username,$db_password);
    if (!$db)
    {
        die("Could not connect: " . mysql_error());
    }
    mysql_select_db("your_db", $db);

    echo "Starting to work with feed URL '" . $feed_url . "'";

    /* Parse XML from  http://sanctuary-westminster.org/server/in.php?feed_url=http://methodist-central-hall.org.uk/index.php?option=com_eventlist&view=eventlist&format=feed&type=atom */
    //$RSS_DOC = simpleXML_load_file('http://sanctuary-westminster.org/server/in.php?feed_url=http://methodist-central-hall.org.uk/index.php?option=com_eventlist&view=eventlist&format=feed&type=atom');

    libxml_use_internal_errors(true);
    $RSS_DOC = simpleXML_load_file($feed_url);
    if (!$RSS_DOC) {
        echo "Failed loading XML\n";
        foreach(libxml_get_errors() as $error) {
            echo "\t", $error->message;
        }
        // Stop here: without a parsed document there is nothing to ingest
        die();
    }

    /* Get title, link, managing editor, copyright, and publication date from the document  */
    $rss_title = $RSS_DOC->channel->title;
    $rss_link = $RSS_DOC->channel->link;
    $rss_editor = $RSS_DOC->channel->managingEditor;
    $rss_copyright = $RSS_DOC->channel->copyright;
    $rss_date = $RSS_DOC->channel->pubDate;

    //Loop through each item in the RSS document

    foreach($RSS_DOC->channel->item as $RSSitem)
    {

        $item_id    = md5($RSSitem->title);
        $fetch_date = date("Y-m-j G:i:s"); //NOTE: we don't use a DB SQL function, so it's database-independent
        $item_title = $RSSitem->title;
        $item_date  = date("Y-m-j G:i:s", strtotime($RSSitem->pubDate));
        $item_url   = $RSSitem->link;

        echo "Processing item '" , $item_id , "' on " , $fetch_date     , "<br/>";
        echo $item_title, " - ";
        echo $item_date, "<br/>";
        echo $item_url, "<br/>";

        // Does record already exist? Only insert if new item...

        $item_exists_sql = "SELECT item_id FROM rssingest where item_id = '" . $item_id . "'";
        $item_exists = mysql_query($item_exists_sql, $db);
        if(mysql_num_rows($item_exists)<1)
        {
            echo "<font color=green>Inserting new item..</font><br/>";
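            // Escape feed-supplied values so quotes in titles or URLs cannot break the query
            $item_insert_sql = "INSERT INTO rssingest(item_id, feed_url, item_title, item_date, item_url, fetch_date) VALUES ('" . $item_id . "', '" . mysql_real_escape_string($feed_url, $db) . "', '" . mysql_real_escape_string((string)$item_title, $db) . "', '" . $item_date . "', '" . mysql_real_escape_string((string)$item_url, $db) . "', '" . $fetch_date . "')";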
            $insert_item = mysql_query($item_insert_sql, $db);
        }
        else
        {
            echo "<font color=blue>Not inserting existing item..</font><br/>";
        }

        echo "<br/>";
    }

    // End of form //
} catch (Exception $e)
{
    echo 'Caught exception: ',  $e->getMessage(), "\n";
}
?>

I then run the URL:

http://server/rss.php?feed_url=http://methodist-central-hall.org.uk/index.php?option=com_eventlist&view=eventlist&format=feed&type=atom&access_key=thisismyprivatekey
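
One thing to check in the request above: the feed address contains its own `&` characters, and PHP splits the query string on `&`, so `view`, `format`, and `type` would be read as parameters of rss.php rather than as part of `feed_url`. A minimal sketch of building the request with the feed address percent-encoded; `build_ingest_url` is a hypothetical helper, and the host and key are the same placeholders as above:

```php
<?php
// Hypothetical helper: builds the rss.php request URL with the feed address
// percent-encoded, so its own '&' characters are not read as separators.
function build_ingest_url($script_url, $feed_url, $access_key)
{
    // http_build_query() URL-encodes each value before joining them with '&'
    return $script_url . '?' . http_build_query(array(
        'feed_url'   => $feed_url,
        'access_key' => $access_key,
    ));
}

$feed = 'http://methodist-central-hall.org.uk/index.php?option=com_eventlist&view=eventlist&format=feed&type=atom';
$url  = build_ingest_url('http://server/rss.php', $feed, 'thisismyprivatekey');
echo $url, "\n";

// Parsing the query string back shows feed_url arriving in one piece
parse_str(parse_url($url, PHP_URL_QUERY), $params);
echo $params['feed_url'], "\n";
```

With the address encoded this way, `$_GET['feed_url']` should hold the complete feed URL, since PHP decodes query values automatically.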

And get the following message:

Access key correct, proceeding...

Starting to work with feed URL 'http://methodist-central-hall.org.uk/index.php?option=com_eventlist''Failed loading XML
Opening and ending tag mismatch: span line 497 and a
Opening and ending tag mismatch: li line 494 and span
Opening and ending tag mismatch: ul line 493 and li
Opening and ending tag mismatch: span line 504 and a
Opening and ending tag mismatch: li line 501 and span
Opening and ending tag mismatch: div line 492 and li
Opening and ending tag mismatch: span line 511 and a
Opening and ending tag mismatch: li line 508 and span
Opening and ending tag mismatch: div line 488 and li
Opening and ending tag mismatch: span line 520 and a
Opening and ending tag mismatch: li line 517 and span
Opening and ending tag mismatch: div line 487 and li
Opening and ending tag mismatch: div line 486 and ul
Opening and ending tag mismatch: body line 62 and div
Opening and ending tag mismatch: html line 2 and div
Extra content at the end of the document
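
The messages above name `html`, `body`, `div`, `ul`, and `li` elements, and the echoed feed URL stops at `option=com_eventlist`, which suggests the script fetched an ordinary HTML page rather than the feed. A rough sketch of a check that could run before the item loop; `feed_root_name` is a hypothetical helper, and the two sample strings are made up for illustration:

```php
<?php
// Hypothetical helper: returns the root element name of an XML string,
// or null if libxml cannot parse it (as with a malformed HTML page).
function feed_root_name($xml_text)
{
    // Collect parse errors internally instead of emitting PHP warnings
    $previous = libxml_use_internal_errors(true);
    $doc = simplexml_load_string($xml_text);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    return ($doc === false) ? null : $doc->getName();
}

// Invented samples: a tiny Atom document and an HTML fragment whose
// <span> and <a> tags are closed in the wrong order
$atom = '<feed xmlns="http://www.w3.org/2005/Atom"><title>Events</title></feed>';
$html = '<html><body><ul><li><span><a href="#">x</span></li></ul></body></html>';

var_dump(feed_root_name($atom));
var_dump(feed_root_name($html));
```

A script could refuse to continue unless the root name is `rss` or `feed`, which would catch a wrong or truncated URL early.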

I'm guessing I need to sanitize the input stream, but I'm not sure of the best way to do that, or how to edit the script accordingly.
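
Separately from any clean-up of the input, the requested feed has `type=atom`, while the script reads `channel->item`, which is the RSS 2.0 layout. Assuming the feed really is Atom, a sketch along these lines might cover both formats; `feed_items` is a hypothetical helper and the sample entry is invented:

```php
<?php
// Hypothetical helper: returns title, URL, and date for each entry of an
// RSS 2.0 or Atom document already parsed by SimpleXML.
function feed_items(SimpleXMLElement $doc)
{
    $items = array();
    if ($doc->getName() === 'feed') {
        // Atom: entries are direct children of <feed>; the link is an attribute
        foreach ($doc->entry as $entry) {
            $items[] = array(
                'title' => (string) $entry->title,
                'url'   => (string) $entry->link['href'],
                'date'  => (string) $entry->updated,
            );
        }
    } elseif ($doc->getName() === 'rss') {
        // RSS 2.0: items sit under <channel>, as in the script above
        foreach ($doc->channel->item as $item) {
            $items[] = array(
                'title' => (string) $item->title,
                'url'   => (string) $item->link,
                'date'  => (string) $item->pubDate,
            );
        }
    }
    return $items;
}

// Invented Atom sample with a single entry
$sample = simplexml_load_string(
    '<feed xmlns="http://www.w3.org/2005/Atom">'
    . '<entry><title>Sample event</title>'
    . '<link href="http://example.org/events/1"/>'
    . '<updated>2013-06-12T10:00:00Z</updated></entry>'
    . '</feed>'
);
foreach (feed_items($sample) as $item) {
    echo $item['title'], ' | ', $item['url'], ' | ', $item['date'], "\n";
}
```

The casts to string matter here: SimpleXML returns element objects, and hashing or inserting them into SQL works more predictably on plain strings.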
Thanks for the work.

Original issue reported on code.google.com by feyi...@gmail.com on 12 Jun 2013 at 2:39