chrismp / Data-Journalism-Jobs

Data journalism jobs listings

Is an RSS feed possible? #1

Open ebergam opened 8 years ago

ebergam commented 8 years ago

Hey! Have you ever thought about generating an RSS feed? I think it'd be really useful! :) I can't help you because of my poor coding skills, but maybe it's not a big hassle. (A couple of months ago I created one for your website by scraping it with Kimono Labs, but that service has shut down, so the feed is no longer available.) Thanks and good job with the website! Enrico

chrismp commented 8 years ago

That's a great idea, Enrico. I'll have to add this feature when I get time, unless someone else forks the project and does it first.

(Sent by email in reply to Enrico Bergamini's post of Saturday, March 12, 2016.)

Chris Persaud
Web developer / Web scraper / Writer
ChrisPersaud.com
LinkedIn: ChrisMPersaud https://www.linkedin.com/in/chrismpersaud
Twitter: @ChrisMPersaud http://twitter.com/chrismpersaud
GitHub: chrismp https://github.com/chrismp

ebergam commented 7 years ago

@chrismp it took a year, but I wrote a simple Python script to generate an RSS feed. :)

import datetime

import requests
from yattag import Doc

# The /jobs endpoint is expected to return the listings as a JSON array
url = 'http://www.datajournalismjobs.com/jobs'
response = requests.get(url)
response.raise_for_status()
data = response.json()

RSS_title = "DatajournalismJobs"
RSS_link = "http://www.datajournalismjobs.com"
RSS_description = "Don't miss updates from datajournalismjobs.com!"


def clean_date(x):
    """Convert an 'MM/DD/YYYY' string into an RFC 822 date for <pubDate>.

    Falls back to today's date (at midnight) if the value is missing or malformed.
    """
    try:
        # x is already a plain string from the JSON, not an object with .text
        d = datetime.datetime.strptime(x, '%m/%d/%Y')
    except (TypeError, ValueError):
        d = datetime.datetime.combine(datetime.date.today(), datetime.time())
    # %z is empty for naive datetimes, so the offset is written out literally
    return d.strftime("%a, %d %b %Y %H:%M:%S +0200")


def returnlink(x):
    """Fall back to the site's home page when a job has no link."""
    return x or RSS_link


# One (title, company, location, link, pubDate) tuple per job
raw_datalist = [
    (job['jobTitle'],
     job['company'],
     job['jobLocation'],
     job['moreInfoURL'],
     clean_date(job['submitted']))
    for job in data
]


def generate_feed():
    doc, tag, text, line = Doc().ttl()
    doc.asis('<?xml version="1.0" encoding="UTF-8"?>')
    with tag('rss',
             ('xmlns:atom', 'http://www.w3.org/2005/Atom'),
             ('version', '2.0')):
        with tag('channel'):
            line('title', RSS_title)
            line('link', RSS_link)
            line('description', RSS_description)
            line('language', 'en')
            for title, company, location, link, pub_date in raw_datalist:
                with tag('item'):
                    line('title', title)
                    line('category', company, domain='company')
                    line('category', location, domain='location')
                    line('link', returnlink(link))
                    line('pubDate', pub_date)
    feed = doc.getvalue()
    print(feed)
    # 'wf' is not a valid file mode; write the feed as UTF-8 text
    with open('datajournalismjobs_feed.xml', 'w', encoding='utf-8') as f:
        f.write(feed)


generate_feed()

Do you think you could run it somewhere to generate the feed?
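The RSS 2.0 specification expects `<pubDate>` values in RFC 822 format, with English day and month names, but `strftime`'s `%a` and `%b` follow the process locale. A locale-independent option is the standard library's `email.utils.format_datetime`. The sketch below is one possible replacement for the date conversion in `clean_date()`. It assumes `submitted` is an `MM/DD/YYYY` string, as the script's format string implies. It also assumes, since the listings don't state a time zone, that the dates can be reported as UTC rather than the hard-coded `+0200`. `to_rfc822` is a hypothetical name.

```python
import datetime
from email.utils import format_datetime


def to_rfc822(submitted):
    """Convert an 'MM/DD/YYYY' string to an RFC 822 date for <pubDate>.

    Falls back to today's date (at midnight) when the value is missing or
    malformed, like clean_date() does. Treating the dates as UTC is an
    assumption: the listings do not state a time zone.
    """
    try:
        d = datetime.datetime.strptime(submitted, '%m/%d/%Y')
    except (TypeError, ValueError):
        d = datetime.datetime.combine(datetime.date.today(), datetime.time())
    # format_datetime uses English day/month names regardless of locale;
    # usegmt=True requires an aware UTC datetime and prints the zone as 'GMT'
    return format_datetime(d.replace(tzinfo=datetime.timezone.utc), usegmt=True)


print(to_rfc822('03/12/2016'))  # Sat, 12 Mar 2016 00:00:00 GMT
```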

chrismp commented 7 years ago

Hi @ebergam, just saw this. Looks nice! My backend is Sinatra, a Ruby-based library. I'm not sure how I would implement this...
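The Python script only produces a static XML file, so it doesn't necessarily have to run inside the Sinatra app. One possible setup would run a generator on a schedule (for example with cron) and write its output into the app's `public/` directory. Assuming the app keeps Sinatra's default `public_folder` setting, Sinatra serves files there as static assets. The sketch below drops the yattag dependency and uses the standard library's `xml.etree.ElementTree` instead. The job keys (`jobTitle`, `company`, `jobLocation`, `moreInfoURL`) are the ones the script reads from the JSON. `build_feed` and the sample listing are hypothetical.

```python
import xml.etree.ElementTree as ET

SITE = "http://www.datajournalismjobs.com"


def build_feed(jobs):
    """Return an RSS 2.0 document (UTF-8 bytes) for a list of job dicts.

    Hypothetical helper: expects the keys the script reads from the /jobs
    JSON. <pubDate> is left out here for brevity.
    """
    rss = ET.Element('rss', version='2.0')
    channel = ET.SubElement(rss, 'channel')
    ET.SubElement(channel, 'title').text = 'DatajournalismJobs'
    ET.SubElement(channel, 'link').text = SITE
    ET.SubElement(channel, 'description').text = (
        "Don't miss updates from datajournalismjobs.com!")
    ET.SubElement(channel, 'language').text = 'en'
    for job in jobs:
        item = ET.SubElement(channel, 'item')
        ET.SubElement(item, 'title').text = job.get('jobTitle', '')
        ET.SubElement(item, 'category', domain='company').text = job.get('company', '')
        ET.SubElement(item, 'category', domain='location').text = job.get('jobLocation', '')
        # Fall back to the home page when a listing has no link
        ET.SubElement(item, 'link').text = job.get('moreInfoURL') or SITE
    # ElementTree escapes &, < and > in text; xml_declaration (Python 3.8+)
    # prepends the <?xml ...?> header
    return ET.tostring(rss, encoding='utf-8', xml_declaration=True)


if __name__ == '__main__':
    # Made-up sample listing, for illustration only
    sample = [{'jobTitle': 'Data reporter', 'company': 'Example News',
               'jobLocation': 'Remote', 'moreInfoURL': ''}]
    print(build_feed(sample).decode('utf-8'))
```

In a real deployment, the bytes returned by `build_feed` would be written (opened in `'wb'` mode) to a file such as `public/feed.xml` rather than printed.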