ebergam opened this issue 8 years ago
That's a great idea, Enrico. I'll have to add this feature when I get time unless someone else forks the project and does it.
On Saturday, March 12, 2016, Enrico Bergamini notifications@github.com wrote:
Hey! Have you ever thought about generating an RSS feed? I think it'd be really useful! :) I can't help you because of my poor coding skills, but maybe it's not a big hassle. (A couple of months ago I created one for your website by scraping it with kimonoLabs, but that service has shut down, so the feed is no longer available.) Thanks, and good job with the website! Enrico
@chrismp it took one year, but I wrote a simple Python script to generate an RSS feed. :)
import datetime

import requests
from yattag import Doc

# Fetch the job listings as JSON from the site's /jobs endpoint
url = 'http://www.datajournalismjobs.com/jobs'
data = requests.get(url).json()
RSS_title = "DatajournalismJobs"
RSS_link = "http://www.datajournalismjobs.com"
RSS_description = "Don't miss updates from datajournalismjobs.com!"
# Parallel lists, one entry per job posting
titles_list = []
company_list = []
location_list = []
link_list = []
pubDate_list = []
def clean_date(x):
    # Convert the site's MM/DD/YYYY date string into the RFC 822
    # format RSS expects; fall back to today's date if parsing fails.
    try:
        d = datetime.datetime.strptime(x, '%m/%d/%Y')
    except (ValueError, TypeError):
        d = datetime.datetime.today()
    return d.strftime('%a, %d %b %Y %H:%M:%S +0200')
for job in data:
    titles_list.append(job['jobTitle'])
    company_list.append(job['company'])
    location_list.append(job['jobLocation'])
    link_list.append(job['moreInfoURL'])
    pubDate_list.append(clean_date(job['submitted']))

raw_datalist = list(zip(titles_list, company_list, location_list, link_list, pubDate_list))
# print(raw_datalist)
def returnlink(x):
    # Fall back to the homepage when a posting has no "more info" URL
    return x if x else 'http://www.datajournalismjobs.com'
def generate_feed():
    doc, tag, text, line = Doc().ttl()
    doc.asis('<?xml version="1.0" encoding="UTF-8"?>')
    with tag('rss',
             ('xmlns:atom', 'http://www.w3.org/2005/Atom'),
             ('version', '2.0')):
        with tag('channel'):
            line('title', RSS_title)
            line('link', RSS_link)
            line('description', RSS_description)
            line('language', 'en')
            for row in raw_datalist:
                with tag('item'):
                    line('title', row[0])
                    line('category', row[1], domain='company')
                    line('category', row[2], domain='location')
                    line('link', returnlink(row[3]))
                    line('pubDate', row[4])
    print(doc.getvalue())
    with open('datajournalismjobs_feed.xml', 'w') as f:
        f.write(doc.getvalue())

generate_feed()
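For reference, here's roughly what one item would look like for a hypothetical listing (a "Data Reporter" job at "Example News" — invented values; whitespace added for readability, since yattag writes everything on a single line):

<item>
  <title>Data Reporter</title>
  <category domain="company">Example News</category>
  <category domain="location">New York, NY</category>
  <link>http://example.com/jobs/data-reporter</link>
  <pubDate>Sat, 12 Mar 2016 00:00:00 +0200</pubDate>
</item>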
Do you think you can run it somewhere and generate the feed?
Hi @ebergam, just saw this. Looks nice! My backend is Sinatra, a Ruby web framework. I'm not sure how I would implement this...
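Since the script just writes a static XML file, one option that sidesteps the Sinatra backend entirely might be to re-run the generator on a schedule and serve datajournalismjobs_feed.xml as a static asset. A minimal sketch, assuming the script above is saved as generate_rss.py (the filename and six-hour interval are arbitrary assumptions; cron or a systemd timer would do the same job):

# Hypothetical runner: re-execute the generator periodically so the
# feed is rebuilt from a fresh fetch of /jobs.
import subprocess
import time

while True:
    # generate_rss.py is an assumed filename for the script above
    subprocess.run(['python', 'generate_rss.py'], check=True)
    time.sleep(6 * 60 * 60)  # arbitrary assumption: refresh every six hours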