edgi-govdata-archiving / versionista-outputter

ARCHIVED: A Ruby script that scrapes Versionista's web interface to generate a CSV summarizing which websites and pages have changed recently.

Add filtering step; create site-specific filters #1

Open titaniumbones opened 7 years ago

titaniumbones commented 7 years ago

@atesgoral @jpmckinney @sonalranjit, or anyone else: want to take a look at this?

titaniumbones commented 7 years ago

Looks like there is a standard diff library in Ruby: https://github.com/samg/diffy
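A rough sketch of how a filtering step might sit in front of Diffy, assuming filters are kept as per-host lists of regex substitutions. `SITE_FILTERS`, `apply_filters`, `meaningful_diff`, and the `www.example.gov` patterns are all hypothetical names for illustration, not anything in this repo. `Diffy::Diff.new(a, b).to_s(:text)` is the gem's documented call; the `require` is guarded so the filtering part still runs without the gem installed.

```ruby
# Diffy is optional in this sketch: the filtering step needs only the
# standard library, and the gem is used just to render the diff.
begin
  require 'diffy'
rescue LoadError
  # Gem not installed; meaningful_diff falls back to a plain marker below.
end

# Hypothetical site-specific filters: hostname => [pattern, replacement] pairs.
# Each pair strips content that changes on every crawl but is not meaningful.
SITE_FILTERS = {
  'www.example.gov' => [
    [/Last updated: .*$/, ''],          # rolling timestamp in the footer
    [/sessionid=[0-9a-f]+/, 'sessionid=']  # per-visit token in links
  ]
}.freeze

# Apply every filter registered for the host; unknown hosts pass through as-is.
def apply_filters(host, text)
  SITE_FILTERS.fetch(host, []).reduce(text) do |filtered, (pattern, replacement)|
    filtered.gsub(pattern, replacement)
  end
end

# Return nil when the two versions are identical after filtering,
# otherwise a textual diff of the filtered versions.
def meaningful_diff(host, old_text, new_text)
  old_filtered = apply_filters(host, old_text)
  new_filtered = apply_filters(host, new_text)
  return nil if old_filtered == new_filtered

  if defined?(Diffy)
    Diffy::Diff.new(old_filtered, new_filtered).to_s(:text)
  else
    'changed (install the diffy gem for a rendered diff)'
  end
end
```

With something like this, a page whose only change is the "Last updated" line would return `nil` and could be dropped from the CSV, while real edits would still show up.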

atesgoral commented 7 years ago

For DOM-aware diffing: https://github.com/postmodern/nokogiri-diff
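A minimal sketch of what that could look like, assuming the `nokogiri` and `nokogiri-diff` gems are installed. `dom_changes` is a hypothetical helper name; the `diff` method with a `|change, node|` block is the interface shown in the nokogiri-diff README, where `change` is `'+'`, `'-'`, or `' '` for unchanged nodes. The requires are guarded so the file still loads without the gems.

```ruby
# Both gems are optional in this sketch; without them dom_changes returns nil.
begin
  require 'nokogiri'
  require 'nokogiri/diff'
  DOM_DIFF_AVAILABLE = true
rescue LoadError
  DOM_DIFF_AVAILABLE = false
end

# Compare two HTML strings node by node and collect only the added ('+')
# and removed ('-') nodes as [change, html] pairs.
# Returns nil if the gems are not available.
def dom_changes(old_html, new_html)
  return nil unless DOM_DIFF_AVAILABLE

  old_doc = Nokogiri::HTML(old_html)
  new_doc = Nokogiri::HTML(new_html)

  changes = []
  old_doc.diff(new_doc) do |change, node|
    next if change == ' ' # skip nodes that are the same in both versions
    changes << [change, node.to_html]
  end
  changes
end
```

Because the comparison works on parsed nodes rather than raw lines, a filter could presumably skip whole subtrees (a footer or an ad block, say) by checking the node's path before recording the change.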