alan-turing-institute/misinformation-crawler
Web crawler to collect snapshots of articles to web archive
MIT License
5 stars, 2 forks
issues
Tab delimited article input files
#362
dongpng
opened
4 years ago
0
Allow crawler to run against specific URLs
#361
jemrobinson
closed
4 years ago
0
Azure dependencies of the crawler
#360
dongpng
opened
4 years ago
0
Blockblobservice Error
#359
dongpng
opened
4 years ago
0
Command line tool for crawling specific URLs
#358
dongpng
closed
4 years ago
0
README: Dependencies for installation
#357
dongpng
opened
4 years ago
1
Switch washingtontimes to sitemap crawl and add extra date format
#356
edwardchalstrey1
closed
5 years ago
1
only update cookies when present
#355
edwardchalstrey1
closed
5 years ago
1
Fix centerforsecuritypolicy
#354
jemrobinson
closed
5 years ago
0
ReadabiliPy breaking on centerforsecuritypolicy.org
#353
jemrobinson
closed
5 years ago
0
more articles from apnews with sitemap
#352
edwardchalstrey1
closed
5 years ago
2
continue crawling index pages conservativepapers
#351
edwardchalstrey1
closed
5 years ago
0
add extra urls weeklyworldnews
#350
edwardchalstrey1
closed
5 years ago
0
add extra urls clickhole
#349
edwardchalstrey1
closed
5 years ago
0
add us news and world news categories realnewsrightnow
#348
edwardchalstrey1
closed
5 years ago
1
Check sites with low article count
#347
edwardchalstrey1
closed
5 years ago
0
Reconsider button clicking strategy
#346
jemrobinson
closed
5 years ago
0
Missing sites from database
#345
edwardchalstrey1
closed
5 years ago
2
add paper paragraphs
#344
edwardchalstrey1
opened
5 years ago
0
Update dailykos match rules
#343
edwardchalstrey1
closed
5 years ago
0
Update date extraction for redstate.com
#342
edwardchalstrey1
closed
5 years ago
1
Missing dates for some redstate.com articles
#341
edwardchalstrey1
closed
5 years ago
1
update byline xpath nationalreview
#340
edwardchalstrey1
closed
5 years ago
0
Missing byline for some nationalreview.com articles
#339
edwardchalstrey1
closed
5 years ago
0
madpatriots.com appears to have disappeared
#338
edwardchalstrey1
opened
5 years ago
0
eyeopening.info appears to no longer exist
#337
edwardchalstrey1
opened
5 years ago
0
update byline xpath denverpost
#336
edwardchalstrey1
closed
5 years ago
0
Denverpost has some articles with bylines missing
#335
edwardchalstrey1
closed
5 years ago
0
Missing metadata
#334
edwardchalstrey1
closed
5 years ago
0
Politico bylines sub-optimal
#333
edwardchalstrey1
closed
5 years ago
0
Fix vox article format
#332
jemrobinson
closed
5 years ago
0
vox.com extraction issues
#331
jemrobinson
closed
5 years ago
0
Fix vanityfair.com
#330
jemrobinson
closed
5 years ago
0
vanityfair.com extraction issues
#329
jemrobinson
closed
5 years ago
0
Updated ReadabiliPy and added test for breaking page
#328
jemrobinson
closed
5 years ago
0
ReadabiliPy crash on centerforsecuritypolicy.org
#327
jemrobinson
closed
5 years ago
0
dailykos extraction issues
#326
jemrobinson
closed
5 years ago
0
Crash when interpreting article from breitbart
#325
jemrobinson
closed
5 years ago
1
Switch to updated version of ReadabiliPy with fixed Breitbart issue
#324
jemrobinson
closed
5 years ago
0
Fix button pressing
#323
jemrobinson
closed
5 years ago
0
denverpost.com using unnecessary button
#322
jemrobinson
closed
5 years ago
0
Increase output verbosity
#321
jemrobinson
closed
5 years ago
0
Fix button pressing for time.com
#320
jemrobinson
closed
5 years ago
0
Button pressing broken on time.com
#319
jemrobinson
closed
5 years ago
0
Get all article dates politico
#318
edwardchalstrey1
closed
5 years ago
0
Politico has some articles with missing dates
#317
edwardchalstrey1
closed
5 years ago
0
Fix missing bylines christianpost
#316
edwardchalstrey1
closed
5 years ago
0
Some Christianpost articles missing byline
#315
edwardchalstrey1
closed
5 years ago
0
Fix dailycaller.com
#314
edwardchalstrey1
closed
5 years ago
2
Daily caller config needs updating
#313
edwardchalstrey1
closed
5 years ago
0