Closed by andehr 3 years ago
@andehr could you put implementations in a service? I'm trying to keep the ShellRunner class clean: it just marshals arguments to implementations and shouldn't have any business logic in it.
Sure, I wasn't sure how much of the logic of quite a specific command should be put in general services. So maybe just add a rescrape function to the scraperService which delegates to the datetimeService whenever needed?
@andehr Would have thought the scraper service would be a good place...
- `ACLEDTagger` now defines class `DomTaggerOpenAccess`, which is a `DOMTagger` with public access to its tagging method `tag()`, which also requires only a String of html. You can acquire an instance of the tagger like this: `scraperService.getScraper(source, scraperDir)`.
- `DateTimeService` now provides a `parseDate(scrapedDate, source)` function which re-parses a scraped date given a Source config. The service also has a simple `isInRange(articleDate, from, to)`.
- `ShellRunner` has a new console command "re-scrape", which finds all articles for a specific Source, and using their raw html field, will re-scrape and update their scraped fields: `re-scrape -s "MiMorelia" -sd /path/to/scrapers`
- [...] (`Article.DATE`) if a scrape occurred.
- [...] (`-f` and `-t`). The article's parsed date is used for this boundary check. If the article doesn't have one, then it's included in the re-scrape anyway.
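For reviewers, here is a rough sketch of how the boundary check described above might behave. It is not the code in this PR. The `Article` class, the `selectForRescrape` helper, the use of `java.time.LocalDate`, the inclusive bounds, and treating a missing bound as open are all assumptions made for illustration. Only the name `isInRange(articleDate, from, to)` and the rule that articles without a parsed date are included anyway come from the description. Reading `-f` and `-t` as the from and to bounds is also a guess.

```java
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class RescrapeSketch {

    /** Hypothetical stand-in for a stored article: raw HTML plus an optional parsed date. */
    static final class Article {
        final String rawHtml;
        final LocalDate date; // null when no date was parsed

        Article(String rawHtml, LocalDate date) {
            this.rawHtml = rawHtml;
            this.date = date;
        }
    }

    /** Assumed semantics: both bounds are inclusive, and a null bound is treated as open. */
    static boolean isInRange(LocalDate articleDate, LocalDate from, LocalDate to) {
        boolean notBeforeFrom = from == null || !articleDate.isBefore(from);
        boolean notAfterTo = to == null || !articleDate.isAfter(to);
        return notBeforeFrom && notAfterTo;
    }

    /** Articles without a parsed date are selected regardless of the bounds. */
    static List<Article> selectForRescrape(List<Article> articles, LocalDate from, LocalDate to) {
        List<Article> selected = new ArrayList<>();
        for (Article article : articles) {
            if (article.date == null || isInRange(article.date, from, to)) {
                selected.add(article);
            }
        }
        return selected;
    }

    public static void main(String[] args) {
        List<Article> articles = Arrays.asList(
            new Article("<html>a</html>", LocalDate.of(2015, 1, 10)),  // inside the range
            new Article("<html>b</html>", LocalDate.of(2016, 6, 1)),   // outside the range
            new Article("<html>c</html>", null));                      // no parsed date
        List<Article> selected = selectForRescrape(
            articles, LocalDate.of(2015, 1, 1), LocalDate.of(2015, 12, 31));
        System.out.println(selected.size()); // prints 2
    }
}
```

Keeping the null-date rule in the selection helper rather than inside `isInRange` leaves the range check a pure date comparison, which fits the request to keep this logic in a service rather than in `ShellRunner`.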