Open adclose opened 7 years ago
Thanks. It's useful to give a minimal test that reproduces this:
url1 url2 ... urln
and the command you used to run it.
Does this bug happen every time? Which operating system are you on? As a minimal test, what happens with just one URL?
From Windows PowerShell:
quickscrape --urllist Lawyers.txt --scraper divLawyers.json --outformat bibjson --output .
info: quickscrape 0.4.7 launched with...
info: - URLs from file: undefined
info: - Scraper: C:\Programming\NodeJsTest\nodeminer\test\divLawyers.json
info: - Rate limit: 3 per minute
info: - Log level: info
info: urls to scrape: 2
info: processing URL: https://members.collaborativedivorcetexas.com/cdtxprofessional/lauren-duffer/
info: [scraper]. URL rendered. https://members.collaborativedivorcetexas.com/cdtxprofessional/lauren-duffer/.
info: URL processed: captured 10/12 elements (2 captures failed)
info: processing URL: https://members.collaborativedivorcetexas.com/cdtxprofessional/anita-savage/
info: all tasks completed
Sites mined:
https://members.collaborativedivorcetexas.com/cdtxprofessional/lauren-duffer/
https://members.collaborativedivorcetexas.com/cdtxprofessional/anita-savage/
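To make the symptom easier to confirm on longer URL lists, here is a minimal Python sketch that scans a saved log for URLs that have a "processing URL" entry but no matching "URL processed" entry. It relies only on the log wording shown above. The function name `unfinished_urls` is hypothetical, and the sketch assumes the log has one `info:` entry per line and that URLs are handled one at a time, so a "URL processed" entry belongs to the most recent "processing URL" entry.

```python
import re

# Log excerpt copied from the report above, one "info:" entry per line.
LOG = """\
info: processing URL: https://members.collaborativedivorcetexas.com/cdtxprofessional/lauren-duffer/
info: [scraper]. URL rendered. https://members.collaborativedivorcetexas.com/cdtxprofessional/lauren-duffer/.
info: URL processed: captured 10/12 elements (2 captures failed)
info: processing URL: https://members.collaborativedivorcetexas.com/cdtxprofessional/anita-savage/
info: all tasks completed
"""


def unfinished_urls(log_text):
    """Return URLs that were started but never reported as processed.

    Assumption: URLs are processed sequentially, so each "URL processed"
    entry refers to the most recently started URL.
    """
    unfinished = []
    current = None  # URL that has started but not yet finished
    for line in log_text.splitlines():
        started = re.match(r"info: processing URL: (\S+)", line)
        if started:
            if current is not None:
                # A new URL started before the previous one finished.
                unfinished.append(current)
            current = started.group(1)
        elif line.startswith("info: URL processed:"):
            current = None
    if current is not None:
        # The log ended while a URL was still pending.
        unfinished.append(current)
    return unfinished


if __name__ == "__main__":
    for url in unfinished_urls(LOG):
        print("started but not processed:", url)
```

Run against the excerpt above, this reports only the anita-savage URL, which matches the observation that the last URL in the list is never scraped.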
.json file
{
  "url": "collaborativedivorcetexas.com",
  "elements": {
    "link": { "selector": "//div[@class='fullName']/a", "attribute": "text" },
    "firstName": { "selector": "//div[@class='firstName']", "attribute": "text" },
    "lastName": { "selector": "//div[@class='lastName']", "attribute": "text" },
    "email": { "selector": "//div[@class='email']/a", "attribute": "href" },
    "website": { "selector": "//div[@class='website']/a", "attribute": "href" },
    "firm": { "selector": "//div[@class='firmName']", "attribute": "text" },
    "street1": { "selector": "//div[@class='streetAddress1']", "attribute": "text" },
    "street2": { "selector": "//div[@class='streetAddress2']", "attribute": "text" },
    "city": { "selector": "//div[@class='city']", "attribute": "text" },
    "state": { "selector": "//div[@class='state']", "attribute": "text" },
    "zip": { "selector": "//div[@class='zipCode']", "attribute": "text" },
    "phone": { "selector": "//div[@class='phoneNumber']", "attribute": "text" }
  }
}
This happens every time, from what I can tell: it does this whenever it loops over multiple files.
It doesn't seem to happen with one file.
I've noticed that the last URL in my URL list does not get scraped.
Have others seen this?
Aaron