ankurjain0985/crawler4j
Automatically exported from code.google.com/p/crawler4j
0
stars
0
forks
issues
Restricting crawler to crawl within the given page domain
#342
GoogleCodeExporter
opened
9 years ago
0
Unable to download RTF files using crawler4j
#341
GoogleCodeExporter
opened
9 years ago
0
All links on a page are not recognized
#340
GoogleCodeExporter
opened
9 years ago
0
Crawler is slow
#339
GoogleCodeExporter
opened
9 years ago
0
Crawler time delay when exiting
#338
GoogleCodeExporter
opened
9 years ago
0
Memory usage
#337
GoogleCodeExporter
opened
9 years ago
0
Hanging on file process
#336
GoogleCodeExporter
closed
9 years ago
4
robots.txt isn't crawled
#335
GoogleCodeExporter
closed
9 years ago
4
Crawling over disallowed paths from robots.txt
#334
GoogleCodeExporter
closed
9 years ago
1
Crawlers storage folder taking up too much space.
#333
GoogleCodeExporter
closed
9 years ago
4
Jar with dependencies
#332
GoogleCodeExporter
opened
9 years ago
0
Slf4j libraries missing on some configurations
#331
GoogleCodeExporter
closed
9 years ago
1
Proxy information gets lost when using basic authentication
#330
GoogleCodeExporter
opened
9 years ago
3
FileNotFoundException: .m2\repository\edu\uci\ics\crawler4j\4.0\crawler4j-4.0.jar!\tld-names.zip
#329
GoogleCodeExporter
closed
9 years ago
8
Scraping iframes, base64,vb scripts
#328
GoogleCodeExporter
closed
9 years ago
6
crawler fail due to http 303 see other
#327
GoogleCodeExporter
closed
9 years ago
8
PageFetcher is unreadable
#326
GoogleCodeExporter
closed
9 years ago
1
Remove non-official HTTP status codes
#325
GoogleCodeExporter
closed
9 years ago
1
NullPointerException when crawling links with no HREF
#324
GoogleCodeExporter
closed
9 years ago
1
Upgrade CustomFetchStatus
#323
GoogleCodeExporter
closed
9 years ago
1
WebCrawler should throw exceptions instead of returning at the middle of the method
#322
GoogleCodeExporter
closed
9 years ago
2
Change logging from System.out.print to SLF4J
#321
GoogleCodeExporter
closed
9 years ago
1
HttpResponse response = httpClient.execute(get) in PageFetcher has no Timeout
#320
GoogleCodeExporter
closed
9 years ago
2
[Enhancement] Sitemaps should be supported in an enhanced way
#319
GoogleCodeExporter
opened
9 years ago
1
Sitemaps that are gzipped are ignored
#318
GoogleCodeExporter
opened
9 years ago
1
text parsers aren't looking for links in content thus shouldVisit is never called
#317
GoogleCodeExporter
closed
9 years ago
3
Sitemaps with content-type text/xml are ignored
#316
GoogleCodeExporter
closed
9 years ago
7
Patch for /src/test/java/edu/uci/ics/crawler4j/examples/basic/BasicCrawler.java
#315
GoogleCodeExporter
closed
9 years ago
3
Crawl Site Maps
#314
GoogleCodeExporter
closed
9 years ago
1
Waiting 30sec before cleaning everything
#313
GoogleCodeExporter
opened
9 years ago
1
Crawler should follow links in plain text files
#312
GoogleCodeExporter
closed
9 years ago
1
Add functionality to retrieve links from binary and text only files
#311
GoogleCodeExporter
closed
9 years ago
1
How Can I download the entire html code for a page in .html file?
#310
GoogleCodeExporter
closed
9 years ago
2
Adding seeds to crawler4j at runtime
#309
GoogleCodeExporter
opened
9 years ago
0
Remove the Language Identifier
#308
GoogleCodeExporter
closed
9 years ago
1
Does it crawl every site only site? My crawler is not crawling after 355 sites
#307
GoogleCodeExporter
closed
9 years ago
11
The method shouldVisit(Page, WebURL) of type BasicCrawler must override or implement a supertype method
#306
GoogleCodeExporter
closed
9 years ago
3
Count of crawl cycles
#305
GoogleCodeExporter
closed
9 years ago
6
Crawl Site Maps
#304
GoogleCodeExporter
closed
9 years ago
4
Create a log configuration file default
#303
GoogleCodeExporter
closed
9 years ago
2
Update deprecated methods/classes in PageFetcher
#302
GoogleCodeExporter
closed
9 years ago
2
Upgrade the Pattern constant on the crawler examples
#301
GoogleCodeExporter
closed
9 years ago
3
Resumable deletes all folder content not databases
#300
GoogleCodeExporter
closed
9 years ago
3
NullPointerException when trying to crawl different URLs
#299
GoogleCodeExporter
closed
9 years ago
1
Fatal Transport Error when crawling robots.txt
#298
GoogleCodeExporter
closed
9 years ago
2
Add tag name to WebUrl
#297
GoogleCodeExporter
closed
9 years ago
1
Threads not being killed in graceful shutdown
#296
GoogleCodeExporter
closed
9 years ago
13
Add meta tags into the parsed html object
#295
GoogleCodeExporter
closed
9 years ago
1
Save the TLD list as a compressed file
#294
GoogleCodeExporter
closed
9 years ago
1
Grab the TLD list from the online URL
#293
GoogleCodeExporter
closed
9 years ago
2