ankurjain0985
/
crawler4j
Automatically exported from code.google.com/p/crawler4j
issues
Optimize the TLD list
#292
GoogleCodeExporter
closed
9 years ago
1
HtmlParseData should hold a unique list of URLs
#291
GoogleCodeExporter
closed
9 years ago
2
We should support all redirect status codes
#290
GoogleCodeExporter
closed
9 years ago
1
Parsing a binary content shouldn't throw a general parsing error
#289
GoogleCodeExporter
closed
9 years ago
1
Upgrade Unit Tests to v4
#288
GoogleCodeExporter
closed
9 years ago
1
Webcrawler freeze when server is not available
#287
GoogleCodeExporter
closed
9 years ago
7
Disable Robots not working correctly?
#286
GoogleCodeExporter
opened
9 years ago
9
WebURL.java causes IndexOutOfBoundException
#285
GoogleCodeExporter
closed
9 years ago
3
Catching any exception and hiding the log.
#284
GoogleCodeExporter
closed
9 years ago
7
Refresh Interval
#283
GoogleCodeExporter
closed
9 years ago
2
Add CHANGES.TXT with the changelog to the root
#282
GoogleCodeExporter
closed
9 years ago
1
Upgrade try-catch to java7 "try with resources"
#281
GoogleCodeExporter
closed
9 years ago
2
Use Tika's MediaTypes instead of self parsing strings
#280
GoogleCodeExporter
opened
9 years ago
1
TikaException is thrown while crawling several PDFs in a row
#279
GoogleCodeExporter
closed
9 years ago
1
Add hooks in the webcrawler for better error handling
#278
GoogleCodeExporter
closed
9 years ago
2
Update/Delete URLs, functionality
#277
GoogleCodeExporter
closed
9 years ago
3
Don't let a crawled URL be dropped without proper logging
#276
GoogleCodeExporter
closed
9 years ago
1
Crawl Duplicate URLs
#275
GoogleCodeExporter
closed
9 years ago
8
[deleted issue]
#274
GoogleCodeExporter
closed
9 years ago
0
Tabbing looks messed up in several places
#273
GoogleCodeExporter
closed
9 years ago
2
Parse Binary Content
#272
GoogleCodeExporter
closed
9 years ago
0
Huge throughput improvement
#271
GoogleCodeExporter
opened
9 years ago
2
Crawler doesn't crawl some pages.
#270
GoogleCodeExporter
closed
9 years ago
5
spacing removed after htmlParseData.getText();
#269
GoogleCodeExporter
opened
9 years ago
1
Fatal transport error when using a proxy
#268
GoogleCodeExporter
opened
9 years ago
1
Anchor Text still null
#267
GoogleCodeExporter
closed
9 years ago
2
How to calculate url pagerank
#266
GoogleCodeExporter
closed
9 years ago
2
putting Selenium Code into CrawlController --> Exception in thread "main" java.lang.NoSuchFieldError: INSTANCE
#265
GoogleCodeExporter
closed
9 years ago
3
Unable to shutdown crawler after server errors.
#264
GoogleCodeExporter
closed
9 years ago
1
Now Seeding Wordpress Hosted Websites
#263
GoogleCodeExporter
closed
9 years ago
5
Not Visiting Certain Seed Urls
#262
GoogleCodeExporter
closed
9 years ago
3
Crawler4j missing more control over retry count
#261
GoogleCodeExporter
opened
9 years ago
1
Environment daemon threads keep running after CrawlController.shutdown()
#260
GoogleCodeExporter
closed
9 years ago
10
HtmlParseData.getText() doesn't recognize breaks or paragraphs
#259
GoogleCodeExporter
opened
9 years ago
1
crawler4j as servlet
#258
GoogleCodeExporter
closed
9 years ago
1
crawler4j doesn't crawl some sites
#257
GoogleCodeExporter
closed
9 years ago
4
Remove Hard-Coded Sleeps
#256
GoogleCodeExporter
opened
9 years ago
1
Many URLs are discarded / not processed(missing in output)
#255
GoogleCodeExporter
closed
9 years ago
2
Quartz scheduler + crawler4J http connection error
#254
GoogleCodeExporter
opened
9 years ago
0
Fatal Transport Error: Read timeout while fetching from same host multiple times
#253
GoogleCodeExporter
closed
9 years ago
6
Patch for /src/main/java/edu/uci/ics/crawler4j/parser/HtmlContentHandler.java
#252
GoogleCodeExporter
opened
9 years ago
1
Fix a typo
#251
GoogleCodeExporter
closed
9 years ago
2
How to do NTLM Authentication?
#250
GoogleCodeExporter
opened
9 years ago
5
UnsupportedClassVersionError / Unsupported
#249
GoogleCodeExporter
closed
9 years ago
6
Crawling for specific numbers (EANs, European Article Numbers)
#248
GoogleCodeExporter
opened
9 years ago
2
Errors during crawling (maybe regarding robots.txt)
#247
GoogleCodeExporter
closed
9 years ago
21
Storing Videos a problem
#246
GoogleCodeExporter
closed
9 years ago
1
Provide easy access to (absolute) canonical URL
#245
GoogleCodeExporter
opened
9 years ago
1
Automatically increase politeness delay if received 420 or 429 HTTP code
#244
GoogleCodeExporter
opened
9 years ago
1
Deleting crawl storage folder after crawling?
#243
GoogleCodeExporter
closed
9 years ago
1