heritrix Search Results

582 results
for heritrix


netarchivesuite/solrwayback #192

No search results

Hi, I have some warc files created using [warcit](https://github.com/webrecorder/warcit). Somehow after indexing (without errors or warnings), I can't find any page included in it on SolrWayback. I…

mbreemhaar updated 2 years ago
23
WebCuratorTool/webcurator #38

wct 3.0.3 does not propagate the operatorContactUrl to Herit…

We've set up WCT and set the operator contact URL in the profile; however, this data does not seem to propagate to the Heritrix job configuration. I've attached four screenshots. Any idea what the prob…

vitezg updated 2 years ago
8
webrecorder/replayweb.page #65

out of memory

For in-company use of web archives, I'm experimenting with transforming older Heritrix crawls to WACZ (thanks to py-wacz). One of these transforms results in 48 GB and reports 325,000 pages. Using this …

robert-1043 updated 2 years ago
5
internetarchive/heritrix3 #425

dnsjava NIO selector thread stuck at 100% after terminating …

Using the latest version (20210803) and a lot of versions before that, when the job is terminated, one CPU thread seems to be stuck at 100% doing nothing. This never goes away until I restart Heritrix…

jmvezic updated 2 years ago
3
ukwa/ukwa-manage #78

Document Harvester issues

- user agent getting blocked
- having problems with some characters in URLs.

anjackson updated 2 years ago
3
ukwa/webrender-puppeteer #17

Write rendered versions to WARCs

Use warcio.js to write rendered versions to WARCs rather than pushing to the proxy (which limits us to using warcprox). Make sure WARC records are the same as under the warcprox implementation. Rota…

anjackson updated 3 years ago
1
internetarchive/heritrix3 #437

[Question] SEVERE Configuration problem: Unable to locate S…

I am trying to extend Heritrix. I have configured my pom.xml like this to build a single JAR with all the Heritrix dependencies ``` 4.0.0 io.test extended-heritrix 1.0-SNA…

naveen17797 updated 3 years ago
4
arquivo/pwa-technologies #1150

Update User Agent from crawlers

- [ ] Change Heritrix User Agent --> "Arquivo-web-crawler (compatible; heritrix/3.4.0-20200304 +https://arquivo.pt/faq-crawling)"
- [ ] Add User Agent to Arquivo Patcher
- [ ] Update http://arquivo.…

PedroG1515 updated 3 years ago
1
ukwa/ukwa-heritrix #74

Broken checkpoints

``` SEVERE: org.archive.crawler.framework.CrawlJob beansException Failed to start bean 'warcWriterViralOld'; nested exception is java.lang.RuntimeException: java.io.FileNotFoundException: File '/heri…

anjackson updated 3 years ago
1
internetarchive/heritrix3 #417

module java.base does not export sun.security.tools.keytool …

The error happens with the latest openjdk 16.0.1. It works fine with the LTS version (openjdk 11.0.11). ``` Sat Jul 24 09:57:57 PM EEST 2021 Starting heritrix Linux f34 5.13.4-200.fc34.x86_64 #1 SMP Tue Jul…

Pernat1y updated 3 years ago
1
