webrecorder Search Results

1000+ results for webrecorder


webrecorder/browsertrix-crawler #35

Collection name validation

If the desired collection name passed to browsertrix via `--collection` contains `.`, `/`, `:`, or potentially other special characters, `pywb` silently fails to create the necessary directory structu…

rebeccacremona updated 3 years ago
2

matrix-org/synapse #9733

Youtube captions (link previews) are useless

Description: At some point YouTube updated the site, and now all (?) captions generated by Synapse for the site are: Before you continue to YouTube Sign in a Google company Before you con…

eras updated 3 years ago
45

nfriedly/node-unblocker #117

Are there any other unblockers you know of?

Vikingpress got blocked in our school so please please help!

neelansh2489 updated 3 years ago
6

webrecorder/archiveweb.page #21

Issue rendering filterable content

Hello! Thanks for your great work on this project. We have recently switched from using Webrecorder desktop and I'm enjoying exploring archiveweb.page. I hope this question isn't too specific, …

jakebickford updated 3 years ago
3

webrecorder/browsertrix-crawler #36

Windows 1251 (cyrillic) encoded text incorrectly encoded

When scraping a website encoded in Windows Cyrillic (windows-1251), the conversion to UTF-8 is faulty, resulting in tons of `пїЅпїЅпїЅпїЅпїЅ` strings. - Sample website: https://sattvinfo.net/ - Sa…

rgaudin updated 3 years ago
3

webrecorder/archiveweb.page #30

Add a smart dedupe feature

If one is recording a site of video content (especially video content which repeats upon, say, a reload or clicking on the link again), the files become huge. Having the ability to intelligently dedup…

deltabravozulu updated 3 years ago
2

iipc/openwayback #439

indexing one .warc file need help

I have successfully configured openwayback, but I am unsure where I should put the .warc file and how I can access it afterwards.

mashalahmad updated 3 years ago
13

internetarchive/heritrix3 #351

Heritrix 3.3 out-of-the-box archives pages with meta noindex

Installed Heritrix 3.3.0 on a Linux server. (3.4.0 fails consistently when editing a configuration.) Out-of-the-box configuration, just set the seed and the operatorContactUrl. I tell it to crawl…

wroth updated 3 years ago
5

arquivo/pwa-technologies #791

CDX output below expected

Hi, I'm trying to collect old news and the problem is that the API response to my query returns few results. I show below an example of a news article published by Público in March 2012, availab…

miguelwon updated 3 years ago
4

webrecorder/browsertrix-crawler #4

Videos missing

While investigating https://github.com/openzim/zimit/issues/71, I realized I can't seem to scrape videos reliably with the current version. Even a very simple test doesn't work: - https://w…

rgaudin updated 3 years ago
7
