commoncrawl Search Results

891 results for commoncrawl

Kaggle/docker-python #689

add pip installable package warcio to docker image

Hi kaggle team, would be great to have this python package available, https://github.com/webrecorder/warcio which is used to read the Web ARChive format which is used by Common Crawl to store t…

galtay updated 4 years ago · 1 comment

explosion/spaCy #5149

Unable to create custom Spacy model w/ Ontonotes that matche…

Hello- I am trying to recreate the en_core_web_lg model https://github.com/explosion/spacy-models/releases//tag/en_core_web_lg-2.2.5 model by following the steps in the model description and assets fo…

zredlined updated 4 years ago · 4 comments

Helsinki-NLP/OPUS-MT-train #7

[Language Codes] How are models named?

For example, in `cmn+cn+yue+ze_zh+zh_cn+zh_CN+zh_HK+zh_tw+zh_TW+zh_yue+zhs+zht+zh-de`, is there a table or some other source for what zh_HK, zh_yue, yue, etc. represent? Is zh_yue different…

sshleifer updated 4 years ago · 6 comments

cc-archive/cccatalog #302

[API Integration] ScienceMuseum

Note that we are currently integrated via CommonCrawl, but would like to switch to an API integration. Also note that we previously had issues getting correct license information from the page, and as…

annatuma updated 4 years ago · 35 comments

projectdiscovery/subfinder #167

[Enhancement] Take last commoncrawl index

Hello, it's more a suggestion than an issue. I recently installed subfinder and, as a passive source, I saw `commoncrawl`. However, subfinder is requesting the following index: …

phackt updated 4 years ago · 2 comments

bitextor/bitextor #163

ExternalTextProcessor: Pretend to care about efficiency

So far as I can tell https://github.com/bitextor/bitextor/blob/master/bitextor-tokenize.py and https://github.com/bitextor/bitextor/blob/master/bitextor-tokenize-moses.py launch new processes for ever…

kpu updated 4 years ago · 2 comments

Podcastindex-org/podcast-namespace #44

Need function comparable to robots.txt, but to guide podcast…

This namespace element was motivated by a situation where a podcast approached podcastindex, noting that they had _two_ feeds for their podcast (one for most people, another motivated by their Chinese…

vandys updated 4 years ago · 17 comments

owasp-amass/amass #445

Runtime crash on 3.8.1: concurrent map writes [FIXED, PR]

UPDATE: I took two different approaches to fixing this, please see PR #446 and #447. #446 applies a single mutex in the problematic function. #447 adds a new string set implementation that is inherent…

mzpqnxow updated 4 years ago · 8 comments

quickwit-oss/tantivy #811

How to get values from NamedFieldDocument

I am trying to write a web server that serves up search requests for an otherwise static website. I am using a fork of Zola that uses tantivy (version 12.0) to create a search index with al…

Th3Whit3Wolf updated 4 years ago · 2 comments

tensorflow/tensor2tensor #1765

File Not Found: gs://tensor2tensor-data/wikisum/commoncrawl_…

### Description I tried to download the **wikisum** dataset used in the paper GENERATING WIKIPEDIA BY SUMMARIZING LONG SEQUENCES and wanted to use my own computer to do it instead of GCP. I executed …

chiaminchuang updated 4 years ago · 2 comments