-
Hello all,
While crawling, we ran into a politeness issue, which we suspect was caused by an apparent connection timeout when trying to fetch `robots.txt`. We suppose that as a c…
-
#### Which domain(s) should be blocked?
d3vk40ihlliju7.cloudfront.net
#### Why should the domain(s) be blocked?
This subdomain is requested when using mxtoolbox.com.
Blocking the subdomain seems…
-
## COMPANY
With 30 years in the market, we are a Brazilian multinational and one of the largest IT integrators in Brazil.
Valuing talent, reliability, commitment to customer service, agi…
-
1. With a merge conflict present, click the file in the SCM view; the merge editor opens.
2. Click the "Open File" command in the title area; the file opens with inline diffs.
3. Close all editors
4. Ope…
-
Would it be possible to create a function to mine web links?
The user would enter the address of a web page that contains a list of links. These links could then be "mined" and fed to the clustering-e…
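Here is a minimal sketch of such a link-mining function. It assumes the page has already been downloaded as an HTML string and that a regular expression over `<a href=…>` attributes is good enough. The class and method names (`LinkMiner`, `mineLinks`) are hypothetical, and a production crawler would use a real HTML parser instead.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: a rough sketch, not part of any existing project.
public class LinkMiner {
    // Heuristic for href="..." or href='...' inside <a> tags; not a full HTML parser.
    private static final Pattern HREF = Pattern.compile(
            "<a\\s[^>]*?href\\s*=\\s*[\"']([^\"']+)[\"']", Pattern.CASE_INSENSITIVE);

    /** Returns every anchor href in the page, resolved against the page URL, in document order. */
    public static List<String> mineLinks(String pageUrl, String html) {
        URI base = URI.create(pageUrl);
        List<String> links = new ArrayList<>();
        Matcher m = HREF.matcher(html);
        while (m.find()) {
            try {
                // resolve() turns relative links such as "b.html" into absolute URLs
                links.add(base.resolve(m.group(1).trim()).toString());
            } catch (IllegalArgumentException e) {
                // skip hrefs that are not valid URIs (e.g. containing unescaped spaces)
            }
        }
        return links;
    }

    public static void main(String[] args) {
        String page = "<ul><li><a href=\"https://example.org/a\">A</a></li>"
                + "<li><a class='x' href='b.html'>B</a></li></ul>";
        // Each mined link could then be handed to the clustering step.
        for (String link : mineLinks("https://example.org/list/", page)) {
            System.out.println(link);
        }
    }
}
```

Resolving each href against the page URL means relative links arrive at the clustering step as absolute, fetchable addresses.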
-
Hey there, this is a fantastic codebase! I just have a quick question about the -o option. It may be more of a question about Common Crawl itself. Here it is:
Is the content of Common Crawl files…
-
I don't think this is crucial, but in the less than two months I've been on the lucene-users
mailing list, I've seen several people post questions asking where they can find
documentation on s…
-
When you set a SortField on a Text field that gets tokenized, FieldCacheImpl uses the
indexed term to do the sort, so the sort order is off, especially when the field holds
more than one word. I think it is much …
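The sketch below is a simplified model of the behaviour described above. It is not Lucene code. It assumes the cache keeps a single term per document, and that as terms are enumerated in sorted order, a later term overwrites an earlier one for the same document. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical model of a one-term-per-document sort cache; not Lucene's FieldCacheImpl.
public class TokenizedSortDemo {
    /** Builds one sort key per document by walking terms in sorted order; later terms overwrite earlier ones. */
    static String[] buildTermCache(String[] docs) {
        // term -> ids of documents containing it, iterated in lexicographic order
        TreeMap<String, List<Integer>> postings = new TreeMap<>();
        for (int doc = 0; doc < docs.length; doc++) {
            for (String term : docs[doc].toLowerCase().split("\\s+")) {
                postings.computeIfAbsent(term, t -> new ArrayList<>()).add(doc);
            }
        }
        String[] cache = new String[docs.length];
        for (Map.Entry<String, List<Integer>> e : postings.entrySet()) {
            for (int doc : e.getValue()) {
                cache[doc] = e.getKey(); // a multi-word document ends up keyed by its greatest term
            }
        }
        return cache;
    }

    /** Returns document ids ordered by the given per-document keys. */
    static Integer[] sortBy(String[] keys) {
        Integer[] ids = new Integer[keys.length];
        for (int i = 0; i < ids.length; i++) {
            ids[i] = i;
        }
        Arrays.sort(ids, Comparator.comparing((Integer i) -> keys[i]));
        return ids;
    }

    public static void main(String[] args) {
        String[] titles = {"apple zebra", "banana"};
        // Tokenized field: doc 0 is keyed by "zebra", so it sorts after "banana".
        System.out.println(Arrays.toString(sortBy(buildTermCache(titles)))); // [1, 0]
        // Untokenized values: whole strings compare as expected.
        System.out.println(Arrays.toString(sortBy(titles))); // [0, 1]
    }
}
```

The usual workaround is to sort on a separate, untokenized copy of the field. The second call imitates that by comparing whole values.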
-
Vulnerable Library - nutch-1.13.jar
Library home page: http://nutch.apache.org
Path to dependency file: /pom.xml
Path to vulnerable library: /pository/org/apache/nutch/nutch/1.13/nutch-1.13.jar
…
-
Filtered query ignores its own boost.
---
Migrated from [LUCENE-698](https://issues.apache.org/jira/browse/LUCENE-698) by Yonik Seeley (@yonik), resolved May 30 2007
Attachments: [lucene-698.patch…
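The following is a toy model of the expected behaviour, not Lucene's `Weight`/`Scorer` API. It assumes that, for a document passing the filter, a filtered query's score should be its inner query's score multiplied by the filtered query's own boost. `ToyFilteredQuery` and its methods are hypothetical.

```java
import java.util.OptionalDouble;
import java.util.function.IntPredicate;
import java.util.function.IntToDoubleFunction;

public class FilteredScoreDemo {
    /** Hypothetical stand-in for a filtered query: inner scoring function, filter, and the wrapper's own boost. */
    static final class ToyFilteredQuery {
        private final IntToDoubleFunction inner;
        private final IntPredicate filter;
        private final double boost;

        ToyFilteredQuery(IntToDoubleFunction inner, IntPredicate filter, double boost) {
            this.inner = inner;
            this.filter = filter;
            this.boost = boost;
        }

        /** Behaviour described in the report: the wrapper's boost is dropped. */
        OptionalDouble scoreIgnoringBoost(int doc) {
            return filter.test(doc) ? OptionalDouble.of(inner.applyAsDouble(doc)) : OptionalDouble.empty();
        }

        /** Expected behaviour: the wrapper's boost scales the inner score of every document passing the filter. */
        OptionalDouble score(int doc) {
            return filter.test(doc) ? OptionalDouble.of(boost * inner.applyAsDouble(doc)) : OptionalDouble.empty();
        }
    }

    public static void main(String[] args) {
        // Hypothetical data: doc d has inner score d + 1.0; only even documents pass the filter.
        ToyFilteredQuery q = new ToyFilteredQuery(d -> d + 1.0, d -> d % 2 == 0, 3.0);
        System.out.println(q.scoreIgnoringBoost(2)); // OptionalDouble[3.0]
        System.out.println(q.score(2));              // OptionalDouble[9.0]
        System.out.println(q.score(1));              // OptionalDouble.empty
    }
}
```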