-
Hi, do you have the full 1 million list from Microsoft?
-
I have 6 CSV files and want to combine them on a shared `domain` column.
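A minimal sketch of one way to do such a join, using only the standard library `csv` module rather than pandas or modin. The function name `merge_on_key`, the sample data, and every column name other than `domain` are hypothetical; real files would be passed as open file handles instead of `io.StringIO` objects.

```python
import csv
import io


def merge_on_key(sources, key="domain"):
    """Outer-join rows from several CSV sources on a shared key column."""
    merged = {}
    for source in sources:
        for row in csv.DictReader(source):
            # Create the record on first sight of the key, then add this file's columns.
            merged.setdefault(row[key], {key: row[key]}).update(row)
    return merged


# Hypothetical stand-ins for two of the CSV files.
csv_a = io.StringIO("domain,rank\nexample.com,1\nexample.org,2\n")
csv_b = io.StringIO("domain,category\nexample.com,news\nexample.net,shop\n")

merged = merge_on_key([csv_a, csv_b])
for record in merged.values():
    print(record)
```

Domains missing from one file simply lack that file's columns, which corresponds to an outer join. If two files share a non-key column name, the later file's value wins in this sketch.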
```
import modin.pandas as pd
import os
import time
import ray
# Look at the Ray documentation with respect to the Ray configur…
```
-
Hi
I am working on a face recognition project (FaceNet) using the MS-Celeb dataset, which has 1580 classes. Now I want to increase the size of the dataset to 5000 classes (the top 5000 celebrities) without 1…
-
Hi,
I think you need more validation; the Alexa top1m list is not sufficient. You probably want to validate against the public suffix list too: https://publicsuffix.org/list/public_suffix_list.dat
…
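As a rough illustration of what validating against the public suffix list could look like, here is a simplified sketch. It assumes the list's plain-text format (one rule per line, `//` comments), uses a small hypothetical excerpt instead of downloading the real file, and ignores wildcard (`*.`) and exception (`!`) rules, so it is not a complete implementation of the matching algorithm. The function names are made up for this example.

```python
def load_suffix_rules(text):
    """Parse public-suffix-list-style text into a set of rules, skipping comments and blanks."""
    rules = set()
    for line in text.splitlines():
        line = line.strip()
        if line and not line.startswith("//"):
            # A rule ends at the first whitespace on the line.
            rules.add(line.split()[0])
    return rules


def has_known_suffix(domain, rules):
    """Return True if the domain ends in a listed suffix and has at least one label before it.

    Wildcard and exception rules are not handled in this sketch.
    """
    labels = domain.lower().rstrip(".").split(".")
    # Start at 1 so that a bare suffix such as "com" is rejected.
    for i in range(1, len(labels)):
        if ".".join(labels[i:]) in rules:
            return True
    return False


# Hypothetical excerpt written in the list's format.
SAMPLE = """
// sample rules
com
uk
co.uk
"""

rules = load_suffix_rules(SAMPLE)
print(has_known_suffix("example.co.uk", rules))
print(has_known_suffix("example.notatld", rules))
```

A domain that passes a top-1m lookup but fails this check would be worth flagging for manual review.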
-
Add `--maxAudioSize` and `--maxVideoSize` options to ignore large media files but still scrape the smaller ones.
## Background
I analyzed the recent [wikipedia_en_top1m](https://farm.openzim.org/pipelin…
-
The `input.txt` file (the combined list of hosts to check) is not committed to the repo, which makes it hard to reproduce this work for other providers.
-
I ran domain-stats-settings and configured the established setting to be 30 days; however, I am still seeing domains that are more than 30 days old showing as NEW.
This is on…
-
See https://github.com/scrapy/scurl/issues/58#issuecomment-513520254 and https://github.com/scrapy/scurl/issues/58#issuecomment-513583355
Repeating the traceback here as well:
```
Traceback (most recent call last…
```
-
This is a list of features/requirements from different groups within Protocol Labs required to support the Package Managers use case.
## What is the Package Managers use case?
Our goal is to encour…
-
I was following the install instructions from the README (macOS 10.14.5).
There was one warning about
```
s3fs 0.2.1 has requirement six>=1.12.0, but you'll have six 1.11.0 which is incompatibl…
```