-
Hi,
I am trying to run this:
```bash
aws --no-sign-request s3 ls s3://commoncrawl/crawl-data/CC-MAIN-2021
```
It is a public AWS dataset, so no authentication should be needed. I tried to work around that issue…
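For context, `--no-sign-request` tells the AWS CLI not to sign requests, so no credentials are loaded. A minimal standard-library sketch of the same idea follows: one unsigned ListObjectsV2 request over HTTPS, with the XML response parsed into common prefixes and object keys. The helper names (`list_objects_url`, `parse_listing`, `list_unsigned`) are hypothetical, and whether a bucket answers anonymous requests at all depends on its access policy, so a 403 response is possible.

```python
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET

# Namespace that S3 uses in ListBucketResult XML documents.
S3_NS = {"s3": "http://s3.amazonaws.com/doc/2006-03-01/"}


def list_objects_url(bucket, prefix, delimiter="/"):
    """Build a virtual-hosted-style ListObjectsV2 URL; no signature is added."""
    query = urllib.parse.urlencode(
        {"list-type": "2", "prefix": prefix, "delimiter": delimiter}
    )
    return f"https://{bucket}.s3.amazonaws.com/?{query}"


def parse_listing(xml_bytes):
    """Return (common_prefixes, keys) from a ListBucketResult response body."""
    root = ET.fromstring(xml_bytes)
    prefixes = [p.text for p in root.findall("s3:CommonPrefixes/s3:Prefix", S3_NS)]
    keys = [k.text for k in root.findall("s3:Contents/s3:Key", S3_NS)]
    return prefixes, keys


def list_unsigned(bucket, prefix):
    """Send one anonymous (unsigned) request, similar to `--no-sign-request`.

    Only the first page (up to 1000 entries) is returned; a full listing
    would have to follow NextContinuationToken.
    """
    with urllib.request.urlopen(list_objects_url(bucket, prefix)) as resp:
        return parse_listing(resp.read())
```

For example, `list_unsigned("commoncrawl", "crawl-data/CC-MAIN-2021")` would return the first page of matching prefixes and keys, provided the bucket permits anonymous listing.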
-
### Data Owner Name
Mongo2Stor
### What is your role related to the dataset
Data Preparer
### Data Owner Country/Region
United States
### Data Owner Industry
Not-for-Profit
### Website
[https://da…
-
When I try to print the list of files in the S3 bucket, the console says: "botocore.exceptions.NoCredentialsError: Unable to locate credentials"
I wrote this code:
```python
import boto3
def main():
…
```
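The code above is truncated, so its exact calls are unknown. With boto3, the usual counterpart of `--no-sign-request` is a client configured with `botocore.UNSIGNED`: requests go out without a signature, so missing credentials no longer raise `NoCredentialsError`. The sketch below assumes the goal is to list keys under a prefix; `make_anonymous_s3_client` and `list_keys` are hypothetical names.

```python
def make_anonymous_s3_client():
    """Create an S3 client that sends unsigned requests (no credentials needed)."""
    # Imported inside the function so list_keys below can be used (and
    # tested) with any client-like object, even where boto3 is not installed.
    import boto3
    from botocore import UNSIGNED
    from botocore.config import Config

    return boto3.client("s3", config=Config(signature_version=UNSIGNED))


def list_keys(client, bucket, prefix):
    """Yield every object key under `prefix`, following pagination."""
    paginator = client.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        # A page with no matching objects has no "Contents" entry at all.
        for obj in page.get("Contents", []):
            yield obj["Key"]
```

Usage, assuming boto3 is installed and the bucket allows anonymous listing: `for key in list_keys(make_anonymous_s3_client(), "commoncrawl", "crawl-data/CC-MAIN-2021"): print(key)`.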
-
Is there a method that can directly open a file URL, like `smart-open` does?
https://pypi.org/project/smart-open/
```python
from smart_open import open  # smart_open's drop-in replacement for the built-in open()
open('s3://commoncrawl/robots.txt')
```
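This excerpt doesn't say whether the library in question has such a helper. As a hypothetical illustration of the idea only, the standard-library sketch below dispatches on the URL scheme and fetches `s3://` URLs with an unsigned HTTPS GET against the bucket's virtual-hosted endpoint. `open_url` and `s3_to_https` are made-up names, and the approach only works for objects that allow anonymous reads.

```python
import urllib.parse
import urllib.request


def s3_to_https(url):
    """Map s3://bucket/key to a virtual-hosted-style HTTPS URL."""
    parts = urllib.parse.urlsplit(url)
    if parts.scheme != "s3":
        raise ValueError(f"not an s3:// URL: {url}")
    return f"https://{parts.netloc}.s3.amazonaws.com{parts.path}"


def open_url(url, mode="rb"):
    """Open local paths, http(s):// URLs and s3:// URLs through one call.

    Remote objects are returned as read-only binary responses; `mode` is
    only used for local files.
    """
    scheme = urllib.parse.urlsplit(url).scheme
    if scheme == "s3":
        # Unsigned request: succeeds only if the object is publicly readable.
        return urllib.request.urlopen(s3_to_https(url))
    if scheme in ("http", "https"):
        return urllib.request.urlopen(url)
    return open(url, mode)
```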
-
If a field requested by the `fl` parameter is missing in one of the records, the query processing exits with an exception and the result list is truncated:
```
Traceback (most recent call last):
…
```
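A minimal sketch of one way to keep a single incomplete record from aborting the whole query: look each requested field up with a default instead of indexing it directly. It assumes records arrive as Python dicts and that `fl` is a comma-separated field list; `project_fields` and the `None` placeholder are hypothetical choices, not the project's actual code.

```python
def project_fields(records, fl, missing=None):
    """Yield one row per record containing only the fields named in `fl`.

    A field absent from a record is filled with `missing` instead of
    raising KeyError, so the remaining records are still returned.
    """
    fields = [name.strip() for name in fl.split(",") if name.strip()]
    for record in records:
        yield {name: record.get(name, missing) for name in fields}
```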
-
Great work fixing this in 5.3.0, mate.
Importing EN now; do you know of other feeds people use with it?
Have you ever thought about doing something like this with Common Crawl?
-
https://commoncrawl.org/
> We build and maintain an open repository of web crawl data that can be accessed and analyzed by anyone.
I'm not sure how much data it is, but certainly a few TB.
-
With the proliferation of models and model variants it becomes more important to track assessment dates and model versions.
So far we've been able to treat model families as one, because it rarely …
-
EDIT: this helped; the docs may need to be updated:
```scala
sc.hadoopConfiguration.set("fs.defaultFS", "s3a://commoncrawl/")
```
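When `sc` is not at hand yet, the same setting can be supplied while the session is built, because Spark copies every `spark.hadoop.*` property into the Hadoop configuration with the prefix removed. The hypothetical helper below returns those properties as a dict. The optional anonymous credentials provider is an assumption aimed at public buckets: it requires `hadoop-aws` on the classpath and only helps if the bucket permits anonymous access.

```python
def s3a_spark_conf(bucket, anonymous=True):
    """Return Spark properties that point Hadoop's default filesystem at a bucket.

    The first entry is equivalent to
    sc.hadoopConfiguration.set("fs.defaultFS", "s3a://<bucket>/").
    """
    conf = {"spark.hadoop.fs.defaultFS": f"s3a://{bucket}/"}
    if anonymous:
        # Assumption: hadoop-aws is available; this provider makes the s3a
        # connector send requests without AWS credentials.
        conf["spark.hadoop.fs.s3a.aws.credentials.provider"] = (
            "org.apache.hadoop.fs.s3a.AnonymousAWSCredentialsProvider"
        )
    return conf
```

The pairs can be passed to `spark-shell` as `--conf key=value` flags, or in PySpark to `SparkSession.builder.config(key, value)` before `getOrCreate()`.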
**Describe the bug**
According to the docs, `aut` should be able to r…
-
```
[MYUSER@MYHOST ~]$ stat .s3cfg
File: `.s3cfg'
Size: 1889 Blocks: 8 IO Block: 4096 regular file
Device: fd02h/64770d Inode: 524485 Links: 1
Access: (0644/-rw…
```