Smerity / cc-warc-examples

CommonCrawl WARC/WET/WAT examples and processing code for Java + Hadoop
MIT License
56 stars, 47 forks

hadoop No FileSystem for scheme: s3n #4

Open aakashkag opened 8 years ago

aakashkag commented 8 years ago

When I give the input path below to Hadoop, I get this error: "Error: org.jets3t.service.impl.rest.httpclient.RestS3Service.<init>(Lorg/jets3t/service/security/AWSCredentials;)V"

Input path: s3n://aws-publicdatasets/common-crawl/crawl-data/CC-MAIN-2013-48/segments/1386163035819/wet/CC-MAIN-20131204131715-00000-ip-10-33-133-15.ec2.internal.warc.wet.gz

ldaume commented 7 years ago

You can try reading the file directly with JetS3t instead of going through the s3n filesystem:

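import org.archive.io.ArchiveReader
import org.archive.io.warc.WARCReaderFactory
import org.jets3t.service.impl.rest.httpclient.RestS3Service

// Null credentials make JetS3t connect as an anonymous user.
val restS3Service = new RestS3Service(null)
val fileName = "crawl-data/CC-MAIN-2013-48/segments/1386163035819/wet/CC-MAIN-20131204131715-00000-ip-10-33-133-15.ec2.internal.warc.wet.gz"
// Fetch the whole object: no modified-since, ETag, or byte-range constraints.
val s3Object = restS3Service.getObject("commoncrawl", fileName, null, null, null, null, null, null)

// Wrap the S3 data stream in a WARC reader.
val archiveReader: ArchiveReader = WARCReaderFactory.get(fileName, s3Object.getDataInputStream, true)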
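
The `getObject` call above takes a bucket name and an object key as separate arguments, whereas the original question used a single `s3n://` path. As a rough sketch, the two parts can be split out of such a path with `java.net.URI` from the standard library. The helper name `splitS3Uri` is hypothetical and not part of JetS3t or Hadoop. Note also that the reply reads from the `commoncrawl` bucket, where the key has no `common-crawl/` prefix, so the split result for the question's path is not identical to the arguments used above.

```scala
import java.net.URI

// Hypothetical helper (not from JetS3t or Hadoop): split an s3n:// or s3://
// path into the (bucket, key) pair that a getObject-style call expects.
def splitS3Uri(path: String): (String, String) = {
  val uri = new URI(path)
  // The authority component of the URI is the bucket name.
  val bucket = uri.getAuthority
  // The URI path starts with "/", which is not part of the object key.
  val key = uri.getPath.stripPrefix("/")
  (bucket, key)
}

val (bucket, key) = splitS3Uri(
  "s3n://aws-publicdatasets/common-crawl/crawl-data/CC-MAIN-2013-48/segments/1386163035819/wet/CC-MAIN-20131204131715-00000-ip-10-33-133-15.ec2.internal.warc.wet.gz")
println(s"bucket=$bucket")
println(s"key=$key")
```

The resulting `bucket` and `key` values could then be passed as the first two arguments of `getObject` in place of the hard-coded strings.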