Hey @romulogoncalves, you also need to set up AWS S3 SDK credentials: https://docs.aws.amazon.com/sdk-for-java/v1/developer-guide/credentials.html
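For example, one way to feed credentials into the SDK v1 default provider chain is via Java system properties (environment variables or `~/.aws/credentials` also work, per the linked guide); the key values here are placeholders:

```scala
// The AWS SDK v1 default credential chain also checks these Java
// system properties, besides env vars and ~/.aws/credentials.
System.setProperty("aws.accessKeyId", "YOUR_ACCESS_KEY")
System.setProperty("aws.secretKey", "YOUR_SECRET_KEY")
```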
GeoTrellis reads data from S3 via the S3 SDK and not via the Hadoop API. link
In your case, if you still want to use S3 via the Hadoop API, try the common HadoopPointCloudRDD, e.g. the sketch below.
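A minimal sketch of that, assuming a geotrellis-pointcloud build where HadoopPointCloudRDD accepts a Hadoop Path (the import path varies across GeoTrellis versions, and the bucket/prefix names are hypothetical):

```scala
import geotrellis.spark.io.hadoop.HadoopPointCloudRDD
import org.apache.hadoop.fs.Path
import org.apache.spark.SparkContext

implicit val sc: SparkContext = ??? // your existing SparkContext

// An s3a:// path goes through the Hadoop FileSystem API, so the
// S3A connector honours the fs.s3a.* endpoint set in core-site.xml.
val rdd = HadoopPointCloudRDD(new Path("s3a://my-bucket/lidar/"))
```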
@pomadchin thanks for the quick reply.
Now I understand how S3PointCloudRDD works.
For now I will just use HadoopPointCloudRDD to read from local object storage with the S3 API. I just tested it and it works.
Out of curiosity, is there any performance difference when using HadoopPointCloudRDD instead of S3PointCloudRDD? Why not have only HadoopPointCloudRDD to access both HDFS and object storage with the S3 API?
@romulogoncalves there is; S3 should work faster. The Hadoop API is a bit slower in this case, though you can double-check.
Thanks for the reply. I think the issue can be closed.
Hi,
I am trying to read point cloud data into Spark from local storage that provides an S3 API (in our case we use Minio). To do that, I define the following in core-site.xml:
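Along these lines (the endpoint and credentials below are placeholder values for a local Minio deployment):

```xml
<configuration>
  <!-- Route s3a:// URIs to the S3A connector -->
  <property>
    <name>fs.s3a.impl</name>
    <value>org.apache.hadoop.fs.s3a.S3AFileSystem</value>
  </property>
  <!-- Local Minio endpoint instead of AWS -->
  <property>
    <name>fs.s3a.endpoint</name>
    <value>http://localhost:9000</value>
  </property>
  <!-- Minio typically requires path-style access -->
  <property>
    <name>fs.s3a.path.style.access</name>
    <value>true</value>
  </property>
  <property>
    <name>fs.s3a.access.key</name>
    <value>MINIO_ACCESS_KEY</value>
  </property>
  <property>
    <name>fs.s3a.secret.key</name>
    <value>MINIO_SECRET_KEY</value>
  </property>
</configuration>
```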
When I do a normal read from S3 it works, for example:
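Something like this (the bucket and file names are made up):

```scala
// A plain Spark read through the Hadoop S3A connector; this picks
// up the fs.s3a.* endpoint settings from core-site.xml.
val lines = sc.textFile("s3a://my-bucket/some-file.txt")
println(lines.count())
```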
When I try to read a laz file using S3PointCloudRDD, it fails because it attempts to connect to the AWS service. I use code along the following lines, which defines an RDD:
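Roughly like this (the bucket and prefix are placeholders, and the exact package of S3PointCloudRDD depends on the geotrellis-pointcloud version):

```scala
import geotrellis.spark.io.s3.S3PointCloudRDD

// S3PointCloudRDD talks to S3 through the AWS S3 SDK client,
// not through the Hadoop FileSystem API.
val rdd = S3PointCloudRDD("my-bucket", "lidar/") // (bucket, prefix)
```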
Then I ask for the schema, i.e., I execute it:
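That is, an action along these lines (the header's schema accessor is an assumption here; any Spark action triggers the computation and hence the connection attempt):

```scala
// first() forces evaluation; this is where the S3 SDK client
// tries to connect to the endpoint.
val header = rdd.first()._1
println(header.schema) // accessor name may differ across versions
```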
Error:
It seems it is connecting to AWS services and not to the endpoint we defined in core-site.xml. Do we need to set some extra configuration? Reading from HDFS works without issues.