morazow closed this issue 2 years ago.
@morazow, I remember that @jakobbraun recently solved the dot-in-filenames issue in the VS thanks to the new Connection definitions with split bucket path components. Is the same fix applicable here?
It is also applied here, in #120. The issue above was reported while exporting, and the only unusual thing was the bucket name. At the moment I am not sure what causes it, but the dots in the name may be the reason.
The line that throws the exception, S3AUtils.java#L1152, only checks that the bucket name is not an empty string.
So splitting the path and reassembling it might still help. I am going to check it.
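A minimal sketch of the splitting idea, assuming paths always have the form s3a://bucket/key with no credentials or port in the authority. The bucket name is cut out of the string directly instead of coming from URI.getHost(). The class and method names (BucketPathSplitter, bucketName, objectKey, join) are hypothetical and not part of the project or of Hadoop. Whether Hadoop S3A can then be given the bucket this way is exactly what still needs checking.

```java
/**
 * Hypothetical helper that splits an "s3a://bucket/key" path by hand, so the bucket
 * name does not depend on how java.net.URI parses the host.
 */
final class BucketPathSplitter {
    private static final String SCHEME_PREFIX = "s3a://";

    private BucketPathSplitter() {
    }

    /** Returns the part after "s3a://", or throws if the prefix is missing. */
    private static String stripScheme(final String path) {
        if (!path.startsWith(SCHEME_PREFIX)) {
            throw new IllegalArgumentException("Expected a path starting with '" + SCHEME_PREFIX + "': " + path);
        }
        return path.substring(SCHEME_PREFIX.length());
    }

    /** Bucket name: everything up to the first '/' after the scheme; dots are kept as-is. */
    static String bucketName(final String path) {
        final String rest = stripScheme(path);
        final int slash = rest.indexOf('/');
        return slash < 0 ? rest : rest.substring(0, slash);
    }

    /** Object key: everything after the first '/' following the bucket, or "" if there is none. */
    static String objectKey(final String path) {
        final String rest = stripScheme(path);
        final int slash = rest.indexOf('/');
        return slash < 0 ? "" : rest.substring(slash + 1);
    }

    /** Reassembles a path from a bucket name and an object key. */
    static String join(final String bucket, final String key) {
        return SCHEME_PREFIX + bucket + "/" + key;
    }
}
```

For example, bucketName("s3a://exa.test.aws.s3.bucket.etl.01/key") yields the full dotted bucket name even though java.net.URI cannot parse it as a host.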
Hey all,
I have looked into this issue. The main reason for the failure is that the java.net.URI getHost method returns null for bucket names that contain dots and whose last label starts with a digit, such as exa.test.aws.s3.bucket.etl.01 or pre.007.
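import static org.hamcrest.MatcherAssert.assertThat;
import static org.hamcrest.Matchers.is;
import static org.hamcrest.Matchers.notNullValue;
import static org.hamcrest.Matchers.nullValue;

import java.net.URI;
import java.net.URISyntaxException;

import org.junit.jupiter.params.ParameterizedTest;
import org.junit.jupiter.params.provider.CsvSource;

class S3BucketURITest { // hypothetical test class name
    // java.net.URI parses a host here: either the rightmost label starts with a letter,
    // or the name consists of a single label.
    @ParameterizedTest
    @CsvSource({ //
            "s3a://exa.test.aws.s3.bucket.01.etl/", //
            "s3a://bucket.name.dots.007.s3.amazonaws.com/", //
            "s3a://007.s3.amazonaws.com/", //
            "s3a://007/", //
            "s3a://007L/", //
    })
    void testS3BucketURIValid(final String bucketPath) throws URISyntaxException {
        final URI uri = new URI(bucketPath);
        assertThat(uri.getHost(), is(notNullValue()));
    }

    // Here the rightmost label starts with a digit, so URI falls back to a
    // registry-based authority and getHost() returns null.
    @ParameterizedTest
    @CsvSource({ //
            "s3a://exa.test.aws.s3.bucket.etl.01/", //
            "s3a://exa.test.aws.s3.bucket.etl.01/key", //
            "s3a://bucket.name.dots.007/", //
            "s3a://pre.007/", //
    })
    void testS3BucketURIInvalid(final String bucketPath) throws URISyntaxException {
        final URI uri = new URI(bucketPath);
        assertThat(uri.getHost(), is(nullValue()));
    }
}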
The AWS SDK should also fail for all schemes other than s3. For the s3 scheme, it uses the URI authority.
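To illustrate why the authority still works, here is a small sketch comparing getHost() and getAuthority() on one of the failing paths. When java.net.URI cannot parse a server-based host, it keeps the whole authority as a registry-based name, so getAuthority() still returns the bucket name. The class name AuthorityDemo and its helper methods are hypothetical.

```java
import java.net.URI;

public class AuthorityDemo {
    /** Host as parsed by java.net.URI; null when the host does not parse as a hostname. */
    static String host(final String path) {
        return URI.create(path).getHost();
    }

    /** Authority as parsed by java.net.URI; kept even when host parsing fails. */
    static String authority(final String path) {
        return URI.create(path).getAuthority();
    }

    public static void main(final String[] args) {
        final String path = "s3a://exa.test.aws.s3.bucket.etl.01/key";
        // The rightmost label "01" starts with a digit, so host parsing fails.
        System.out.println("host:      " + host(path));      // null
        System.out.println("authority: " + authority(path)); // exa.test.aws.s3.bucket.etl.01
    }
}
```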
From the S3 bucket naming rules:
So maybe it is not allowed to end a bucket name with a number.
I am going to add an early check for this in the project, with a user-friendly exception.
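A minimal sketch of such an early check, assuming the bucket path is validated once before it is handed to Hadoop S3A. It reuses the observation above that java.net.URI returns a null host for these names. The class name BucketPathValidator, the method validate, and the message wording are hypothetical, not the project's actual API.

```java
import java.net.URI;

/** Hypothetical early validation of an s3a bucket path before it reaches Hadoop S3A. */
final class BucketPathValidator {
    private BucketPathValidator() {
    }

    /**
     * Returns the host parsed by java.net.URI, or throws an IllegalArgumentException
     * with a user-friendly message when no host can be extracted.
     */
    static String validate(final String bucketPath) {
        final URI uri;
        try {
            uri = URI.create(bucketPath);
        } catch (final IllegalArgumentException exception) {
            throw new IllegalArgumentException("The bucket path '" + bucketPath + "' is not a valid URI.", exception);
        }
        final String host = uri.getHost();
        if (host == null) {
            // Typical cause: the last dot-separated label starts with a digit, e.g. "bucket.etl.01".
            throw new IllegalArgumentException("Could not extract a bucket name from '" + bucketPath
                    + "'. Bucket names whose last dot-separated label starts with a digit are not supported.");
        }
        return host;
    }
}
```

Failing early here gives the user a message that names the bucket path, instead of the empty-bucket-name error raised later inside S3AUtils.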
You could also take the chance to switch to the unified API...
That would be really good, but I am not aware of any JVM library that can unify them. The only one is hadoop-tools, but even there GCS is provided separately.
Situation
Exception: