Spark cannot load files from a Google Cloud Storage bucket when their paths contain glob wildcard characters (e.g., gs://the-peoples-speech-west-europe/archive_org/Nov_6_2020/ALL_CAPTIONED_DATA/1961DoctorBloodsCoffinWKieronMoore/[1961]Doctor Blood's Coffin w Kieron Moore.mp3). Characters like "[" and "]" trigger the problem, but others may as well.
Reproducer:
spark.read.format("binaryFile").load("gs://the-peoples-speech-west-europe/archive_org/Nov_6_2020/ALL_CAPTIONED_DATA/1961DoctorBloodsCoffinWKieronMoore/[1961]Doctor Blood's Coffin w Kieron Moore.mp3")
Running the reproducer should give an error.
Related issue (although it is not about Spark itself): https://stackoverflow.com/questions/42087510/gsutil-ls-returns-error-contains-wildcard/42146769
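
A possible workaround, untested against the GCS connector: Hadoop-style glob patterns treat a backslash as an escape character, so escaping the special characters before handing the path to Spark might make it match the literal file name. The sketch below is only an illustration. The helper name escape_glob is hypothetical, the set of characters it escapes is an assumption and may be incomplete, and whether the connector honors the escapes is not confirmed.

```python
import re

# Characters assumed to be special in Hadoop-style glob patterns.
# This set is an assumption and may not be complete.
_GLOB_SPECIAL = re.compile(r"([*?\[\]{}\\])")


def escape_glob(path: str) -> str:
    """Prefix each assumed glob metacharacter in `path` with a backslash."""
    return _GLOB_SPECIAL.sub(r"\\\1", path)


# Intended use (not executed here, requires a live Spark session):
#   df = spark.read.format("binaryFile").load(escape_glob(path))
```

If the escaped path still fails to load, the problem likely sits in the connector rather than in Spark's own glob handling.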