Closed dhopp97 closed 5 years ago
I should have mentioned that I was still using python 2.7 (we are working on python 3 support). This issues stems from this code:
https://github.com/crs4/pydoop/blob/0fbd2899c001922faff10cd0d10727fbf40353e0/pydoop/hdfs/fs.py#L39
So in fs.py a custom basestring is being imported and in py3compat.py if not using python3 basestring is being set to be the same as unicode and this causes 'isinstance('/test', basestring) to equate to False.
Why in the case of using python2 is basestring being forced to be the same as unicode? If I patch this to make basestring=basestring am I going to break something more deep down?
Yes, the custom compatibility layer forces things a bit in the case of basestring
. I've already pushed a patch to my fork, it's currently being tested. The main thing is that hdfs
methods actually expect text objects, but that can be surprising to the Python 2 user, so I'm patching it.
I'm glad to hear you're moving to Python 3, since we plan to remove support for Python 2 at some point (might be 2020 as the rest of the community). Anyway, you should be able to make your code work with the current Pydoop version by passing unicode objects:
pydoop.hdfs.ls(u'/test', recursive=True)
Upgrading pydoop to 2.0.0 was one of the first steps in migrating to python 3. Yeah I knew I could force the string to be unicode but didn't see that called out in the docs that things were expected to be unicode.
Before going down the route of fixing all the places we use pydoop wanted to check if this was expected and if a simple patch to pydoop to work the 'old' way inside of python2 was feasible.
Create a directory in hdfs:
hdfs dfs -mkdir /test
With pydoop 1.1.0 this works:
With pydoop 2.0.0 it fails
Setting recursive=False does indeed work.