IBM / jupyterlab-s3-browser

A JupyterLab extension for browsing S3-compatible object storage
Apache License 2.0

Weird character display for folders in bucket + files #28

Closed octavd closed 2 years ago

octavd commented 3 years ago

Describe the bug In version 0.6.2 of the jupyterlab-s3-browser extension + 0.6.1 pip jupyterlab-s3-browser, when inside the bucket, i see weird characters displayed for the folders, files etc.

In version 0.4.1 of the jupyterlab-s3-browser extension + 0.4.1 pip jupyterlab-s3-browser, everything is displayed correctly.

To Reproduce Steps to reproduce the behavior:

  1. Connect to S3 storage
  2. Go to a bucket
  3. See folders + files

Expected behavior We shouldn't have any weird characters displayed.

Screenshots First screenshot is for the version 0.6.2: working_nok

Second screenshot is for version 0.4.1: working_ok

Desktop (please complete the following information): jupyterlab-s3-browser extension nok version: 0.6.2 jupyterlab-s3-browser pip package nok version: 0.6.1

jupyterlab-s3-browser extension ok version: 0.4.1 jupyterlab-s3-browser pip ok version: 0.4.1

octavd commented 3 years ago

Hello,

Any news regarding this?

octavd commented 3 years ago

Hello,

Happy new year! 👍 Is this repo abandoned? or.. :)

octavd commented 3 years ago

Hello @reevejd , is there any possibility you will take a look at this bug?

reevejd commented 3 years ago

Yep, I can take a look. Can you start by confirming the issue is present on the most recent version (0.8.0)?

If the issue is still present, can you help me reproduce the issue? How do I need to set up my bucket (e.g. what objects should I create) to get that behaviour?

octavd commented 3 years ago

Hello @reevejd ,

First of all thank you for your message.

Secondly, I tried to install version 0.8.0 but it seems it's automatically grabbing version >= 3.0.0 of JupyterLab. I am working with jupyterlab==2.2.9.

Ways to reproduce? Well, connect to an S3 endpoint -> create a bucket and some folders in that bucket + files in those folders (names with a space or "_" between words, etc.), just like what I've attached in the screenshot (created with boto3, etc.), and you can see it has weird characters.

I've tested with version 0.7.0 and it's still the same. It's interesting that jupyterlab-s3-browser extension ok version: 0.4.1 jupyterlab-s3-browser pip ok version: 0.4.1 works like a charm - see screenshots from above

It seems something changed after version 0.4.1 that's displaying the characters that way.

reevejd commented 3 years ago

I've released version 0.9.0, which should be backwards compatible with JupyterLab 2. Hopefully this problem isn't present on that version, and you can ignore everything below.


If you're still experiencing the issue, can you give me an example object key that you have in your bucket that's giving you problems?

E.g. I have a bucket (named test) which contains a single object with a key test_prefix/test.md:

Screen Shot 2021-03-08 at 9 30 06 PM

As you can see, it renders as expected. In your original screenshots the objects keys aren't displayed normally, which means I can't tell what the actual key is, so it's hard for me to try to reproduce.

octavd commented 3 years ago

Hmm, I've installed the new extension, but when clicking on it, it's just... white. See attached screenshot.

image

Also attached you can find the errors from chrome console.

image

reevejd commented 3 years ago

It looks like the labextension is installed, but not the serverextension. Can you confirm you've run:

pip install jupyterlab-s3-browser==0.9.0
jupyter serverextension enable --py jupyterlab_s3_browser

I'll create an issue for displaying an error message when the serverextension is not installed

octavd commented 3 years ago

Yes, I confirm that I've run that command as well. Same situation.

reevejd commented 3 years ago

Thanks for confirming. Can you please also show me the output of

jupyter serverextension list

The fact that the labextension is getting a 404 trying to reach the serverextension suggests that either something went wrong with the serverextension installation, or the serverextension is erroring out on startup. Are there any error messages in the terminal logs after starting JupyterLab?

octavd commented 3 years ago

Hey, So i have my jupyterlab in a docker container and rebuilt again the image. This time it worked. It displays the login page.. BUT there is something very fishy that is happening:

Below you can find the screenshot with JupyterLab that has s3 extension version 0.9.0. On the left you can see I am listing the contents of a bucket with the aws cli. It has a folder + some files in that folder.

0 9 0_updated

But in the extension those files don't appear.

Now, I've reinstalled version 0.4.1 of the s3 extension and as you can see the info is displayed correctly with both the aws cli command + s3 extension.

0 4 1

I am using the same aws + secret key for all operations.

reevejd commented 3 years ago

If you go to /jupyterlab_s3_browser/files/<your_bucket_name>/enec7464/ (note: trailing slash is important) do you get normal looking file/object names or are they messed up as well? E.g. I'm typically running on http://localhost:8888/lab so I would go to http://localhost:8888/jupyterlab_s3_browser/files/<your_bucket_name>/enec7464/. I'm not familiar with your setup so adjust accordingly.

If the filenames look normal, it's a rendering issue and we should look for errors or issues in the browser console logs. If they're also messed up, then it's an issue with the serverextension and we should look at the container logs for any strange messages or errors the serverextension is throwing.

octavd commented 3 years ago

Checked with version 0.9.0 and it seems it's broken there also:

[{ "name": "2F", "path": "bucket/enec7464%2F", "type": "file", "mimetype": "json" }, { "name": "2Ffabien%2F", "path": "bucket/enec7464%2Ffabien%2F", "type": "directory", "mimetype": "json" }]

I've checked the container logs and there is no error, warning etc.

In 0.4.1 information is displayed correctly.
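A listing like the one above can also be checked mechanically rather than by eye. The following is a minimal sketch, not code from the extension: `find_encoded_paths` is a hypothetical helper, and it assumes the endpoint returns a JSON array of entries with a `path` field, as in the output pasted above.

```python
import json
import re

# Matches a percent-escape such as %2F (the URL-encoded form of "/").
PERCENT_ESCAPE = re.compile(r"%[0-9A-Fa-f]{2}")


def find_encoded_paths(listing_json):
    """Return the paths in a JSON listing that still contain percent-escapes.

    Hypothetical helper for illustration; the field name "path" is taken
    from the listing shown in this thread.
    """
    entries = json.loads(listing_json)
    return [e["path"] for e in entries if PERCENT_ESCAPE.search(e["path"])]


# The listing reported for version 0.9.0, copied from the comment above.
listing = """[
  {"name": "2F", "path": "bucket/enec7464%2F", "type": "file", "mimetype": "json"},
  {"name": "2Ffabien%2F", "path": "bucket/enec7464%2Ffabien%2F", "type": "directory", "mimetype": "json"}
]"""

suspicious = find_encoded_paths(listing)
print(suspicious)  # ['bucket/enec7464%2F', 'bucket/enec7464%2Ffabien%2F']
```

A non-empty result points at the serverextension side (the names are already encoded before they reach the browser), which matches the diagnosis suggested earlier in the thread.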

octavd commented 3 years ago

From what i can see in the changes made for this project i see that in version 0.5.0 listing of the objects was changed -> https://github.com/IBM/jupyterlab-s3-browser/compare/0.4.1...0.5.0

reevejd commented 3 years ago

Thanks for the details, I'll take a closer look later today

reevejd commented 3 years ago

Unfortunately I still haven't been able to reproduce the issue:

Screen Shot 2021-03-14 at 11 01 19 AM

Are you experiencing the issue for all of the objects in the bucket, or just the ones with the enec7464 prefix? Are you experiencing the issue for all buckets or just one? Are you able to share the docker image you're using?

octavd commented 3 years ago

I am experiencing this for all the prefixes from the bucket. It's very strange because if i put that version 0.4.1 it works. Everything is displayed correctly. The rest of the docker image libraries remain the same. I cannot share that docker image.

LE: I've tried to view the contents of another bucket and the prefixes are the same - with %2F aka / at the end of them.

octavd commented 3 years ago

Hello @reevejd ,

Hope you are well. I was wondering if you have tested this extension with an S3 Compatible Storage?

reevejd commented 3 years ago

Yes, I actually use IBM Cloud's Object storage and Minio more than AWS's S3 (though I do use that as well).

octavd commented 3 years ago

Hey @reevejd ,

Ok, so I dug into this some more and I've found something interesting :).

I see that we have this function s3client.list_objects_v2

I've tried to run this locally and i've found that:

1) If i would call this function as it is:

It's adding a weird character -> {'Prefix': 'backup_db%2F'} and the delimiter looks like this -> 'Delimiter': '%2F'->

See below log: db-6447d9dd-d2hf2.dev_all_2.sql.gz your_file {'ResponseMetadata': {'RequestId': 'ddddddd', 'HostId': 'ddddddd', 'HTTPStatusCode': 200, 'HTTPHeaders': {'server': 'nginx', 'date': 'Mon, 05 Jul 2021 15:26:35 GMT', 'content-type': 'application/xml', 'transfer-encoding': 'chunked', 'x-amz-id-2': 'ddddddd', 'x-amz-request-id': 'ddddddd'}, 'RetryAttempts': 0}, 'IsTruncated': False, 'Contents': [{'Key': 'db-6447d9dd-d2hf2.dev_all_2.sql.gz', 'LastModified': datetime.datetime(2021, 4, 2, 15, 0, 22, 73000, tzinfo=tzlocal()), 'ETag': '"ddddddd"', 'Size': 993965, 'StorageClass': 'STANDARD'}, {'Key': 'your_file', 'LastModified': datetime.datetime(2021, 4, 1, 11, 51, 1, 99000, tzinfo=tzlocal()), 'ETag': '"ddddddd"', 'Size': 0, 'StorageClass': 'STANDARD'}], 'Name': 'di-diod-jupyterhub-xcloud-selfcare-dev-backup', 'Prefix': '', 'Delimiter': '%2F', 'MaxKeys': 1000, 'CommonPrefixes': [{'Prefix': 'backup_db%2F'}, {'Prefix': 'test_folder%2F'}], 'EncodingType': 'url', 'KeyCount': 2}

2) If i am executing this without that parameter -> EncodingType="url" i am getting a good response aka the folder names are displayed correctly -> 'Prefix': 'test_folder/' and Delimiter -> 'Delimiter': '/'. See below log:

db-6447d9dd-d2hf2.dev_all_2.sql.gz your_file {'ResponseMetadata': {'RequestId': 'ddddddd', 'HostId': 'ddddddd', 'HTTPStatusCode': 200, 'HTTPHeaders': {'server': 'nginx', 'date': 'Mon, 05 Jul 2021 15:24:18 GMT', 'content-type': 'application/xml', 'transfer-encoding': 'chunked', 'x-amz-id-2': 'ddddddd', 'x-amz-request-id': 'ddddddd'}, 'RetryAttempts': 0}, 'IsTruncated': False, 'Contents': [{'Key': 'db-6447d9dd-d2hf2.dev_all_2.sql.gz', 'LastModified': datetime.datetime(2021, 4, 2, 15, 0, 22, 73000, tzinfo=tzlocal()), 'ETag': '"ddddddd"', 'Size': 993965, 'StorageClass': 'STANDARD'}, {'Key': 'your_file', 'LastModified': datetime.datetime(2021, 4, 1, 11, 51, 1, 99000, tzinfo=tzlocal()), 'ETag': '"ddddddd"', 'Size': 0, 'StorageClass': 'STANDARD'}], 'Name': 'my_bucket', 'Prefix': '', 'Delimiter': '/', 'MaxKeys': 1000, 'CommonPrefixes': [{'Prefix': 'backup_db/'}, {'Prefix': 'test_folder/'}], 'EncodingType': 'url', 'KeyCount': 2}

And this function would never execute because our path ends with "%2F". I think we need to decode the result, like:

from urllib.parse import unquote
url = unquote(url)

What do you think? Could you please check?

Best regards, Octavian
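The decoding idea suggested above can be sketched against a trimmed copy of the first logged response. This is an illustration only, not the extension's actual handler code: `decode_listing` is a hypothetical helper, and it assumes the response is a plain dict shaped like the boto3 `list_objects_v2` output pasted in this comment.

```python
from urllib.parse import unquote


def decode_listing(response):
    """Return (prefixes, keys), undoing percent-encoding when the response
    declares EncodingType 'url'. Hypothetical helper for illustration."""
    encoded = response.get("EncodingType") == "url"
    fix = unquote if encoded else (lambda s: s)
    prefixes = [fix(p["Prefix"]) for p in response.get("CommonPrefixes", [])]
    keys = [fix(o["Key"]) for o in response.get("Contents", [])]
    return prefixes, keys


# Trimmed-down copy of the first response logged above.
sample = {
    "Contents": [
        {"Key": "db-6447d9dd-d2hf2.dev_all_2.sql.gz"},
        {"Key": "your_file"},
    ],
    "CommonPrefixes": [{"Prefix": "backup_db%2F"}, {"Prefix": "test_folder%2F"}],
    "Delimiter": "%2F",
    "EncodingType": "url",
}

prefixes, keys = decode_listing(sample)
print(prefixes)  # ['backup_db/', 'test_folder/']
print(keys)      # ['db-6447d9dd-d2hf2.dev_all_2.sql.gz', 'your_file']
```

One caveat: the second log above still reports 'EncodingType': 'url' even though its prefixes are not encoded, so keying the decision on that field may decode names that were never encoded. That is harmless for the names shown here, but it would alter a key that genuinely contains a literal percent sequence.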

reevejd commented 3 years ago

Hi Octavian,

Thanks for the additional details. I should have some time to continue investigating later this week. I'll keep you posted.

octavd commented 3 years ago

Hey James,

Really thanks! Hope to finally solve this issue. 🚀

Best regards, Octavian

octavd commented 3 years ago

Hey @reevejd ,

Did you manage to take a look at this?

reevejd commented 3 years ago

Hi @octavd, sorry for the lack of updates. I'm working on redoing the serverextension using s3fs. One of the changes involves not relying on the trailing / for distinguishing directories/prefixes from files/objects. I'm pretty sure this change will fix your issue. I'll publish a dev build as soon as I have something for you to test.

octavd commented 3 years ago

Hey @reevejd,

Looking forward to testing it. :D

reevejd commented 3 years ago

@octavd dev version 0.11.0.dev2 is ready for you to try (e.g. pip install jupyterlab-s3-browser==0.11.0.dev2)

octavd commented 3 years ago

Hello @reevejd ,

I've tested and when running the

jupyter serverextension enable --py jupyterlab_s3_browser

after the jupyter labextension install jupyterlab-s3-browser and build i am getting the below error.

Traceback (most recent call last):
  File "/opt/app-root/bin/jupyter-serverextension", line 8, in <module>
    sys.exit(main())
  File "/opt/app-root/lib/python3.6/site-packages/jupyter_core/application.py", line 254, in launch_instance
    return super(JupyterApp, cls).launch_instance(argv=argv, **kwargs)
  File "/opt/app-root/lib/python3.6/site-packages/traitlets/config/application.py", line 664, in launch_instance
    app.start()
  File "/opt/app-root/lib/python3.6/site-packages/notebook/serverextensions.py", line 294, in start
    super(ServerExtensionApp, self).start()
  File "/opt/app-root/lib/python3.6/site-packages/jupyter_core/application.py", line 243, in start
    self.subapp.start()
  File "/opt/app-root/lib/python3.6/site-packages/notebook/serverextensions.py", line 211, in start
    self.toggle_server_extension_python(arg)
  File "/opt/app-root/lib/python3.6/site-packages/notebook/serverextensions.py", line 200, in toggle_server_extension_python
    m, server_exts = _get_server_extension_metadata(package)
  File "/opt/app-root/lib/python3.6/site-packages/notebook/serverextensions.py", line 328, in _get_server_extension_metadata
    m = import_item(module)
  File "/opt/app-root/lib/python3.6/site-packages/traitlets/utils/importstring.py", line 42, in import_item
    return __import__(parts[0])
  File "/opt/app-root/lib/python3.6/site-packages/jupyterlab_s3_browser/__init__.py", line 3, in <module>
    from .handlers import setup_handlers
  File "/opt/app-root/lib/python3.6/site-packages/jupyterlab_s3_browser/handlers.py", line 15, in <module>
    import s3fs
  File "/opt/app-root/lib/python3.6/site-packages/s3fs/__init__.py", line 1, in <module>
    from .core import S3FileSystem, S3File
  File "/opt/app-root/lib/python3.6/site-packages/s3fs/core.py", line 16, in <module>
    import aiobotocore.session
  File "/opt/app-root/lib/python3.6/site-packages/aiobotocore/session.py", line 6, in <module>
    from .client import AioClientCreator, AioBaseClient
  File "/opt/app-root/lib/python3.6/site-packages/aiobotocore/client.py", line 11, in <module>
    from .args import AioClientArgsCreator
  File "/opt/app-root/lib/python3.6/site-packages/aiobotocore/args.py", line 8, in <module>
    from .endpoint import AioEndpointCreator
  File "/opt/app-root/lib/python3.6/site-packages/aiobotocore/endpoint.py", line 12, in <module>
    from aiobotocore.httpsession import AIOHTTPSession
  File "/opt/app-root/lib/python3.6/site-packages/aiobotocore/httpsession.py", line 12, in <module>
    from botocore.httpsession import ProxyConfiguration, create_urllib3_context, \
ImportError: cannot import name 'InvalidProxiesConfigError'
error building image: error building stage: failed to execute command: waiting for process to exit: exit status 1

reevejd commented 2 years ago

I'm not able to reproduce with a minimal Dockerfile, the below image builds for me:

FROM python:3.6

RUN pip install jupyterlab~=3.0
RUN pip install jupyterlab-s3-browser==0.11.0.dev2
RUN jupyter serverextension enable --py jupyterlab_s3_browser

I'm guessing I have not been specific enough in the versions of boto3 (and maybe s3fs) that my extension supports.

Would you be able to show me the output of pip freeze from your environment? That may help me reproduce the issue.
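As a narrower alternative to a full pip freeze, the versions of just the packages that appear in the traceback can be printed from Python. This is a rough sketch under two assumptions: the package list is chosen from the names in the traceback above, and `importlib.metadata` requires Python 3.8 or later, so it would not run as-is on the python3.6 environment shown there.

```python
from importlib.metadata import PackageNotFoundError, version

# Packages named in the traceback, plus the extension itself (assumed list).
PACKAGES = [
    "jupyterlab",
    "jupyterlab-s3-browser",
    "s3fs",
    "aiobotocore",
    "botocore",
    "boto3",
]


def installed_versions(names):
    """Map each distribution name to its installed version, or None if absent."""
    found = {}
    for name in names:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


for name, ver in installed_versions(PACKAGES).items():
    print(f"{name}=={ver}" if ver else f"{name}: not installed")
```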

octavd commented 2 years ago

Hey, i think i've found out why.

I am using jupyterlab==2.2.9

FROM python:3.6

RUN pip install jupyterlab==2.2.9
RUN pip install jupyterlab-s3-browser==0.11.0.dev2
#RUN jupyter labextension install jupyterlab-s3-browser
RUN jupyter serverextension enable --py jupyterlab_s3_browser

EXPOSE 8888

ENTRYPOINT ["jupyter", "lab","--ip=0.0.0.0","--allow-root"]

Try like that and it should show that the extension is not installed.

image

reevejd commented 2 years ago

Yep, you're right. JupyterLab 3.0 allows installation from just the pypi package, but for 2.0 you need the npm package which I didn't publish. This is working for me:

FROM python:3.6

RUN pip install jupyterlab==2.2.9

RUN pip install jupyterlab-s3-browser==0.11.0.dev3

RUN curl -sL https://deb.nodesource.com/setup_12.x  | bash -
RUN apt-get -y install nodejs
RUN jupyter labextension install jupyterlab-s3-browser@v0.11.0-dev.3

RUN jupyter serverextension enable --py jupyterlab_s3_browser

octavd commented 2 years ago

Hey @reevejd ,

I had the same problem but found out that I was missing the botocore Python package. After installing it, everything was OK.

The new extension solves the issue stated above. The prefixes are properly displayed. I saw that we now have the possibility to upload files as well, and it's very nice!! But there is a small problem with the new feature.

Problem: I am in my prefix -> press the upload file -> select a file -> Upload it -> No error received, checking chrome i can see that on the PUT i have 200 response, checking the container logs -> no errors but the new file is not displayed. Even if i click on the refresh button from the extension or refresh button from the browser, it still doesn't display the new file, but if i shutdown the container -> relog -> go to my prefix -> the file is present.

octavd commented 2 years ago

Hello @reevejd ,

Any news regarding the bug described above?

Thank you!

reevejd commented 2 years ago

Hello,

I'm glad to hear the display issues have been fixed. I've noticed the issue you described in your above comment as well. I haven't figured it out yet, I'm guessing I've configured s3fs wrong somehow and the caching is broken. I'll keep you posted.

Thanks
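If the caching guess above is right, the symptom (upload returns 200, file appears only after a restart) could arise as in this toy model. Everything here is hypothetical: `CachedLister` is not s3fs code, and a plain dict stands in for the bucket. s3fs does document an `invalidate_cache` method and a `refresh` argument to `ls`, but this thread does not say whether either was part of the eventual fix.

```python
class CachedLister:
    """Toy directory-listing cache (hypothetical, for illustration only)."""

    def __init__(self, backend):
        self._backend = backend  # dict: prefix -> list of object names
        self._cache = {}         # prefix -> cached snapshot of the listing

    def ls(self, prefix, refresh=False):
        # Serve from the cache unless a refresh is explicitly requested.
        if refresh or prefix not in self._cache:
            self._cache[prefix] = list(self._backend.get(prefix, []))
        return self._cache[prefix]

    def invalidate(self, prefix=None):
        # Drop one cached listing, or all of them when no prefix is given.
        if prefix is None:
            self._cache.clear()
        else:
            self._cache.pop(prefix, None)


bucket = {"enec7464/": ["a.txt"]}
lister = CachedLister(bucket)

lister.ls("enec7464/")                 # first listing populates the cache
bucket["enec7464/"].append("new.txt")  # the upload succeeds on the storage side

stale = lister.ls("enec7464/")         # still served from the old snapshot
fresh = lister.ls("enec7464/", refresh=True)

print(stale)  # ['a.txt']
print(fresh)  # ['a.txt', 'new.txt']
```

In this model a restart "fixes" the listing only because it discards the in-memory cache, which is consistent with the behaviour reported above.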

octavd commented 2 years ago

Cool! Looking forward! 🙂

octavd commented 2 years ago

Hello @reevejd! Hope you are ok. Any news regarding this last bug? 🙂

reevejd commented 2 years ago

Hi @octavd . I think the latest build should fix most of those issues, you can try it out with:

jupyter labextension install jupyterlab-s3-browser@0.11.0-rc.0 && pip install jupyterlab_s3_browser==0.11.0-rc.0


I will try to do a proper 0.11.0 release at the end of this week after ironing out a few more bugs.

octavd commented 2 years ago

Hey @reevejd, i can finally confirm this extension works like a charm now. :) Everything described by me above is solved.

octavd commented 2 years ago

Hey @reevejd , Happy New Year! Will we have a stable version for this extension?

reevejd commented 2 years ago

Hi again, happy new year and thanks for your continued interest.

I will try to get a stable release out within the next two weeks.

octavd commented 2 years ago

Hello @reevejd ,

I've tested 0.11.0 version which was launched a few days ago and i can confirm it works ok. This loooooooooong bug can be closed now.

Thank you very much, Octavian