Renumics / spotlight

Interactively explore unstructured datasets from your dataframe.
https://renumics.com
MIT License
1.04k stars, 83 forks

Can't access from WSL2 #444

Closed: tylertitsworth closed this issue 3 months ago

tylertitsworth commented 4 months ago

Describe the bug I'm trying to use Spotlight from WSL2 + VSCode on a Windows 11 system. When I run my application to launch Spotlight, my terminal becomes a blank canvas with the text <<↑ ↓ Viewing <Spotlight> in the bottom left-hand corner. When I browse to localhost:5000 in my browser, nothing loads; the browser console shows a 404 error when trying to get index.html.

When I try to utilize my terminal or terminate the application, I find that I no longer have control of my terminal and I must close it using VSCode.

To Reproduce Steps to reproduce the behavior:

  1. Remote into WSL2, Ubuntu 22.04
  2. Install renumics-spotlight
  3. Port-forward 5000 in VSCode
  4. Run with the following code snippet:
    ...
    from renumics import spotlight
    spotlight.show(df, host="localhost", port=5000)
  5. Run the Python script and attempt to browse to localhost:5000
  6. Attempt to terminate the application with Ctrl+C

Expected behavior The Spotlight UI should be accessible. My terminal should allow me to ctrl+c to terminate the application.

Screenshots [image] [image]

Additional context I am trying to follow this article: https://itnext.io/visualize-your-rag-data-eda-for-retrieval-augmented-generation-0701ee98768f

neindochoh commented 4 months ago

Thanks a lot for your bug report! We are looking into it.

neindochoh commented 4 months ago

We tried to reproduce this under WSL2. (Sadly) it didn't fail on our Windows machine.

Did you remote into WSL from another machine or from the same one? Could you try running it again with a simple DataFrame and verbose output (set the environment variable SPOTLIGHT_VERBOSE=true)?

import pandas as pd
from renumics import spotlight
df = pd.DataFrame({"foo": range(4)})
spotlight.show(df)

As a workaround, you might be able to run everything directly under Windows instead of WSL.

tylertitsworth commented 4 months ago

I remoted into WSL from the same machine (Windows host). I tried the code snippet you posted and got the same results. I clicked somewhere in the empty space shown in my screenshot and it printed what looks like a UTF-8 character to my terminal. After clicking around some more, it seems I'm able to interact with the UI elements even though I can't see them. I ended up getting a menu I can't close out of: [image]

When I browse to the address, I'm able to access the UI for the Spotlight application!

Thing is, when I set the port, I can't replicate this experience, but I can still access the UI from the 37341 port, despite setting it to 5000.
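
For anyone hitting the same mismatch between the requested port and the port that actually serves the UI, a quick way to see which one is live is to probe both from inside WSL. This is a minimal sketch using only the Python standard library; `port_is_open` is a hypothetical helper written for this thread, not part of Spotlight, and 5000 and 37341 are simply the two ports mentioned above.

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if something accepts TCP connections on host:port."""
    try:
        # create_connection raises OSError (e.g. ConnectionRefusedError or a
        # timeout) when nothing is listening or the port is unreachable.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


# Probe the port that was requested and the one the UI was actually found on.
for port in (5000, 37341):
    state = "open" if port_is_open("localhost", port) else "closed"
    print(f"localhost:{port} is {state}")
```

Run it in a second terminal while the viewer is up. If 5000 reports closed while another port is open, the server did not bind to the requested port, which narrows the problem to the server side rather than the VSCode port forwarding.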

neindochoh commented 4 months ago

I have an idea about the output in your terminal:

This is not output from Spotlight; the menu looks like w3m (a terminal browser). By default, Spotlight tries to open a browser window on the host. You should be able to disable this when starting the viewer with spotlight.show(..., no_browser=True).

Maybe this helps you see any hidden error messages about port 5000. At least I hope so.
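
Put together with the arguments from the original report, the suggestion could look roughly like the sketch below. The `host`, `port`, and `no_browser` arguments are the ones that appear in this thread; `launch_viewer` is a hypothetical wrapper added only for illustration, and the import is done inside it so the snippet can be pasted into a script before Spotlight is installed.

```python
def launch_viewer(df, port=5000):
    """Start Spotlight without letting it open a browser on the host.

    Inside WSL the "browser" may turn out to be a terminal browser such as
    w3m, which takes over the terminal, so the browser launch is disabled.
    """
    # Imported lazily so defining this function does not require the package.
    from renumics import spotlight

    return spotlight.show(df, host="localhost", port=port, no_browser=True)
```

With a DataFrame at hand, `launch_viewer(df)` should leave the terminal readable, so any error messages about the port stay visible; the UI is then opened manually from the Windows browser.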

tylertitsworth commented 3 months ago

Hi, that flag did the trick. I didn't know your program supported these flags, and I found the docs didn't give me an easy path to the server configuration.

I also found that spotlight used a bit more memory than I have available in addition to the dataframe I'm trying to load into memory, so I have since stopped working with this package.

Thanks for your help.

neindochoh commented 3 months ago

At least we got this figured out. Thank you for your time and the additional feedback.

I understand that Spotlight might not be feasible for your data and setup. We know about the additional memory consumption for in-memory data sources (like pandas DataFrames) and are looking for ways to reduce it, but this might take us some time.

As an additional data point, could you ping me with your dataset size and available memory size? If you find the time, this would help us a lot.

Thanks again!