comet-ml / kangas

🦘 Explore multimedia datasets at scale
https://github.com/comet-ml/kangas/wiki
Apache License 2.0

I keep getting an application error when running from the cmdline #78

Closed: datavistics closed this issue 1 year ago

datavistics commented 1 year ago

Application error: a client-side exception has occurred (see the browser console for more information).

Error: An error occurred in the Server Components render. The specific message is omitted in production builds to avoid leaking sensitive details. A digest property is included on this error instance which may provide additional details about the nature of the error.

dsblank commented 1 year ago

@datavistics To help figure this out, you could do a couple of things:

  1. Open the web console in your browser (in some browsers, Ctrl+Shift+J), take a screenshot of the error, and post it here.
  2. If you can, feel free to post or email me (doug.blank@gmail.com) a small datagrid that exhibits the issue.

We will also likely need to know your Kangas version and OS details (the info in the Kangas About button, top left in the UI).
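If it is easier, the version and OS details can also be collected from Python. This is a minimal sketch using only the standard library; it assumes Kangas was installed as the `kangas` distribution (e.g. via pip), and `environment_report` is a hypothetical helper, not part of Kangas.

```python
import platform
from importlib.metadata import PackageNotFoundError, version


def environment_report(package: str = "kangas") -> dict:
    """Collect a package version plus OS/Python details for a bug report."""
    try:
        pkg_version = version(package)
    except PackageNotFoundError:
        # The distribution is not installed in this environment
        pkg_version = "not installed"
    return {
        "package": package,
        "version": pkg_version,
        "python": platform.python_version(),
        "os": platform.platform(),
    }


if __name__ == "__main__":
    for key, value in environment_report().items():
        print(f"{key}: {value}")
```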

datavistics commented 1 year ago
[screenshot]

I'm using:

dsblank commented 1 year ago

Thanks for helping find the issue! Could you try running this from the command line:

kangas server --debug

Does that work? If not, what is displayed in the console?

datavistics commented 1 year ago

    at JSON.parse (<anonymous>)
    at packageData (/opt/homebrew/Caskroom/miniconda/base/envs/kangas-demo/lib/python3.9/site-packages/kangas/frontend/standalone/node_modules/next/dist/compiled/undici/index.js:2:5022)
    at specConsumeBody (/opt/homebrew/Caskroom/miniconda/base/envs/kangas-demo/lib/python3.9/site-packages/kangas/frontend/standalone/node_modules/next/dist/compiled/undici/index.js:2:4734)
    at async fetchIt (/opt/homebrew/Caskroom/miniconda/base/envs/kangas-demo/lib/python3.9/site-packages/kangas/frontend/standalone/.next/server/app/page.js:1031:26)
    at async fetchDataGrid (/opt/homebrew/Caskroom/miniconda/base/envs/kangas-demo/lib/python3.9/site-packages/kangas/frontend/standalone/.next/server/app/page.js:1059:22)
    at async Main (/opt/homebrew/Caskroom/miniconda/base/envs/kangas-demo/lib/python3.9/site-packages/kangas/frontend/standalone/.next/server/app/page.js:3749:18)
fetchDatagrid: server not ready
TypeError: Cannot destructure property 'columnTypes' of 'data' as it is null.
    at TableDisplay (/opt/homebrew/Caskroom/miniconda/base/envs/kangas-demo/lib/python3.9/site-packages/kangas/frontend/standalone/.next/server/app/page.js:2352:13)
    at X (/opt/homebrew/Caskroom/miniconda/base/envs/kangas-demo/lib/python3.9/site-packages/kangas/frontend/standalone/.next/server/chunks/310.js:364112:13)
    at La (/opt/homebrew/Caskroom/miniconda/base/envs/kangas-demo/lib/python3.9/site-packages/kangas/frontend/standalone/.next/server/chunks/310.js:364278:21)
    at Object.toJSON (/opt/homebrew/Caskroom/miniconda/base/envs/kangas-demo/lib/python3.9/site-packages/kangas/frontend/standalone/.next/server/chunks/310.js:364042:20)
    at stringify (<anonymous>)
    at da (/opt/homebrew/Caskroom/miniconda/base/envs/kangas-demo/lib/python3.9/site-packages/kangas/frontend/standalone/.next/server/chunks/310.js:363561:9)
    at Ua (/opt/homebrew/Caskroom/miniconda/base/envs/kangas-demo/lib/python3.9/site-packages/kangas/frontend/standalone/.next/server/chunks/310.js:364374:30)
    at Qa (/opt/homebrew/Caskroom/miniconda/base/envs/kangas-demo/lib/python3.9/site-packages/kangas/frontend/standalone/.next/server/chunks/310.js:364161:23)
    at ping (/opt/homebrew/Caskroom/miniconda/base/envs/kangas-demo/lib/python3.9/site-packages/kangas/frontend/standalone/.next/server/chunks/310.js:364170:20)
fetch error: http://127.0.0.1:4001/datagrid/query-total?dgid=.%2Fdatagrids%2Fbeans.datagrid
SyntaxError: Unexpected token < in JSON at position 0
    at JSON.parse (<anonymous>)
    at packageData (/opt/homebrew/Caskroom/miniconda/base/envs/kangas-demo/lib/python3.9/site-packages/kangas/frontend/standalone/node_modules/next/dist/compiled/undici/index.js:2:5022)
    at specConsumeBody (/opt/homebrew/Caskroom/miniconda/base/envs/kangas-demo/lib/python3.9/site-packages/kangas/frontend/standalone/node_modules/next/dist/compiled/undici/index.js:2:4734)
    at async fetchIt (/opt/homebrew/Caskroom/miniconda/base/envs/kangas-demo/lib/python3.9/site-packages/kangas/frontend/standalone/.next/server/app/page.js:1031:26)
    at async fetchDatagridTotal (/opt/homebrew/Caskroom/miniconda/base/envs/kangas-demo/lib/python3.9/site-packages/kangas/frontend/standalone/.next/server/app/page.js:3492:22)
    at async PagerBar (/opt/homebrew/Caskroom/miniconda/base/envs/kangas-demo/lib/python3.9/site-packages/kangas/frontend/standalone/.next/server/app/page.js:3533:24)
fetch error: http://127.0.0.1:4001/datagrid/completions?dgid=.%2Fdatagrids%2Fbeans.datagrid&timestamp=1680091782.2293537
SyntaxError: Unexpected token < in JSON at position 0
    at JSON.parse (<anonymous>)
    at packageData (/opt/homebrew/Caskroom/miniconda/base/envs/kangas-demo/lib/python3.9/site-packages/kangas/frontend/standalone/node_modules/next/dist/compiled/undici/index.js:2:5022)
    at specConsumeBody (/opt/homebrew/Caskroom/miniconda/base/envs/kangas-demo/lib/python3.9/site-packages/kangas/frontend/standalone/node_modules/next/dist/compiled/undici/index.js:2:4734)
    at async fetchIt (/opt/homebrew/Caskroom/miniconda/base/envs/kangas-demo/lib/python3.9/site-packages/kangas/frontend/standalone/.next/server/app/page.js:1031:26)
    at async fetchCompletions (/opt/homebrew/Caskroom/miniconda/base/envs/kangas-demo/lib/python3.9/site-packages/kangas/frontend/standalone/.next/server/app/page.js:2405:22)
    at async SettingsBar (/opt/homebrew/Caskroom/miniconda/base/envs/kangas-demo/lib/python3.9/site-packages/kangas/frontend/standalone/.next/server/app/page.js:3417:25)
fetch error: http://127.0.0.1:4001/datagrid/query-page?dgid=.%2Fdatagrids%2Fbeans.datagrid&sortDesc=false&offset=0&limit=1&timestamp=1680091782.2293537
SyntaxError: Unexpected token < in JSON at position 0
    at JSON.parse (<anonymous>)
    at packageData (/opt/homebrew/Caskroom/miniconda/base/envs/kangas-demo/lib/python3.9/site-packages/kangas/frontend/standalone/node_modules/next/dist/compiled/undici/index.js:2:5022)
    at specConsumeBody (/opt/homebrew/Caskroom/miniconda/base/envs/kangas-demo/lib/python3.9/site-packages/kangas/frontend/standalone/node_modules/next/dist/compiled/undici/index.js:2:4734)
    at async fetchIt (/opt/homebrew/Caskroom/miniconda/base/envs/kangas-demo/lib/python3.9/site-packages/kangas/frontend/standalone/.next/server/app/page.js:1031:26)
    at async fetchDataGrid (/opt/homebrew/Caskroom/miniconda/base/envs/kangas-demo/lib/python3.9/site-packages/kangas/frontend/standalone/.next/server/app/page.js:1059:22)
    at async SettingsBar (/opt/homebrew/Caskroom/miniconda/base/envs/kangas-demo/lib/python3.9/site-packages/kangas/frontend/standalone/.next/server/app/page.js:3418:22)
fetchDatagrid: server not ready
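Each `fetch error` above ends in `SyntaxError: Unexpected token < in JSON at position 0`, which means the frontend received a response body starting with `<` (usually an HTML page) where it expected JSON. The following is a minimal sketch for checking what a backend endpoint actually returns, assuming the backend listens on 127.0.0.1:4001 as in the log; `classify_body` and `probe` are hypothetical helpers, not part of Kangas.

```python
import json
import urllib.error
import urllib.request


def classify_body(body: str) -> str:
    """Classify a response body as 'json', 'html', 'empty', or 'other'."""
    text = body.lstrip()
    if not text:
        return "empty"
    if text.startswith("<"):
        # This is the case JSON.parse reports as "Unexpected token <"
        return "html"
    try:
        json.loads(text)
    except ValueError:  # json.JSONDecodeError subclasses ValueError
        return "other"
    return "json"


def probe(url: str, timeout: float = 5.0) -> str:
    """Fetch a URL and classify its body, including HTTP error responses.

    Connection failures (urllib.error.URLError) propagate to the caller.
    """
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            body = resp.read().decode("utf-8", errors="replace")
    except urllib.error.HTTPError as err:
        # Error pages are often HTML, so inspect them instead of raising
        body = err.read().decode("utf-8", errors="replace")
    return classify_body(body)
```

For example, `probe("http://127.0.0.1:4001/datagrid/query-total?dgid=.%2Fdatagrids%2Fbeans.datagrid")` returning `"html"` would confirm that the backend is answering with a page rather than data.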
dsblank commented 1 year ago

Does that happen with both of your datagrids (beans and coco-500)? You can download the latest coco-500.datagrid from https://kangas.comet.com/?datagrid=/data/coco-500.datagrid (bottom left-hand corner), in case yours is an older datagrid.

dsblank commented 1 year ago

Does it happen interactively? Does this work (from Python, IPython, or a notebook):

from datasets import load_dataset
import kangas as kg

dataset = load_dataset("beans", split="train")
dg = kg.DataGrid(dataset)
dg.show()
datavistics commented 1 year ago

Thanks so much! I'm not sure how, but when I saved it, it was an empty file. I'm trying to run this in a HuggingFace space where you can interactively download datasets and visualize them.
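Since the saved datagrid turned out to be an empty file, a quick sanity check before serving can save some debugging. This is a minimal sketch assuming, as discussed later in this thread, that a `.datagrid` file is an SQLite database; every SQLite 3 database file begins with the 16-byte header `SQLite format 3\0`. `check_datagrid_file` is a hypothetical helper.

```python
from pathlib import Path

# Every SQLite 3 database file starts with this 16-byte header string
SQLITE_HEADER = b"SQLite format 3\x00"


def check_datagrid_file(path) -> str:
    """Return 'missing', 'empty', 'not-sqlite', or 'ok' for a datagrid file."""
    p = Path(path)
    if not p.is_file():
        return "missing"
    if p.stat().st_size == 0:
        return "empty"
    with p.open("rb") as fh:
        header = fh.read(len(SQLITE_HEADER))
    return "ok" if header == SQLITE_HEADER else "not-sqlite"
```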

dsblank commented 1 year ago

Oh, glad you got that figured out! Let us know if you get some cool projects going!

datavistics commented 1 year ago

Will do, it's just about ready. One last question: any idea why saving a datagrid might crash the app? (There is plenty of RAM/CPU.)

dsblank commented 1 year ago

Hmmm... no. We'd need more details to track down the problem (the error message raised). Do you have a sample script that crashes when you dg.save()?

datavistics commented 1 year ago

Sorry, I probably seem all over the place. Sometimes the code seems to run and sometimes it doesn't. I've made sure to liberally apply kangas server --terminate locally. The app often seems to crash without an error. I feel it's something on the Kangas side, as the streamlit code is quite simple.

You can see my code here: https://huggingface.co/spaces/derek-thomas/kangas-demo/tree/main The only relevant files are the datagrids and app.py

I think there is something wrong with the datagrid files. Maybe they get corrupted when they get viewed?


datavistics commented 1 year ago

In Jupyter:

from pathlib import Path
proj_dir = Path.cwd().parent

import kangas as kg
from datasets import load_dataset

dataset_repo = 'beans'
dataset = load_dataset(dataset_repo, split="train")
dg = kg.DataGrid(dataset)
dg_file_name = dataset_repo.replace('/', '__') + '.datagrid' + '.2'
dg.save(proj_dir / 'datagrids' / dg_file_name)
kg.show()

I'm having trouble with kg.show() loading the most recent datagrid. I have a folder called datagrids where I'm storing the saved datagrids.

dsblank commented 1 year ago

No problem :) You are trying some things never before attempted, so we're glad to work with you to figure this out. This would be cool to get working!

I'll have the team look at this to see if we can replicate and figure out what is happening.

A hint, from glancing at the setup: if you don't name a datagrid, it will be saved in /tmp. It's better to use dg = kg.DataGrid(dataset, name="beans") to save it in the current directory.

dsblank commented 1 year ago

I see you have saved it to a file. Checking and trying to reproduce...

dsblank commented 1 year ago

I think I see the issue... looks like SQLite doesn't like PosixPaths. Try:

dg.save(str(proj_dir / 'datagrids' / dg_file_name))

Not sure about that ".2" on the end... might cause an issue.
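The two suggestions can be combined: build the save path as a plain `str`, keep a single `.datagrid` extension (no trailing `.2`), and replace `/` in Hub repo names the same way the notebook above does. This is a minimal sketch; `datagrid_save_path` is a hypothetical helper whose result would be passed to `dg.save(...)`.

```python
from pathlib import Path


def datagrid_save_path(proj_dir, dataset_repo: str, folder: str = "datagrids") -> str:
    """Return a str path like <proj_dir>/datagrids/<org>__<name>.datagrid."""
    # Hub repo ids such as "org/name" contain "/", which would otherwise create a subfolder
    file_name = dataset_repo.replace("/", "__") + ".datagrid"
    target_dir = Path(proj_dir) / folder
    target_dir.mkdir(parents=True, exist_ok=True)  # make sure the folder exists
    # Return a str rather than a PosixPath, following the suggestion above
    return str(target_dir / file_name)
```

Usage would look like `dg.save(datagrid_save_path(proj_dir, "beans"))`.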

dsblank commented 1 year ago

@DN6 is also looking into how streamlit can access kangas via its port to see if that is possible.

dsblank commented 1 year ago

While we're looking into this, did you see the kangas + huggingface integration? You can import/export huggingface datasets, both from the command-line, and in code:

Import a HF dataset into a datagrid:

kangas import --huggingface detection-datasets/fashionpedia_4_categories fashionpedia.datagrid \
    --options split=val samples=10 labels=objects:category bbox=objects:bbox:xyxy ids=objects:bbox_id

Export a datagrid into HF dataset:

kangas export --huggingface my-org/my-new-dataset samples.datagrid --options limit=1000
datavistics commented 1 year ago

> I think I see the issue... looks like SQLite doesn't like PosixPaths. Try: dg.save(str(proj_dir / 'datagrids' / dg_file_name))

That makes sense, but it didn't fix the issue. dg.show() works, but not kg.show().

> While we're looking into this, did you see the kangas + huggingface integration? You can import/export huggingface datasets, both from the command-line, and in code:

Yes! I'm using that in the space. I wanted to use the space to visualize any multimodal dataset (well, text/images anyway).

dsblank commented 1 year ago

Hey, this is working! https://huggingface.co/spaces/derek-thomas/kangas-demo

dsblank commented 1 year ago

Oh, I see... it is using localhost... and I happen to have kangas running :)

I guess you will need to set --backend-server and --backend-port from the command-line. I think you can set that from show() as well. Let me check.
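Here is why localhost matters: when the page is served from a Space, any URL in it that points at localhost (or 127.0.0.1) is resolved by the viewer's browser, so it reaches whatever is running on the viewer's own machine rather than the Space. That is why the demo worked for someone who happened to have Kangas running locally. This is a minimal sketch of a check for such URLs; `points_at_viewer_machine` is a hypothetical helper.

```python
import ipaddress
from urllib.parse import urlsplit


def points_at_viewer_machine(url: str) -> bool:
    """True if a browser would resolve this URL's host to the viewer's own machine."""
    host = urlsplit(url).hostname  # lowercased, brackets stripped for IPv6
    if host is None:
        # Relative URLs are resolved against the page's own origin
        return False
    if host == "localhost":
        return True
    try:
        return ipaddress.ip_address(host).is_loopback
    except ValueError:
        # Not an IP literal, e.g. a public hostname
        return False
```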

dsblank commented 1 year ago

No, setting the backend server args cannot be done via show(). Now they can: #79 (we'll need to push out a new version, if this is indeed needed).

dsblank commented 1 year ago

This is going to be great, if we can get this to work! Here is what it looks like, when it accidentally asks localhost for the info:

[Screenshot from 2023-03-29 09-48-47]

Making a new Kangas version with #79 included...

dsblank commented 1 year ago

@datavistics Just wanted to drop you a short note that my Kangas colleague @caleb-kaiser (also at https://comet.com) has a prototype working! We'll continue to explore this, and hopefully can get streamlit (or some kind of framework) in place to do more interactive, dynamic explorations.

https://huggingface.co/spaces/CalebCometML/kangas-test

[Screenshot from 2023-03-29 15-17-24]

datavistics commented 1 year ago

> Oh, I see... it is using localhost... and I happen to have kangas running :)
>
> I guess you will need to set --backend-server and --backend-port from the command-line. I think you can set that from show() as well. Let me check.

Why does this matter? 🤔 I can do an install from the repo to include #79.

> @datavistics Just wanted to drop you a short note that my Kangas colleague @caleb-kaiser (also at https://comet.com) has a prototype working!

Thanks, I knew that approach would work, but I wanted to add some functionality as you already have an equivalent demo page 😄

dsblank commented 1 year ago

> Thanks, I knew that approach would work, but I wanted to add some functionality as you already have an equivalent demo page

True, but that only hosts what datagrids we have selected. Being able to easily set up a huggingface space with any set of datagrids is useful!

The main problem, as I understand it, is that because the streamlit app is accessed via https, the embedded iframe must also be https. But there is no port available other than the one streamlit is using (443). Is that correct? If there were a secondary https port, your solution (streamlit with an iframe) would work.
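The https constraint comes from the browser's mixed-content rules: a page loaded over https blocks an iframe whose src is plain http. This is a minimal sketch of that check; `iframe_blocked_as_mixed_content` is a hypothetical helper, and real browsers apply further rules (some exempt loopback addresses, for instance) that it ignores.

```python
from urllib.parse import urlsplit


def iframe_blocked_as_mixed_content(page_url: str, frame_src: str) -> bool:
    """Approximate browser behavior: an https page may not embed an http iframe."""
    page_scheme = urlsplit(page_url).scheme.lower()
    frame_scheme = urlsplit(frame_src).scheme.lower()
    if not frame_scheme:
        # A relative (or scheme-relative) src inherits the page's scheme
        return False
    return page_scheme == "https" and frame_scheme == "http"
```

This also shows why serving both apps from one https port sidesteps the problem: a relative iframe src on the same origin can never be mixed content.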

Another solution would be to allow Kangas to populate the selection box with huggingface dataset names (rather than have streamlit select it and download). Then we'd just need to allow overriding an endpoint to fetch and create the datagrid.

datavistics commented 1 year ago

> True, but that only hosts what datagrids we have selected. Being able to easily set up a huggingface space with any set of datagrids is useful!

Exactly!

> The main problem as I understand it is that with the streamlit app being accessed via https requires that the embedded iframe also be https. But there is no other port other than what streamlit is using (443). Is that correct? If there were a secondary https port, your solution would work (streamlit with an iframe).

I'm not so good with this type of stuff, but I have hit a couple of roadblocks.

  1. The backend host can't be set in 2.2.3. It uses the -bh flag, not --bh. I was able to copy the function and override it manually.
  2. Like you say above and here, there is some iframe issue: localhost resolved to my own computer's localhost. I'm not sure how to get it to use what is in the Docker container.
  3. With huggingface spaces you can expose only one port publicly. I thought that an iframe would grab whatever was internal, but I think that's a misunderstanding on my part.

> Another solution would be to allow Kangas to populate the selection box with huggingface dataset names (rather than have streamlit select it and download). Then we'd just need to allow overriding an endpoint to fetch and create the datagrid.

This would be super cool!

datavistics commented 1 year ago

It's too bad our time zones don't line up a little better. On your profile I saw you are in SF. I'm 11 hrs ahead in Abu Dhabi :)

dsblank commented 1 year ago

Yes, opposite sides of the world :)

You can choose http vs https for the frontend with kangas server --protocol https. All flags also have double-dash versions (e.g., --backend-port).

Perhaps there is someone on your team who can advise: how can an iframe listen on a secondary https address (not 443)? My colleague @caleb-kaiser will be awake in about 10 hours, and can also possibly add some ideas. Good night from this side!

datavistics commented 1 year ago

Thanks so much, I'll explore a little more! Have a good one.

dsblank commented 1 year ago

Quick update: we're working on a solution that will allow streamlit (and gradio for that matter) to run side-by-side with kangas, so that both can be served from a single port (https). If this works, we should have something in a few hours.

datavistics commented 1 year ago

That is awesome!!

datavistics commented 1 year ago

@dsblank, were you able to make any progress?

dsblank commented 1 year ago

Yes, we think so! Hoping to finish this up today. We are also putting together some instructions for how to use huggingface spaces + kangas. If you are interested in that effort, drop me a line at doug@comet.com.

caleb-kaiser commented 1 year ago

Hey @datavistics !

I've been working on this for a bit now. I have a solution that works, but it requires me to run nginx in a Docker Space. Locally, everything works, but I'm running into some nginx errors on startup in the actual Space that are hard to debug from within the Docker Space.

I see in the HF Spaces docs that nginx is the recommended solution for proxying requests to multiple services (in this case, Kangas and Streamlit) via a single public port. Do you know if there are any examples of this I could reference, to get a sense of where I might be going wrong?

Thank you for all your help with this. I'm optimistic that we'll have this working very soon!

datavistics commented 1 year ago

@caleb-kaiser I did a cursory search and found this: https://huggingface.co/search/full-text?q=nginx.conf

Full-text search can find quite a bit. If this doesn't help, let me know.

datavistics commented 1 year ago

I'm guessing a bit, but what about this one?

https://huggingface.co/spaces/yuchen168/vue-app/blob/main/nginx.conf

or another from the same user: https://huggingface.co/spaces/yuchen1573/uwsgi/blob/main/nginx.conf

caleb-kaiser commented 1 year ago

@datavistics this is fantastic! Thanks so much. I'll get something out to share with you shortly.

caleb-kaiser commented 1 year ago

@datavistics I have a working version of your repo hosted here: https://huggingface.co/spaces/CalebCometML/kangas-demo

Thanks for all your help with this. The key to making it work was to start both Kangas and Streamlit on separate internal ports, and then direct traffic from the single public port to the correct service using nginx. It was my first time building much with Docker Spaces--what a powerful tool!
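The routing described above can be sketched in Python. This is not the configuration used in the linked Space; it is a minimal sketch assuming Streamlit on its default port 8501, the Kangas frontend on port 4000 (the backend's 4001 appears in the log earlier in this issue), the Docker Space's public port 7860, and a hypothetical `/kangas/` path prefix. `pick_upstream` mimics how nginx chooses among plain prefix `location` blocks (the longest matching prefix wins), and `render_nginx_conf` emits a matching server block.

```python
# Hypothetical routing table: path prefix -> internal upstream (all ports assumed)
ROUTES = {
    "/": "http://127.0.0.1:8501",          # Streamlit (its default port)
    "/kangas/": "http://127.0.0.1:4000/",  # Kangas frontend (assumed port)
}


def pick_upstream(path: str, routes: dict = ROUTES) -> str:
    """Choose an upstream the way nginx picks among prefix locations: longest match wins."""
    matches = [prefix for prefix in routes if path.startswith(prefix)]
    if not matches:
        raise LookupError(f"no route for {path!r}")
    return routes[max(matches, key=len)]


def render_nginx_conf(routes: dict = ROUTES, listen_port: int = 7860) -> str:
    """Render a minimal nginx server block that proxies each prefix to its upstream."""
    lines = ["server {", f"    listen {listen_port};"]
    for prefix, upstream in sorted(routes.items()):
        lines += [
            f"    location {prefix} {{",
            # A trailing "/" on the upstream makes nginx strip the matched prefix
            f"        proxy_pass {upstream};",
            # Streamlit talks to the browser over WebSockets, which need these headers
            "        proxy_http_version 1.1;",
            "        proxy_set_header Upgrade $http_upgrade;",
            '        proxy_set_header Connection "upgrade";',
            "        proxy_set_header Host $host;",
            "    }",
        ]
    lines.append("}")
    return "\n".join(lines)


if __name__ == "__main__":
    print(render_nginx_conf())
```

One caveat with any sub-path setup: an app served under a prefix such as `/kangas/` may also need to be told its base path, so that its own links and assets resolve there rather than at `/`.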

Would it be alright with you if we shared your demo with our community?

dsblank commented 1 year ago

Documentation in progress: https://github.com/comet-ml/kangas/wiki/HostingKangasOnHuggingFaceSpaces

Feel free to make suggestions or PRs on that!

datavistics commented 1 year ago

@caleb-kaiser

> Would it be alright with you if we shared your demo with our community?

Absolutely! Feel free to tag me if you'd like: https://huggingface.co/derek-thomas or https://github.com/datavistics

I will share this with our datasets team as I think it would be a great template or companion space to visualize datasets.

caleb-kaiser commented 1 year ago

> I will share this with our datasets team as I think it would be a great template or companion space to visualize datasets.

@datavistics Awesome! Let me know if I can be of any assistance (happy to work on another demo, if the datasets team has a particular idea they'd like to see)

dsblank commented 1 year ago

Here is another version, with some streamlit additions: caches the datagrid, and loads the iframe with the downloaded datagrid: https://huggingface.co/spaces/comet-team/kangas-demo
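The two additions described above can be sketched without any Streamlit or Kangas calls: reuse a datagrid file if it was already built, otherwise call a builder once, and point the iframe at it with the `?datagrid=` query parameter (the same form as the kangas.comet.com link earlier in this issue). `get_or_build_datagrid`, `iframe_src`, the `build` callback, and the `/kangas/` base URL are all hypothetical; in a real app `build` might wrap `kg.DataGrid(...)` and `dg.save(...)`.

```python
from pathlib import Path
from typing import Callable
from urllib.parse import urlencode


def get_or_build_datagrid(name: str, cache_dir, build: Callable[[str, str], None]) -> str:
    """Return the cached datagrid path, calling build(name, path) only on a cache miss."""
    path = Path(cache_dir) / (name.replace("/", "__") + ".datagrid")
    # Rebuild if the file is missing or empty (an empty save was an earlier failure mode)
    if not path.is_file() or path.stat().st_size == 0:
        path.parent.mkdir(parents=True, exist_ok=True)
        build(name, str(path))
    return str(path)


def iframe_src(base_url: str, datagrid_path: str) -> str:
    """Build an iframe URL of the form <base>?datagrid=<path>, keeping '/' unescaped."""
    return f"{base_url}?{urlencode({'datagrid': datagrid_path}, safe='/')}"
```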

datavistics commented 1 year ago

That's great! Thanks so much. I'm checking internally whether we can get this into a Docker space template. I think it would be really valuable if dataset creators used this in a companion space as an explorer. We already have a few templates for MLOps and data labeling. What's cool is that this would have immediate value for 25k+ datasets, so the usage and impact would be high.

Yesterday was a holiday for many people, so hopefully I'll hear back today 🤞

datavistics commented 1 year ago

Hey, quick problem. I tried using cats_vs_dogs and it crashed. I think nginx might have a timeout, but I'm not sure.

datavistics commented 1 year ago

> I'm checking internally if we can get this in a docker space template.

Since it's a newer library, I think we want to revisit this later as it matures. This is not related to the bug; it's that the templates we did before are heavily used. I'm on a different team; I just think you have a cool lib and want to see how I can help 😎

That said, I think there is some value that could be added with the HF Hub and Kangas in other ways:

  1. Using the space as is, you could leave it to the user to choose what dataset to visualize.
  2. The HF Hub allows you to duplicate spaces. You could make a space designed to be duplicated: instead of letting you visualize an arbitrary dataset, it would be focused on a specific dataset chosen by the space creator. This would provide the same functionality as a space template, with a little extra work. Usually there are instructions on what to do after you duplicate, as in this example. There are a couple of other examples of spaces from libraries/organizations here:
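For the duplicate-a-space idea, one lightweight pattern is to let the space creator fix the dataset through a Space variable, which Spaces expose to the running app as an environment variable, and to fall back to the interactive picker when it is unset. This is a minimal sketch; the variable name `KANGAS_DATASET` and the helper `choose_dataset` are hypothetical.

```python
import os
from typing import Mapping, Optional


def choose_dataset(user_choice: Optional[str] = None,
                   env: Mapping[str, str] = os.environ,
                   var_name: str = "KANGAS_DATASET") -> Optional[str]:
    """Prefer a dataset fixed by the space creator; otherwise use the user's choice."""
    fixed = env.get(var_name, "").strip()
    if fixed:
        # A duplicated, single-dataset space: ignore the interactive choice
        return fixed
    # The general-purpose space: whatever the user picked (None if nothing yet)
    return user_choice or None
```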

Let me know if that wasn't worded well, or if you have some ideas. I'd be happy to support as best I can!

caleb-kaiser commented 1 year ago

@datavistics Thanks for flagging the NGINX error. I'm looking into it currently.

Also, we understand completely re: waiting for the project to mature. It's still early days, and interactions like these are incredibly helpful for us in developing Kangas–so thank you for sharing all of this feedback/going back and forth with us. Can't say enough how much we appreciate it :)

We've actually been discussing some ideas similar to your suggestions! If you don't mind, I'll shoot you a ping when we release them.

datavistics commented 1 year ago

That's awesome! Feel free to reach me via email as well. I'll email Doug at the address he listed above so you will have my contact info.