breadboard-ai / breadboard

A library for prototyping generative AI applications.
Apache License 2.0

Fetch node doesn't respect "raw" checkbox #2411

Status: Open. dcblack-google opened this issue 2 months ago.

dcblack-google commented 2 months ago

The fetch node doesn't seem to respect the "raw" checkbox and always tries to parse the response as JSON. I'm trying to use it to just grab a vanilla webpage's HTML, and even with "raw" checked it's spitting out: Internal Exception: SyntaxError: Unexpected token '<', "<hea"... is not valid JSON
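For context, honoring a "raw" option usually comes down to choosing between returning the response body as text and parsing it as JSON. The sketch below is a minimal, hypothetical illustration: `decodeFetchBody` and its `raw` parameter are assumptions, not Breadboard's actual fetch node code, and it assumes the caller has already read the body with `await response.text()`.

```typescript
// Hypothetical helper, not Breadboard's real implementation.
// Assumes the body was already read with `await response.text()`.
function decodeFetchBody(text: string, raw: boolean): unknown {
  // With "raw" checked, hand the body back untouched (e.g. an HTML page).
  if (raw) {
    return text;
  }
  // Otherwise try to parse JSON, but replace the bare SyntaxError with a
  // message that says what arrived and how to get it as text instead.
  try {
    return JSON.parse(text);
  } catch {
    const preview = JSON.stringify(text.slice(0, 20));
    throw new Error(
      `Response is not valid JSON (starts with ${preview}); ` +
        `enable "raw" to receive it as text.`
    );
  }
}
```

With this shape, `decodeFetchBody("<head></head>", true)` returns the HTML string unchanged, and the JSON path is never reached.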

paullewis commented 1 month ago

I'm not all that clear about this one; I think @dglazkov will have a view on it, though.

dglazkov commented 1 month ago

This should totally "just work" as of about three weeks ago. I wonder if the googleplex visual editor is running an older version?

dcblack-google commented 1 month ago

Now that I've tried again from scratch, I'm getting an internal exception, both on googleplex and on the external site:

Internal Exception: TypeError: Failed to fetch
    at .invoke (https://breadboard-ai.googleplex.com/index-DO-MixRG.js:1567:21)
    at .invoke (https://breadboard-ai.googleplex.com/define-ciGEp0Q7.js:308:24)
    at https://breadboard-ai.googleplex.com/path-registry-C9U4iGRp.js:15:5
    at new Promise ()
    at yt (https://breadboard-ai.googleplex.com/path-registry-C9U4iGRp.js:14:10)
    at gt. (https://breadboard-ai.googleplex.com/stream-DWT2qJoc.js:1277:15)

Very simple example board: testboard.json

dglazkov commented 1 month ago

Yup, can reproduce. The error is hidden away in DevTools:

index-BQSQ4KEq.js:1567 Mixed Content: The page at 'https://breadboard-ai.web.app/?board=idb%3A%2F%2Fdefault%2Fblank-board-2.bgl.json' was loaded over HTTPS, but requested an insecure resource 'http://motherfuckingwebsite.com/'. This request has been blocked; the content must be served over HTTPS.
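Browsers block an HTTPS page from fetching plain-HTTP URLs, and the page's script only sees a generic failure. One possible mitigation, sketched here under the assumption that the target server also serves the same content over HTTPS, is to check the scheme before fetching and upgrade it. `isMixedContent` and `upgradeToHttps` are hypothetical helpers, not part of Breadboard.

```typescript
// Hypothetical helper: detects a request the browser will block as mixed
// content (an HTTPS page requesting an HTTP resource).
function isMixedContent(pageUrl: string, targetUrl: string): boolean {
  const page = new URL(pageUrl);
  const target = new URL(targetUrl);
  return page.protocol === "https:" && target.protocol === "http:";
}

// Hypothetical helper: rewrites http:// to https://. This only helps if the
// target server actually serves the page over HTTPS.
function upgradeToHttps(targetUrl: string): string {
  const url = new URL(targetUrl);
  if (url.protocol === "http:") {
    url.protocol = "https:";
  }
  return url.toString();
}
```

Upgrading the scheme does not get around CORS, though: as the next attempt in this thread shows, an HTTPS URL on another origin can still be blocked.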

If I change the URL protocol to https, I get another error:

Access to fetch at 'https://motherfuckingwebsite.com/' from origin 'https://breadboard-ai.web.app' has been blocked by CORS policy: No 'Access-Control-Allow-Origin' header is present on the requested resource. If an opaque response serves your needs, set the request's mode to 'no-cors' to fetch the resource with CORS disabled.

Good bug! We need to add more graceful error handling to the fetch node so that users don't have to dig into DevTools to find the actual error.
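In Chrome, both the mixed-content block and the CORS block reach script as the same generic "TypeError: Failed to fetch"; the specific reason is deliberately reported only in DevTools. A fetch node can't read the exact cause, but it could attach likely explanations based on the URLs involved. The sketch below is one hypothetical way to do that; `describeFetchFailure`, `fetchWithDiagnostics`, and their messages are assumptions, not Breadboard's code. In a browser, `pageUrl` would be `location.href`.

```typescript
// Hypothetical helper: turns the opaque "Failed to fetch" TypeError into a
// message listing likely causes. The browser does not expose the real
// reason to script, so these are informed guesses based on the URLs.
function describeFetchFailure(pageUrl: string, targetUrl: string): string {
  const page = new URL(pageUrl);
  const target = new URL(targetUrl);
  const hints: string[] = [];

  if (page.protocol === "https:" && target.protocol === "http:") {
    hints.push(
      "the page is served over HTTPS but the URL uses HTTP, so the browser " +
        "blocks it as mixed content"
    );
  }
  if (page.origin !== target.origin) {
    hints.push(
      "the server may not send an Access-Control-Allow-Origin header, so the " +
        "browser blocks the cross-origin response (CORS)"
    );
  }
  hints.push("the host may be unreachable or the network may be offline");

  return (
    `Could not fetch ${target.href}. Possible causes: ${hints.join("; ")}. ` +
    "Check the browser's DevTools console for the exact error."
  );
}

// Hypothetical wrapper: fetch rejects with a TypeError for network-level
// failures, including CORS and mixed-content blocks.
async function fetchWithDiagnostics(
  pageUrl: string,
  targetUrl: string
): Promise<Response> {
  try {
    return await fetch(targetUrl);
  } catch (e) {
    if (e instanceof TypeError) {
      throw new Error(describeFetchFailure(pageUrl, targetUrl));
    }
    throw e;
  }
}
```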

dcblack-google commented 1 month ago

... it's respecting CORS? Gosh. Well, good to know; I guess I need to come up with a completely different plan for fetching web content.

dglazkov commented 1 month ago

Check out this one as a possible strategy: https://breadboard-ai.web.app/?board=https%3A%2F%2Fbreadboard.live%2Fboards%2F%40dimitri%2Ftool-page-as-markdown.bgl.json

dcblack-google commented 1 month ago

That was the very first thing I tried, and I swiftly abandoned it when it immediately started spitting out "quota exceeded" errors. (I was only calling it a couple of times, so it wasn't my usage.)

I currently have an internal solution half-built and hope to get it up and running next week, but fetching arbitrary web content feels like a basic capability that the platform should support better.
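A common workaround for reading arbitrary pages from a browser app is a small server-side proxy: the server fetches the page (CORS is enforced by browsers, not servers) and returns it with an Access-Control-Allow-Origin header. The sketch below is hypothetical; it is not the internal solution mentioned above or anything Breadboard ships. It assumes a runtime with the standard Request/Response/fetch globals (Node 18+, Deno, Bun, or Cloudflare Workers), and the `?url=` parameter, `parseProxyTarget`, and `proxyHandler` are names chosen for illustration. A real deployment would also need authentication and protection against server-side request forgery, since an open proxy is easy to abuse.

```typescript
// Hypothetical helper: reads the target from "?url=..." and accepts only
// absolute http(s) URLs. Returns null otherwise.
function parseProxyTarget(requestUrl: string): URL | null {
  const param = new URL(requestUrl).searchParams.get("url");
  if (!param) return null;
  let target: URL;
  try {
    target = new URL(param);
  } catch {
    return null; // not an absolute URL
  }
  return target.protocol === "http:" || target.protocol === "https:"
    ? target
    : null;
}

// Statuses whose responses must not carry a body.
const NULL_BODY_STATUSES = [204, 205, 304];

// Hypothetical fetch-style handler. It runs on a server, where CORS does not
// apply, and adds the header the browser needs to read the response.
async function proxyHandler(request: Request): Promise<Response> {
  const target = parseProxyTarget(request.url);
  if (target === null) {
    return new Response("Expected ?url=<http or https URL>", { status: 400 });
  }
  const upstream = await fetch(target.href);
  // Pass the bytes through unchanged: nothing is parsed as JSON here.
  const body = NULL_BODY_STATUSES.includes(upstream.status)
    ? null
    : await upstream.arrayBuffer();
  return new Response(body, {
    status: upstream.status,
    headers: {
      "Content-Type":
        upstream.headers.get("Content-Type") ?? "application/octet-stream",
      // "*" keeps the sketch short; restrict this to the app's origin in practice.
      "Access-Control-Allow-Origin": "*",
    },
  });
}
```

Under Deno, for example, `Deno.serve(proxyHandler)` would serve it, and the browser app would then request the proxy's address (a placeholder here) with `?url=` followed by the URL-encoded page address.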