Absolute URLs do not open in Python activecode hosted on Runestone

ascholerChemeketa commented 1 year ago

The datafile logic for loading files in activecode.js does not handle absolute URLs when hosted on Runestone. I believe this is a combination of code and server setup.

Replication: Go to https://runestone.academy/ns/books/published/welcomecs/CSPTeasers/computeImages.html

Change line 5 of the active code to refer to an absolute URL. Ex:

catPic = image.Image("https://computerscience.chemeketa.edu/people/andrew-scholer/avatar_hudb8f29e9adefc011bdbb6a5afae5f1e7_284791_270x270_fill_q90_lanczos_center.jpg")

The network request ends up with: net::ERR_SSL_PROTOCOL_ERROR

Trying to directly hit the proxy with https://image.runestone.academy:8080/320x/https://computerscience.chemeketa.edu/people/andrew-scholer/avatar_hudb8f29e9adefc011bdbb6a5afae5f1e7_284791_270x270_fill_q90_lanczos_center.jpg fails with an SSL protocol error.

Hitting the proxy using HTTP does work: http://image.runestone.academy:8080/320x/https://computerscience.chemeketa.edu/people/andrew-scholer/avatar_hudb8f29e9adefc011bdbb6a5afae5f1e7_284791_270x270_fill_q90_lanczos_center.jpg BUT... when a page loaded with HTTPS tries to do that it results in an SSL error (can't drop from HTTPS to HTTP with ajax).

I think this line https://github.com/RunestoneInteractive/rs/blob/e35d82e7c790f582a1a329f30c0b382b66520b94/bases/rsptx/interactives/runestone/activecode/js/activecode.js#L1388 needs to be changed to 'https..., but that only works if whatever is serving port 8080 has a valid cert.
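To make the failure mode concrete, here is a minimal Python sketch (not the actual activecode.js logic) of how the old proxy URL is put together and why the HTTP variant is blocked when requested from an HTTPS page. The function names are hypothetical; the host, port, and `320x` path segment are taken from the URLs quoted above.

```python
from urllib.parse import urlparse

# Host, port, and size segment copied from the proxy URLs quoted above.
OLD_PROXY = "image.runestone.academy:8080/320x/"


def old_proxy_url(image_url: str, scheme: str = "http") -> str:
    """Build the old proxy URL for an absolute image URL (hypothetical helper)."""
    return f"{scheme}://{OLD_PROXY}{image_url}"


def is_mixed_content(page_url: str, request_url: str) -> bool:
    """True when an HTTPS page requests a plain-HTTP resource, which browsers block."""
    return (
        urlparse(page_url).scheme == "https"
        and urlparse(request_url).scheme == "http"
    )
```

Switching the scheme to `https` avoids the mixed-content block, but the request then only succeeds if the proxy on port 8080 presents a valid certificate.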

bnmnetp commented 1 year ago

Writing our own proxy endpoint in book server and retiring that old proxy (written in Go circa 2014)

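import requests
from io import BytesIO
from PIL import Image

def get_image(url):
    # Download the image and decode it with Pillow
    response = requests.get(url)
    img = Image.open(BytesIO(response.content))
    # thumbnail() resizes in place and returns None
    img.thumbnail((320, 240))
    # Raw pixel data, not an encoded image file
    return img.tobytes()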

We would have to reconfigure activecode.js to use our own proxy.
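One detail a real endpoint would likely need: `tobytes()` gives raw pixels, which a browser cannot display directly, so the thumbnail has to be re-encoded. The following is only a sketch under that assumption. `thumbnail_png` is a hypothetical name, PNG is an arbitrary choice of output format, and it takes already-downloaded bytes so it can be tried without a network connection.

```python
from io import BytesIO

from PIL import Image


def thumbnail_png(raw: bytes, max_size=(320, 240)) -> bytes:
    """Shrink an encoded image and return it re-encoded as PNG bytes."""
    img = Image.open(BytesIO(raw))
    img.thumbnail(max_size)  # resizes in place, keeping the aspect ratio
    if img.mode not in ("1", "L", "LA", "P", "RGB", "RGBA"):
        # Modes such as CMYK cannot be written as PNG; convert them first.
        img = img.convert("RGB")
    out = BytesIO()
    img.save(out, format="PNG")  # encode, rather than dumping raw pixels
    return out.getvalue()
```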

kklamberty commented 1 year ago

I'm running into what I think is a related (maybe same) problem. I don't understand the reply above: Are you saying you are writing a new proxy or are you saying you would need to to solve the problem?

Is there a way to do the image processing parts of chapter 8 of HTTLACS? I have not found success. I have been able to use the code in IDLE with the image module, but I am hoping for a solution that does not require introductory cs students to step out of the interactive book just yet. (@bnmnetp)

bnmnetp commented 1 year ago

@kklamberty image problems have always worked fine in the browser. It could be a problem with the move to PreTeXt for this edition of the httlacs book... But I really have no idea what the problem could be based on "not having success". Specific problem descriptions and error messages are much more helpful.

bnmnetp commented 1 year ago

What we were discussing above is a specific failure that Andrew was experiencing. I have since fixed that problem.

ascholerChemeketa commented 1 year ago

I am guessing that the "Luther.jpg" image was not added to that book as a datafile in the pretext version. What book is that even pulled from?

@bnmnetp I think there are still issues with proxy. I was going to suggest as a workaround they have students use a full URL. But my attempts to use the new proxy fail:

Here is an image URL: https://upload.wikimedia.org/wikipedia/commons/thumb/7/74/A-Cat.jpg/320px-A-Cat.jpg

This is the URL that is produced if you try to make Image("https://upload.wikimedia.org/wikipedia/commons/thumb/7/74/A-Cat.jpg/320px-A-Cat.jpg"): https://runestone.academy/ns/rsproxy/imageproxy/https://upload.wikimedia.org/wikipedia/commons/thumb/7/74/A-Cat.jpg/320px-A-Cat.jpg

That returns 200 OK, with Content-Type application/json and a body of null.

Trying to load a bad URL like https://runestone.academy/ns/rsproxy/imageproxy/https://badurl doesn't generate a 404, it produces a 500 error.

bnmnetp commented 1 year ago

@ascholerChemeketa

This url works fine: https://runestone.academy/runestone/static/rectangle_badge.png

I wonder if there are some redirects or other issues going on that make some images work and others not. I have not tested tons of images. Definitely should return 404 and not 500 on bad urls.
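As a rough illustration of the 404-instead-of-500 behavior, here is a sketch of mapping upstream failures to status codes. This is not the book server's code: `proxy_status` and the `fetch` callable are hypothetical, and the choice of 502 for an empty upstream body is an assumption.

```python
from typing import Callable, Tuple
from urllib.error import HTTPError, URLError


def proxy_status(fetch: Callable[[str], bytes], url: str) -> Tuple[int, bytes]:
    """Return (status, body) for a proxied fetch instead of letting errors escape.

    `fetch` is a hypothetical stand-in for whatever downloads the image.
    """
    try:
        body = fetch(url)
    except HTTPError as err:
        # Upstream answered with an error status; pass it along.
        return err.code, b""
    except (URLError, ValueError):
        # Unresolvable host or malformed URL: report "not found", not a crash.
        return 404, b""
    if not body:
        # An empty upstream body is treated as a gateway failure (assumption).
        return 502, b""
    return 200, body
```

Passing the upstream status through would also surface a case like a remote host refusing the request, rather than hiding it behind a 200 with a null body.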

ascholerChemeketa commented 1 year ago

@bnmnetp Ahhhh. Must be Wikipedia refusing to serve that image to what it thinks is a bot. But yes, that error is not getting reported well to the user.

@kklamberty This is a workaround until someone updates that book: Find an image url that works when added after https://runestone.academy/ns/rsproxy/imageproxy/. e.g. I can take https://runestone.academy/ns/books/published/welcomecs2/external/CSP/Images/cat.jpg and make https://runestone.academy/ns/rsproxy/imageproxy/https://runestone.academy/ns/books/published/welcomecs2/external/CSP/Images/cat.jpg. If that URL loads in your browser, you should be good to use the image URL ("https://runestone.academy/ns/books/published/welcomecs2/external/CSP/Images/cat.jpg") as the filename in the program.
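If you want to check several candidate images, the test URL can be generated rather than pasted together by hand. A tiny sketch, where `proxy_check_url` is a hypothetical helper name and the prefix is the proxy path given above:

```python
# Proxy path taken from the workaround described above.
PROXY_PREFIX = "https://runestone.academy/ns/rsproxy/imageproxy/"


def proxy_check_url(image_url: str) -> str:
    """Return the URL to open in a browser to see whether the proxy can serve image_url."""
    return PROXY_PREFIX + image_url


print(proxy_check_url(
    "https://runestone.academy/ns/books/published/welcomecs2/external/CSP/Images/cat.jpg"
))
```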

kklamberty commented 1 year ago

> I am guessing that the "Luther.jpg" image was not added to that book as a datafile in the pretext version. What book is that even pulled from?

To be more specific, I am using a pretext version of HTTLACS. In my specific instance of the book, it's not working, but it's also not working in the book that anyone can access to preview: https://runestone.academy/ns/books/published/httlacs/more-about-iteration_dimensional-iteration-image-processing.html?mode=browsing

Images are not showing up in the book, and the examples are not runnable because of the images not being part of the book. I tried using complete URLs, and I have had mixed success. For example, I can get this to work:

import image
img = image.Image("https://morris.umn.edu/sites/morris.umn.edu/files/2023-07/visit-campus_600x400.jpg")

print(img.getWidth())
print(img.getHeight())

Based on this issue, I thought to try https image URLs, but the same code fails with some other https URLs, and I'm not sure why. I thought the problem might be related enough to post here rather than start a new issue, and I apologize for not being more informative about the error messages I was seeing. Does this seem to be an unrelated issue? Are there multiple things going on? (The images don't seem to be in the pretext version of the book and maybe something is happening with the proxy?)

Here is a URL that doesn't work that is very similar to the URL above that does work: https://morris.umn.edu/sites/morris.umn.edu/files/2022-07/grey-logo.png

I don't understand why some files work and others don't when the URLs are so similar. But, for the book, it might just be the case that the images that are not showing up are not in the pretext version of the book.


bnmnetp commented 1 year ago

Thank you @kklamberty There are two issues at play here.

1. The httlacs version was translated to PreTeXt from RST. I thought we had a decent review of it last summer to catch stupid mistakes like images not showing up when they should, but apparently not. So reports of things like missing images are really appreciated. With more than 50 books in the library I can't keep track of everything, and httlacs could use a maintainer.
2. I had a long-standing, rock-solid image proxy that just worked for years. But recent browser security updates have rendered that proxy useless. So, I tried to build a quick and dirty proxy in Python running on our servers. It's possible it was too quick or too dirty and therefore less reliable in the image URLs it supports. More work to do.

I have added the luther.jpg file to the chapter and the programs that use it are working for me.

Always good to hear from a fellow Minnesotan. Sorry for the problems.... I'm spread a little thin. If anyone reading this is interested in becoming a maintainer for this book, please let me know!

bnmnetp commented 1 year ago

OK, I've fixed a couple of bugs in my proxy that should make it work for many more images. It turns out that lots of places do not set the Content-Type header, which I was relying on to figure out what kind of image I had and/or to set the media type after conversion. Now I infer it from the filename, but that may not be perfect either. But jpg, jpeg, png, and gif should all be fine. I was also foolishly trying to just convert everything to jpeg and return that, but there are problems with that depending on the source image and the color model it uses. I'm sure we will uncover additional complexities. But this should be miles better.
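For anyone curious, inferring the type from the filename can look roughly like this. It is a sketch of the idea rather than the code running on the server. `media_type_from_url` and the lookup table are illustrative names, and only the four extensions mentioned above are covered.

```python
import posixpath
from urllib.parse import urlparse

# Only the extensions mentioned above; anything else falls back to the default.
MEDIA_TYPES = {
    ".jpg": "image/jpeg",
    ".jpeg": "image/jpeg",
    ".png": "image/png",
    ".gif": "image/gif",
}


def media_type_from_url(url: str, default=None):
    """Guess the media type from the file extension in the URL path."""
    path = urlparse(url).path  # drops any query string or fragment
    ext = posixpath.splitext(path)[1].lower()
    return MEDIA_TYPES.get(ext, default)
```

A URL with no extension in its path returns the default, which is one way the filename approach "may not be perfect".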

I will try to push out those changes to the live system tomorrow morning.

RunestoneInteractive / rs

Absolute URLs do not open in Python activecode hosted on Runestone #348