daijro / CourseHeroUnblur

(⚠️DISCONTINUED⚠️) PoC Page Stitcher Image Manipulation Tool
Apache License 2.0

Support #1

Open · ghost opened 2 years ago

ghost commented 2 years ago

Thanks for the tool. ❤❤

Can you add support for downloading files from the https://www.coursehero.com/tutors-problems/* endpoint? For example, the tool can download https://www.coursehero.com/file/61519475/Human-Services-Assignment-1-docx/ but not https://www.coursehero.com/tutors-problems/Social-Psychology/38685580-What-is-a-civic-professional-in-relation-to-the-Human-Service/

daijro commented 2 years ago

Hello, thank you for bringing attention to this! Sadly, I don't believe it is possible to scrape information from these endpoints.

On https://www.coursehero.com/file/* links, the pages are hosted as blurred images that are split up into unblurred previews. This tool gathers unblurred parts of each page hosted on CourseHero servers and rebuilds the document behind the paywall.

On tutors-problems endpoints, the previews shown on CourseHero are actually randomly generated blurred text:

[image]

From what I could tell, there wasn't any way for me to gather any previews or split segments of the original answer. The only way to access the information behind the paywall would have to be using a CourseHero premium account token :(

Thank you so much for using my tool!!! I'm glad to see people using it. Currently working on a major update!!!

ghost commented 2 years ago

Thank you for the reply. We'll wait for the update.

ghost commented 2 years ago

[Screenshot 2022-03-26 194123]

Using the same IP, I was able to download one homework solution, but the other raised an error. What could be the issue? I even tried multiple NordVPN IPs, but the same thing kept happening.

daijro commented 2 years ago

Just released a fix. Hopefully it works now.

ghost commented 2 years ago

It's still giving the same error. Another thing to mention: I am using Python 3.10.4 on Ubuntu.

daijro commented 2 years ago

Hello, sorry for the late response. There were a few questions I'd like to ask:

  1. Are you able to reach this endpoint in a browser? If you are, I'll need to fix my request headers.

  2. Does it only fail on this specific CourseHero link (https://www.coursehero.com/file/p20cefc/D-Re-direct-behavior-by-providing-choices-or-options-for-alternative-activities/) or all of them? Do any other CourseHero links fail?

  3. I was able to run this successfully using Python 3.8.9 on Windows, as shown below, so I wasn't able to reproduce your error. Perhaps the version of Python you are running is the issue? This script wasn't built with Python 3.10 compatibility in mind.

[image]

Thanks!

ghost commented 2 years ago

Thanks for the reply.

  1. Yes, I am able to reach the endpoints in my browser.
  2. I am able to download https://www.coursehero.com/file/p20cefc/D-Re-direct-behavior-by-providing-choices-or-options-for-alternative-activities/ but unable to download https://www.coursehero.com/file/80230572/Corporal-Punishment-Law-Of-Children-in-t-1docx/
  3. I reverted my Python to 3.8.9 as well.

daijro commented 2 years ago

Can you run the command with the --debug flag? Sorry, I can't seem to find a way to reproduce that error.

ghost commented 2 years ago

[image]

daijro commented 2 years ago

Thanks! I just found what's causing the issue. Fixing it right now.

ghost commented 2 years ago

Hi, the regex is better now.

However, changing the IP is not effective, because the Incapsula firewall blocks requests based on headers, cookies, and much more. Below is a curl request I made to the website manually:

[image]

Do you think using Selenium and Scrapy would solve this?

daijro commented 2 years ago

Hello, I had planned on using a QWebEngine and passing its arguments to the requests session (similar to this), but I didn't think the Incapsula firewall would get in the way.

I'll be sure to add it as a fallback next time I have the chance to work on it!

ghost commented 2 years ago

I looked around for ways to get past the WAF and came across Imperva_gzip_WAF_Bypass; CourseHero appears to be vulnerable to it.

[image]

daijro commented 2 years ago

Hi, I had no luck using this bypass. It seems to be falsely treating the 200 response code as a success:

[image]

I think the best way for me to get past this would be through pyppeteer or some other JavaScript web engine that can run captchas. Thanks for showing me this! ❤️

abdouhl commented 2 years ago

I have the same problem. How can I fix it? I'm using Python 3.10 on Ubuntu too.

weakall0999 commented 1 year ago

Got this error:

[image]