Open ghost opened 2 years ago
Hello, thank you for bringing attention to this! Sadly, I don't believe it is possible to scrape information from these endpoints.
On https://www.coursehero.com/file/* links, the pages are hosted as blurred images that are split up into unblurred previews. This tool gathers the unblurred parts of each page hosted on CourseHero servers and rebuilds the document behind the paywall.
On tutors-problems endpoints, the previews shown on CourseHero are actually randomly generated blurred text.
From what I could tell, there wasn't any way for me to gather any previews or split segments of the original answer. The only way to access the information behind the paywall would have to be using a CourseHero premium account token :(
Thank you so much for using my tool!!! I'm glad to see people using it. Currently working on a major update!!!
Thank you for your update. We shall wait for the update.
Using the same IP, I was able to download one homework solution, but the other raised an error. What could be the issue? I even tried multiple NordVPN IPs, but the same thing kept happening.
Just released a fix, hopefully it works now
Still giving the same error. Another thing to mention is that I am using Python 3.10.4 on Ubuntu.
Hello, sorry for the late response. There were a few questions I'd like to ask:
Are you able to reach this endpoint in a browser? If you are, I'll need to fix my request headers.
Does it only fail on this specific CourseHero link (https://www.coursehero.com/file/p20cefc/D-Re-direct-behavior-by-providing-choices-or-options-for-alternative-activities/) or all of them? Do any other CourseHero links fail?
I was able to successfully run this using Python 3.8.9 on Windows shown below (I wasn't able to reproduce your error). Perhaps the version of Python you are running could be the issue? This script wasn't built with Python 3.10 compatibility in mind.
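To make a version mismatch like this easier to spot, a script could warn at startup when it runs on an interpreter newer than the one it was tested on. The following is only a sketch: `TESTED_MAX` and `version_warning` are hypothetical names, and the 3.8 cutoff is an assumption based on the version mentioned above, not something taken from the tool itself.

```python
import sys

# Hypothetical: newest (major, minor) version the script is assumed to be tested on.
TESTED_MAX = (3, 8)


def version_warning(current=None, tested_max=TESTED_MAX):
    """Return a warning string if `current` is newer than `tested_max`, else None."""
    if current is None:
        current = sys.version_info
    major, minor = current[0], current[1]
    if (major, minor) > tested_max:
        return (
            f"Running on Python {major}.{minor}; "
            f"only tested up to {tested_max[0]}.{tested_max[1]}."
        )
    return None


if __name__ == "__main__":
    message = version_warning()
    if message:
        # Warn on stderr but keep running; the mismatch may be harmless.
        print(message, file=sys.stderr)
```

Comparing tuples rather than version strings avoids the pitfall of "3.10" sorting before "3.8" as text.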
Thanks!
Thanks for the reply.
I am able to download https://www.coursehero.com/file/p20cefc/D-Re-direct-behavior-by-providing-choices-or-options-for-alternative-activities/
but unable to download https://www.coursehero.com/file/80230572/Corporal-Punishment-Law-Of-Children-in-t-1docx/
Can you run the command with the --debug flag? Sorry, I can't seem to find a way to reproduce that error.
Thanks! I just found what's causing the issue. Fixing it right now.
Hi, the regex is better now.
However, changing the IP does not fix the error, as the Incapsula firewall blocks requests based on headers, cookies, and much more. Below is a curl request I made manually to the website.
Do you think using Selenium and Scrapy would solve this?
Hello, I had planned on using a QWebEngine, and passing the arguments to the requests session (similar to this), but I didn't think Incapsula Firewall would get in the way.
I'll be sure to add it as a fallback next time I have the chance to work on it!
I looked around for how to bypass the WAF and came across Imperva_gzip_WAF_Bypass; CourseHero appears to be vulnerable to it.
Hi, I had no luck using this bypass. It seems to be falsely taking the 200 response code as a success:
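The point about a 200 response being misread as success can be illustrated with a small check that looks at the body as well as the status code. This is only a sketch: `is_real_success` is a hypothetical helper, and the marker strings are assumptions about what a firewall block page might contain, so they would need to be confirmed against an actual response.

```python
# Assumed markers of a firewall block page; confirm against a real response.
BLOCK_MARKERS = ("_Incapsula_Resource", "Incapsula incident ID")


def is_real_success(status_code, body):
    """Return True only for a 200 response whose body has no block marker."""
    if status_code != 200:
        return False
    # A block page can be served with status 200, so inspect the body too.
    return not any(marker in body for marker in BLOCK_MARKERS)
```

Treating the status code as necessary but not sufficient keeps a block page from being reported as a successful request.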
I think the best way for me to bypass this would be through pyppeteer or some other JavaScript web engine that can run captchas.
Thanks for showing me this! ❤️
I have the same problem. How can I fix it? I use Python 3.10 on Ubuntu too.
got this error
Thanks for the tool. ❤❤
Can you add support for downloading files from the https://www.coursehero.com/tutors-problems/* endpoint? E.g., the tool can download https://www.coursehero.com/file/61519475/Human-Services-Assignment-1-docx/
but not https://www.coursehero.com/tutors-problems/Social-Psychology/38685580-What-is-a-civic-professional-in-relation-to-the-Human-Service/
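The two kinds of link mentioned here differ only in the first path segment, so a tool could tell them apart before deciding whether a URL is supported. A minimal sketch, assuming that the first path segment is enough to tell them apart; `endpoint_kind` is a hypothetical helper and not part of the tool.

```python
from urllib.parse import urlparse


def endpoint_kind(url):
    """Classify a URL as 'file', 'tutors-problems', or 'unknown' by its first path segment."""
    segments = [part for part in urlparse(url).path.split("/") if part]
    if not segments:
        return "unknown"
    if segments[0] in ("file", "tutors-problems"):
        return segments[0]
    return "unknown"


if __name__ == "__main__":
    print(endpoint_kind("https://www.coursehero.com/file/61519475/Human-Services-Assignment-1-docx/"))
```

A check like this would let the tool report an unsupported endpoint with a clear message instead of failing partway through.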