Enable offline use

CicadaCinema commented 1 month ago

Fixes https://github.com/curtischong/crowdmark-downloader/issues/4

CicadaCinema commented 1 month ago

My first three commits bring this project up to the state in which I used it for my own archive on 10 July (I checked that archive thoroughly for errors and omissions), with the following exceptions:

- As my driver, I used webdriver.Firefox() rather than webdriver.Chrome().
- Since driver.get() cannot be relied on to return only once the page has finished loading, I felt safer increasing the argument of each sleep() call to somewhere between 50 and 60 seconds. The longer run time is acceptable because I can step away from the computer while the archive runs.
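As a possible alternative to long fixed sleep() calls, here is a minimal sketch of a polling helper that returns as soon as a condition holds and gives up after a timeout. The name wait_until and its parameters are hypothetical and are not part of this PR.

```python
import time
from typing import Callable


def wait_until(predicate: Callable[[], bool], timeout: float = 60.0, poll: float = 0.5) -> bool:
    """Poll predicate() until it returns True or `timeout` seconds have elapsed.

    Returns True if the condition was met, False if the timeout expired first.
    """
    deadline = time.monotonic() + timeout
    while True:
        if predicate():
            return True
        if time.monotonic() >= deadline:
            return False
        # Sleep briefly between checks instead of blocking for a fixed 50-60 s.
        time.sleep(poll)
```

With Selenium, the predicate could be something like `lambda: driver.execute_script("return document.readyState") == "complete"`. That check alone may not be enough for content that Crowdmark renders after the initial load, so a predicate that looks for a specific element on the page might be needed instead.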

Now the bad news is that:

- ~~I have no desire to install Chrome so I will continue testing/developing locally using Firefox;~~ (seems like ungoogled chromium works for whatever reason, so ignore this)
- I felt that it was more convenient to have a post-processing step in bash, since I know sed better than Python's regular expression facilities
- I can see that Crowdmark has already been updated, so some changes are required as of now to make this script function properly.

CicadaCinema commented 1 month ago

> Now the bad news is that:
>
> - ~I have no desire to install Chrome so I will continue testing/developing locally using Firefox;~ (seems like ungoogled chromium works for whatever reason, so ignore this)
> - I felt that it was more convenient to have a post-processing step in bash, since I know sed better than Python's regular expression facilities
> - I can see that Crowdmark has already been updated, so some changes are required as of now to make this script function properly.

I've addressed these points by porting the bash code to Python and by doing a bit more testing with ungoogled chromium and the current version of Crowdmark.
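To illustrate what porting a sed-style post-processing step to Python's re module can look like, here is a minimal sketch. The patterns, the postprocess function and the resources directory name are assumptions made for illustration; the expressions actually used in this PR may differ.

```python
import re

# Hypothetical patterns for illustration only.
# Matches <script> elements that load their code from a remote URL.
REMOTE_SCRIPT = re.compile(
    r'<script\b[^>]*\bsrc="https?://[^"]*"[^>]*>.*?</script>',
    flags=re.IGNORECASE | re.DOTALL,
)
# Matches <link ... href="http(s)://.../name.css" and captures the file name.
REMOTE_STYLESHEET = re.compile(
    r'(<link\b[^>]*\bhref=")https?://[^"]*/([^"/]+\.css)(")',
    flags=re.IGNORECASE,
)


def postprocess(html: str, local_dir: str = "resources") -> str:
    """Drop remote scripts and point remote stylesheets at local copies."""
    # Roughly what a sed 's#<script ... src="http...">...</script>##g' would do.
    html = REMOTE_SCRIPT.sub("", html)
    # Keep only the stylesheet's file name and prefix it with the local directory.
    html = REMOTE_STYLESHEET.sub(
        lambda m: f"{m.group(1)}{local_dir}/{m.group(2)}{m.group(3)}", html
    )
    return html
```

Regular expressions are fragile on arbitrary HTML (for example, this sketch ignores single-quoted attributes and stylesheet URLs with query strings), so an HTML parser may be the more robust choice if the page structure keeps changing.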

Broadly, the changes in this PR are as follows:

- strip out any remote resources, including some JavaScript
- replace linked stylesheets and fonts (MathJax) with references to local files
- remove any elements (such as buttons and navbars) which would not make sense in an offline archived version of an assessment, since the archive is intended to be browsed/organised in a file manager
- find and download appropriate images and attachments; in the case of images, insert references to these local files instead of remote URLs
- if the student score distribution is available, render the graphic as an image and preserve it in the HTML page
- ensure we retry any failed network requests
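The last point, retrying failed network requests, could be sketched roughly as follows using only the standard library. The function names, the attempt count and the exponential backoff are assumptions for illustration and are not taken from the code in this PR.

```python
import time
import urllib.error
import urllib.request
from typing import Callable


def fetch(url: str, timeout: float = 30.0) -> bytes:
    """Download a URL and return the response body."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read()


def fetch_with_retry(
    url: str,
    attempts: int = 3,
    backoff: float = 1.0,
    fetcher: Callable[[str], bytes] = fetch,
) -> bytes:
    """Call fetcher(url), retrying on network errors with exponential backoff."""
    last_error = None
    for attempt in range(attempts):
        try:
            return fetcher(url)
        except (urllib.error.URLError, TimeoutError, ConnectionError) as exc:
            last_error = exc
            # Wait backoff, 2*backoff, 4*backoff, ... but not after the final attempt.
            if attempt < attempts - 1:
                time.sleep(backoff * 2 ** attempt)
    raise RuntimeError(f"giving up on {url} after {attempts} attempts") from last_error
```

Passing the downloader in as `fetcher` keeps the retry logic independent of how the request is made, so the same wrapper could sit around image and attachment downloads alike.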

curtischong commented 1 month ago

sorry been a bit busy. looking rn

curtischong commented 1 month ago

I added some more commits, but I still haven't been able to get this working completely. I don't have much time to work on this. However, I'll link this PR in the original repo.

curtischong / crowdmark-downloader

Enable offline use #5