Closed lucienshawls closed 2 weeks ago
I considered that option. The situation is this: the URL for a module (taking AMCS2033 American Radicalism as an example) looks like this:
https://campus.nottingham.ac.uk/psc/csprd_pub/EMPLOYEE/HRMS/c/UN_PROG_AND_MOD_EXTRACT.UN_PLN_EXTRT_FL_CP.GBL?PAGE=UN_CRS_EXT4_FPG&CAMPUS=U&TYPE=Module&YEAR=2024&TITLE=American%20Radicalism&MODULE=AMCS2033&CRSEID=016489&LINKA=&LINKB=&LINKC=UDD-ACS
It contains the parameter CRSEID=016489, which we need to obtain. Unfortunately, this does not seem trivial. The CRSEID is not fetched when the list of modules is loaded; rather, it is fetched after you click on a module in the list, at which point a POST request is sent with the following data (converted from a cURL command with https://curlconverter.com/):
import requests
cookies = {} # Omitted
headers = {} # Omitted
data = {
'ICAJAX': '1',
'ICNAVTYPEDROPDOWN': '0',
'ICType': 'Panel',
'ICElementNum': '0',
'ICStateNum': '18',
'ICAction': 'ADDRESS_LINK$0',
'ICModelCancel': '0',
'ICXPos': '0',
'ICYPos': '0',
'ResponsetoDiffFrame': '-1',
'TargetFrameName': 'None',
'FacetPath': 'None',
'ICFocus': '',
'ICSaveWarningFilter': '0',
'ICChanged': '0',
'ICSkipPending': '0',
'ICAutoSave': '0',
'ICResubmit': '0',
'ICSID': 'LSvvCKSCxe6C7js/87aD51fizvCE3qaSdjvS3zDAogo=',
'ICActionPrompt': 'false',
'ICTypeAheadID': '',
'ICBcDomData': 'C~UnknownValue~EMPLOYEE~HRMS~UN_PROG_AND_MOD_EXTRACT.UN_PLN_EXTRT_FL_CP.GBL~UN_CRS_EXT2_FPG~Curriculum Catalogue~UnknownValue~UnknownValue~https://campus.nottingham.ac.uk/psc/csprd_pub/EMPLOYEE/HRMS/c/UN_PROG_AND_MOD_EXTRACT.UN_PLN_EXTRT_FL_CP.GBL?CAMPUS=U&TYPE=Module&YEAR=2024&TITLE=&Module=&SCHOOL=UDD-ACS&LINKA=~UnknownValue',
'ICDNDSrc': '',
'ICPanelHelpUrl': '',
'ICPanelName': '',
'ICPanelControlStyle': 'pst_side2-hidden pst_panel-mode ',
'ICFind': '',
'ICAddCount': '',
'ICAppClsData': '',
'win0hdrdivPT_SYSACT_RETLST': 'psc_hidden',
'win0hdrdivPT_SYSACT_HELP': 'psc_hidden',
}
response = requests.post(
'https://campus.nottingham.ac.uk/psc/csprd_pub/EMPLOYEE/HRMS/c/UN_PROG_AND_MOD_EXTRACT.UN_PLN_EXTRT_FL_CP.GBL',
cookies=cookies,
headers=headers,
data=data,
)
where data is fairly complicated. It is possible to obtain the CRSEID of other modules belonging to the same school by modifying the data fields ICStateNum and ICAction: the former needs to be incremented by one on every request, and for the latter the number after ADDRESS_LINK$ is the index of the module of interest in the module list. So far so good. However, it is not yet clear to me how to obtain the CRSEID for modules of other schools. Naturally, you'd think of changing ICBcDomData, which (judging from a few such POST requests) seems to record the pages you have visited. Specifically, you'd change the SCHOOL= field there to trick the server into thinking that your last visited page was the module list of the school of interest. To be careful, you'd also modify the SCHOOL parameter in the Referer header. But even after all this, the request doesn't return the right data; if anything, the data for the old school is returned. I suspect that the server remembers the pages you visited through cookies or some other intricate means that I have no intention of figuring out, so it appears to me that Selenium is the quickest way to get things done.
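The ICStateNum and ICAction adjustments described above can be sketched as a small helper that derives a per-module payload from a captured 'prototype' payload. This is only an illustration: the function name payload_for_module and its parameters are hypothetical, and whether the server accepts the resulting payload still depends on the session state discussed in this thread.

```python
from typing import Dict


def payload_for_module(prototype: Dict[str, str], index: int, state_num: int) -> Dict[str, str]:
    """Copy a captured POST payload and point it at another module in the list.

    `prototype` is a payload captured from a real click (an assumption of this
    sketch); `index` is the module's position in the list; `state_num` is the
    value to send as ICStateNum.
    """
    data = dict(prototype)  # leave the captured payload untouched
    data["ICStateNum"] = str(state_num)
    data["ICAction"] = f"ADDRESS_LINK${index}"
    return data
```

For example, payload_for_module(data, 3, 19) would target the fourth module in the list while keeping every other captured field unchanged.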
That said, even with the knowledge above, it is possible to make the crawler (much?) faster. For each school, we still need Selenium to get a 'prototype' request like the one above by clicking on the first module of the school; we can then obtain the other modules using requests alone, modifying the request data as described above. The Selenium dependency is still there, though, so it is not a perfect solution.
Or maybe we could use requests.Session() to handle the cookies for us. That may work, but I haven't investigated.
The following code doesn't work: post_data.html reads <title>Campus System Requires Cookies</title>. Cookies are being sent, as the printed output shows, and list.html is as expected. I'm not sure why this happens; maybe we need to change the ICSID field as well, but I have no idea where it comes from. In any case, removing the Selenium dependency looks like a hard problem. I'll leave my findings here in case anyone wants to tackle the issue and finds them useful.
import requests
from requests.utils import dict_from_cookiejar
from pprint import pprint
# Example headers to mimic a browser
headers = {
'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/115.0.0.0 Safari/537.36',
'Accept-Language': 'en-US,en;q=0.9',
'Accept-Encoding': 'gzip, deflate, br',
'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8',
'Connection': 'keep-alive',
'Upgrade-Insecure-Requests': '1'
}
SCHOOL = 'USC-ENGL'
session = requests.Session()
response = session.get(f'https://campus.nottingham.ac.uk/psc/csprd_pub/EMPLOYEE/HRMS/c/UN_PROG_AND_MOD_EXTRACT.UN_PLN_EXTRT_FL_CP.GBL?PAGE=UN_CRS_EXT2_FPG&CAMPUS=U&TYPE=Module&YEAR=2024&TITLE=&Module=&SCHOOL={SCHOOL}&LINKA=&CAMPUS=U&TYPE=Module&YEAR=2024&TITLE=&Module=&SCHOOL={SCHOOL}', headers=headers)
with open("list.html", "w") as f:
    f.write(response.text)
data = {
'ICAJAX': '1',
'ICNAVTYPEDROPDOWN': '0',
'ICType': 'Panel',
'ICElementNum': '0',
'ICStateNum': '1',
'ICAction': 'ADDRESS_LINK$1',
'ICModelCancel': '0',
'ICXPos': '0',
'ICYPos': '0',
'ResponsetoDiffFrame': '-1',
'TargetFrameName': 'None',
'FacetPath': 'None',
'ICFocus': '',
'ICSaveWarningFilter': '0',
'ICChanged': '0',
'ICSkipPending': '0',
'ICAutoSave': '0',
'ICResubmit': '0',
'ICSID': 'LSvvCKSCxe6C7js/87aD51fizvCE3qaSdjvS3zDAogo=',
'ICActionPrompt': 'false',
'ICTypeAheadID': '',
'ICBcDomData': f'C~UnknownValue~EMPLOYEE~HRMS~UN_PROG_AND_MOD_EXTRACT.UN_PLN_EXTRT_FL_CP.GBL~UN_CRS_EXT2_FPG~Curriculum Catalogue~UnknownValue~UnknownValue~https://campus.nottingham.ac.uk/psc/csprd_pub/EMPLOYEE/HRMS/c/UN_PROG_AND_MOD_EXTRACT.UN_PLN_EXTRT_FL_CP.GBL?CAMPUS=U&TYPE=Module&YEAR=2024&TITLE=&Module=&SCHOOL={SCHOOL}&LINKA=~UnknownValue',
'ICDNDSrc': '',
'ICPanelHelpUrl': '',
'ICPanelName': '',
'ICPanelControlStyle': 'pst_side2-hidden pst_panel-mode ',
'ICFind': '',
'ICAddCount': '',
'ICAppClsData': '',
'win0hdrdivPT_SYSACT_RETLST': 'psc_hidden',
'win0hdrdivPT_SYSACT_HELP': 'psc_hidden',
}
pprint(dict_from_cookiejar(session.cookies))
response = session.post(
'https://campus.nottingham.ac.uk/psc/csprd_pub/EMPLOYEE/HRMS/c/UN_PROG_AND_MOD_EXTRACT.UN_PLN_EXTRT_FL_CP.GBL',
data=data,
)
with open("post_data.html", "w", encoding='utf-8') as f:
    f.write(response.text)
In list.html, there is an ICSID field along with some other variables:
<input type='hidden' name='ICSID' id='ICSID' value='DgIumPq0ZvYZTuW2qjmdLIZ+67qu0xEV2gFAe94Zyxc=' />
So I use bs4 to obtain the ICSID from the first request (the one that gets list.html), and then send the second request with that ICSID:
from bs4 import BeautifulSoup
soup = BeautifulSoup(response.text, "html.parser")
element = soup.find(id="ICSID")
ICSID = element.get("value")
# print(ICSID)
data["ICSID"] = ICSID  # The rest of data stays the same as in your code
and I get something new (not <title>Campus System Requires Cookies</title>). See if it helps.
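If pulling in bs4 just for this one field is undesirable, the hidden ICSID input shown above can probably be read with the standard library's html.parser instead. The sketch below is an alternative, not what was used in this thread; the class and function names are made up for illustration.

```python
from html.parser import HTMLParser
from typing import Optional


class _ICSIDParser(HTMLParser):
    """Remember the value of the first <input id="ICSID"> element seen."""

    def __init__(self) -> None:
        super().__init__()
        self.icsid: Optional[str] = None

    def handle_starttag(self, tag, attrs):
        # Also called for self-closing tags such as <input ... />.
        if tag == "input" and self.icsid is None:
            attributes = dict(attrs)
            if attributes.get("id") == "ICSID":
                self.icsid = attributes.get("value")


def extract_icsid(html: str) -> Optional[str]:
    """Return the ICSID value from the page, or None if it is absent."""
    parser = _ICSIDParser()
    parser.feed(html)
    return parser.icsid
```

Returning None rather than raising lets the caller decide how to react when the page layout changes or the session was rejected.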
Very cool! I think we get the right thing.
There is a line in the returned document:
processing_win0(0,3000);]]></GENSCRIPT><GENSCRIPT id='onloadScript'><![CDATA[document.location='/psp/csprd_pub/EMPLOYEE/HRMS/c/UN_PROG_AND_MOD_EXTRACT.UN_PLN_EXTRT_FL_CP.GBL?PAGE=UN_CRS_EXT4_FPG&CAMPUS=U&TYPE=Module&YEAR=2024&TITLE=Processing%20Sentences%20and%20Discourse&MODULE=ENGL4387&CRSEID=035633&LINKA=&LINKB=&LINKC=USC-ENGL';]]></GENSCRIPT>
which contains the link we need. It's just a matter of regex to get it.
Thanks a lot for the help!
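The regex step mentioned above might look roughly like the following. It is a sketch that assumes the response always embeds the link as document.location='...'; and that the path never contains a literal single quote; the helper names are hypothetical. The CRSEID is then read from the query string with urllib.parse.

```python
import re
from typing import Optional
from urllib.parse import parse_qs, urlsplit

HOST = "https://campus.nottingham.ac.uk"

# Capture everything between the quotes of document.location='...';
_LOCATION_RE = re.compile(r"document\.location='([^']*)';")


def extract_module_url(response_text: str) -> Optional[str]:
    """Return the absolute module URL found in the POST response, if any."""
    match = _LOCATION_RE.search(response_text)
    return f"{HOST}{match.group(1)}" if match else None


def extract_crseid(module_url: str) -> Optional[str]:
    """Return the CRSEID query parameter of a module URL, if present."""
    values = parse_qs(urlsplit(module_url).query).get("CRSEID")
    return values[0] if values else None
```

Applied to the response line quoted above, extract_module_url gives the UN_CRS_EXT4_FPG link for ENGL4387 and extract_crseid gives 035633.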
That is great!
But, just out of curiosity, why am I getting CRSEID=035632 while you got CRSEID=035633? I did not modify anything in data except ICSID. Did you use different data? If not, something strange is happening.
Sorry for the confusion. I modified the index just to test it and I copied the modified result.
That is OK.
I tried to eliminate as many keys in data as possible. Based on your explanations and the tests we have been through, I kept the keys ICStateNum, ICAction and ICSID without trying to remove them. I then applied a binary search to the remaining keys, with these results:
- Apart from ICAJAX, all other keys can be deleted, and the response remains the same;
- ICAJAX can be deleted as well, but the response will then contain TWO SIMILAR (not identical) URL links (with the same CRSEID, of course); this needs your further inspection.
Code here:
import requests
from requests.utils import dict_from_cookiejar
from pprint import pprint
from bs4 import BeautifulSoup
# Example headers to mimic a browser
headers = {
"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/115.0.0.0 Safari/537.36",
"Accept-Language": "en-US,en;q=0.9",
"Accept-Encoding": "gzip, deflate, br",
"Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8",
"Connection": "keep-alive",
"Upgrade-Insecure-Requests": "1",
}
SCHOOL = "USC-ENGL"
session = requests.Session()
response = session.get(
f"https://campus.nottingham.ac.uk/psc/csprd_pub/EMPLOYEE/HRMS/c/UN_PROG_AND_MOD_EXTRACT.UN_PLN_EXTRT_FL_CP.GBL?PAGE=UN_CRS_EXT2_FPG&CAMPUS=U&TYPE=Module&YEAR=2024&TITLE=&Module=&SCHOOL={SCHOOL}&LINKA=&CAMPUS=U&TYPE=Module&YEAR=2024&TITLE=&Module=&SCHOOL={SCHOOL}",
headers=headers,
)
with open("list.html", "w") as f:
    f.write(response.text)
soup = BeautifulSoup(response.text, "html.parser")
element = soup.find(id="ICSID")
ICSID = element.get("value")
data = {
"ICAJAX": "1",
"ICStateNum": "1",
"ICAction": "ADDRESS_LINK$1",
"ICSID": ICSID,
}
pprint(dict_from_cookiejar(session.cookies))
response = session.post(
"https://campus.nottingham.ac.uk/psc/csprd_pub/EMPLOYEE/HRMS/c/UN_PROG_AND_MOD_EXTRACT.UN_PLN_EXTRT_FL_CP.GBL",
data=data,
)
with open("post_data.html", "w", encoding="utf-8") as f:
    f.write(response.text)
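The key elimination described above could be automated along these lines. This is a sketch of a chunk-halving search, not the exact procedure that was followed: still_works is a hypothetical callback that would send the POST with the trial payload and report whether the response still contains the expected link, and the approach assumes that dropping an unneeded key never changes whether another key is needed.

```python
from typing import Callable, Dict, Iterable


def minimise_keys(
    data: Dict[str, str],
    keep: Iterable[str],
    still_works: Callable[[Dict[str, str]], bool],
) -> Dict[str, str]:
    """Drop as many keys from `data` as possible while `still_works` holds.

    Keys listed in `keep` are never removed. Groups of keys are removed
    together first; the group size is halved whenever a pass is finished.
    """
    protected = set(keep)
    current = dict(data)
    candidates = [key for key in data if key not in protected]
    chunk = len(candidates)
    while chunk >= 1 and candidates:
        i = 0
        while i < len(candidates):
            group = candidates[i:i + chunk]
            trial = {k: v for k, v in current.items() if k not in group}
            if still_works(trial):
                current = trial
                del candidates[i:i + chunk]  # these keys are gone for good
            else:
                i += chunk  # at least one key in this group is needed
        if chunk == 1:
            break
        chunk //= 2
    return current
```

Each call to still_works would cost one request against the server, so removing keys in groups keeps the number of requests well below one per key when most keys turn out to be unnecessary.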
Thanks for the effort, but I don't really mind leaving the fields as-is, so that the crawler looks more like an actual user.
Yes, that is always a good idea. So I assume this issue can be safely closed. :)
Not yet, I'll close it when I update the crawler to remove Selenium. There's still work to do, namely how to use concurrency to get the module links in a more efficient way. Still working... 👨🔧
Turns out, a session CAN be reused. Once a session is established and the ICSID is acquired, you can get all module URLs with the same session and ICSID. What's more, concurrency works fine.
Oh, by the way, ICStateNum can also be removed. If you would rather keep it, there is actually no need to increase it by one each time; just set it to "1" and everything is fine.
The example module page that I used: Curriculum Catalogue
Code with concurrency:
import re
import requests
import concurrent.futures
from bs4 import BeautifulSoup
HOST = "https://campus.nottingham.ac.uk"
HEADERS = {
"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/115.0.0.0 Safari/537.36",
"Accept-Language": "en-US,en;q=0.9",
"Accept-Encoding": "gzip, deflate, br",
"Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8",
"Connection": "keep-alive",
"Upgrade-Insecure-Requests": "1",
}
def init_session():
    """Open a session on the module list page and read ICSID and the row count."""
    session = requests.Session()
    url = f"{HOST}/psc/csprd_pub/EMPLOYEE/HRMS/c/UN_PROG_AND_MOD_EXTRACT.UN_PLN_EXTRT_FL_CP.GBL"
    params = {
        "PAGE": "UN_CRS_EXT2_FPG",
        "CAMPUS": "U",
        "TYPE": "Module",
        "YEAR": "2024",
        "TITLE": "",
        "Module": "",
        "SCHOOL": "UDD-ACS",
        "LINKA": "",
    }
    # https://campus.nottingham.ac.uk/psc/csprd_pub/EMPLOYEE/HRMS/c/UN_PROG_AND_MOD_EXTRACT.UN_PLN_EXTRT_FL_CP.GBL?PAGE=UN_CRS_EXT2_FPG&CAMPUS=U&TYPE=Module&YEAR=2024&TITLE=&Module=&SCHOOL=UDD-ACS&LINKA=
    response = session.get(
        url=url,
        params=params,
        headers=HEADERS,
    )
    soup = BeautifulSoup(response.text, "html.parser")
    icsid_entry = soup.find(id="ICSID")
    icsid = icsid_entry.get("value")
    # Each module in the list is rendered as one grid row.
    rows = soup.find_all("tr", class_="ps_grid-row")
    return {"session": session, "ICSID": icsid, "total": len(rows)}


def get_module_url(session: requests.Session, icsid: str, index: int):
    """Simulate a click on the module at `index` and return its URL ("" if not found)."""
    data = {
        "ICAJAX": "1",
        "ICStateNum": "1",
        "ICAction": f"ADDRESS_LINK${index}",
        "ICSID": icsid,
    }
    response = session.post(
        f"{HOST}/psc/csprd_pub/EMPLOYEE/HRMS/c/UN_PROG_AND_MOD_EXTRACT.UN_PLN_EXTRT_FL_CP.GBL",
        data=data,
    )
    re_obj = re.search(r"document.location='(.*)';", response.text)
    if re_obj:
        module_url = f"{HOST}{re_obj.group(1)}"
    else:
        module_url = ""
        # raise RuntimeError("URL not found")
    return module_url


def get_module_list():
    """Fetch all module URLs concurrently, reusing one session and one ICSID."""
    cred = init_session()
    session: requests.Session = cred["session"]
    icsid: str = cred["ICSID"]
    total: int = cred["total"]
    module_list = [None for _ in range(total)]
    with concurrent.futures.ThreadPoolExecutor() as executor:
        future_to_index = {
            executor.submit(
                get_module_url,
                session=session,
                icsid=icsid,
                index=index,
            ): index
            for index in range(total)
        }
        for future in concurrent.futures.as_completed(future_to_index):
            index = future_to_index[future]
            module_url = future.result()
            print(f"Module {index} URL: {module_url}")
            # Store by index so the final list keeps the page order.
            module_list[index] = module_url
    session.close()
    return module_list


import time

start = time.time()
module_list = get_module_list()
end = time.time()
print(f"Time taken: {end - start:.2f}s")
with open("module_list.txt", "w", encoding="utf-8") as f:
    for module in module_list:
        f.write(f"{module}\n")
Output:
Module 0 URL: https://campus.nottingham.ac.uk/psp/csprd_pub/EMPLOYEE/HRMS/c/UN_PROG_AND_MOD_EXTRACT.UN_PLN_EXTRT_FL_CP.GBL?PAGE=UN_CRS_EXT4_FPG&CAMPUS=U&TYPE=Module&YEAR=2024&TITLE=Prohibition%20America%20(UG%2020%20credits)&MODULE=AMCS3024&CRSEID=013585&LINKA=&LINKB=&LINKC=UDD-ACS
Module 5 URL: https://campus.nottingham.ac.uk/psp/csprd_pub/EMPLOYEE/HRMS/c/UN_PROG_AND_MOD_EXTRACT.UN_PLN_EXTRT_FL_CP.GBL?PAGE=UN_CRS_EXT4_FPG&CAMPUS=U&TYPE=Module&YEAR=2024&TITLE=Race,%20Power,%20Money%20and%20the%20Making%20of%20North%20America%201607%20-%201900&MODULE=AMCS1001&CRSEID=010244&LINKA=&LINKB=&LINKC=UDD-ACS
Module 11 URL: https://campus.nottingham.ac.uk/psp/csprd_pub/EMPLOYEE/HRMS/c/UN_PROG_AND_MOD_EXTRACT.UN_PLN_EXTRT_FL_CP.GBL?PAGE=UN_CRS_EXT4_FPG&CAMPUS=U&TYPE=Module&YEAR=2024&TITLE=Approaches%20to%20Contemporary%20American%20Culture%201:%20An%20Introduction&MODULE=AMCS1030&CRSEID=018020&LINKA=&LINKB=&LINKC=UDD-ACS
Module 3 URL: https://campus.nottingham.ac.uk/psp/csprd_pub/EMPLOYEE/HRMS/c/UN_PROG_AND_MOD_EXTRACT.UN_PLN_EXTRT_FL_CP.GBL?PAGE=UN_CRS_EXT4_FPG&CAMPUS=U&TYPE=Module&YEAR=2024&TITLE=Popular%20Music%20Cultures%20and%20Countercultures&MODULE=AMCS3045&CRSEID=015501&LINKA=&LINKB=&LINKC=UDD-ACS
Module 8 URL: https://campus.nottingham.ac.uk/psp/csprd_pub/EMPLOYEE/HRMS/c/UN_PROG_AND_MOD_EXTRACT.UN_PLN_EXTRT_FL_CP.GBL?PAGE=UN_CRS_EXT4_FPG&CAMPUS=U&TYPE=Module&YEAR=2024&TITLE=Approaches%20to%20Contemporary%20American%20Culture%202:%20Developing%20Themes%20and%20Perspectives&MODULE=AMCS1031&CRSEID=017127&LINKA=&LINKB=&LINKC=UDD-ACS
Module 4 URL: https://campus.nottingham.ac.uk/psp/csprd_pub/EMPLOYEE/HRMS/c/UN_PROG_AND_MOD_EXTRACT.UN_PLN_EXTRT_FL_CP.GBL?PAGE=UN_CRS_EXT4_FPG&CAMPUS=U&TYPE=Module&YEAR=2024&TITLE=American%20Radicalism&MODULE=AMCS2033&CRSEID=016489&LINKA=&LINKB=&LINKC=UDD-ACS
Module 12 URL: https://campus.nottingham.ac.uk/psp/csprd_pub/EMPLOYEE/HRMS/c/UN_PROG_AND_MOD_EXTRACT.UN_PLN_EXTRT_FL_CP.GBL?PAGE=UN_CRS_EXT4_FPG&CAMPUS=U&TYPE=Module&YEAR=2024&TITLE=African%20American%20History%20and%20Culture&MODULE=AMCS2052&CRSEID=018148&LINKA=&LINKB=&LINKC=UDD-ACS
Module 10 URL: https://campus.nottingham.ac.uk/psp/csprd_pub/EMPLOYEE/HRMS/c/UN_PROG_AND_MOD_EXTRACT.UN_PLN_EXTRT_FL_CP.GBL?PAGE=UN_CRS_EXT4_FPG&CAMPUS=U&TYPE=Module&YEAR=2024&TITLE=American%20Literature%20and%20Culture%202:%20Since%201940&MODULE=AMCS1011&CRSEID=010270&LINKA=&LINKB=&LINKC=UDD-ACS
Module 1 URL: https://campus.nottingham.ac.uk/psp/csprd_pub/EMPLOYEE/HRMS/c/UN_PROG_AND_MOD_EXTRACT.UN_PLN_EXTRT_FL_CP.GBL?PAGE=UN_CRS_EXT4_FPG&CAMPUS=U&TYPE=Module&YEAR=2024&TITLE=Dissertation%20in%20American%20and%20Canadian%20Studies&MODULE=AMCS3004&CRSEID=008334&LINKA=&LINKB=&LINKC=UDD-ACS
Module 2 URL: https://campus.nottingham.ac.uk/psp/csprd_pub/EMPLOYEE/HRMS/c/UN_PROG_AND_MOD_EXTRACT.UN_PLN_EXTRT_FL_CP.GBL?PAGE=UN_CRS_EXT4_FPG&CAMPUS=U&TYPE=Module&YEAR=2024&TITLE=Dissertation%20in%20American%20and%20Canadian%20Studies&MODULE=AMCS3006&CRSEID=008532&LINKA=&LINKB=&LINKC=UDD-ACS
Module 7 URL: https://campus.nottingham.ac.uk/psp/csprd_pub/EMPLOYEE/HRMS/c/UN_PROG_AND_MOD_EXTRACT.UN_PLN_EXTRT_FL_CP.GBL?PAGE=UN_CRS_EXT4_FPG&CAMPUS=U&TYPE=Module&YEAR=2024&TITLE=American%20Freedom%20Empire,%20Rights,%20and%20Capitalism%20in%20Modern%20US%20History,%201900-Present&MODULE=AMCS1009&CRSEID=010269&LINKA=&LINKB=&LINKC=UDD-ACS
Module 17 URL: https://campus.nottingham.ac.uk/psp/csprd_pub/EMPLOYEE/HRMS/c/UN_PROG_AND_MOD_EXTRACT.UN_PLN_EXTRT_FL_CP.GBL?PAGE=UN_CRS_EXT4_FPG&CAMPUS=U&TYPE=Module&YEAR=2024&TITLE=American%20Magazine%20Culture:%20Journalism,%20Advertising%20and%20Fiction%20from%20Independence%20to%20the%20Internet%20Age&MODULE=AMCS3069&CRSEID=031751&LINKA=&LINKB=&LINKC=UDD-ACS
Module 23 URL: https://campus.nottingham.ac.uk/psp/csprd_pub/EMPLOYEE/HRMS/c/UN_PROG_AND_MOD_EXTRACT.UN_PLN_EXTRT_FL_CP.GBL?PAGE=UN_CRS_EXT4_FPG&CAMPUS=U&TYPE=Module&YEAR=2024&TITLE=MRes%20Research%20Skills%201&MODULE=AMCS4082&CRSEID=033599&LINKA=&LINKB=&LINKC=UDD-ACS
Module 20 URL: https://campus.nottingham.ac.uk/psp/csprd_pub/EMPLOYEE/HRMS/c/UN_PROG_AND_MOD_EXTRACT.UN_PLN_EXTRT_FL_CP.GBL?PAGE=UN_CRS_EXT4_FPG&CAMPUS=U&TYPE=Module&YEAR=2024&TITLE=Popular%20Music%20Cultures%20&%20Countercultures%20(PGT%2020)&MODULE=AMCS4070&CRSEID=031971&LINKA=&LINKB=&LINKC=UDD-ACS
Module 16 URL: https://campus.nottingham.ac.uk/psp/csprd_pub/EMPLOYEE/HRMS/c/UN_PROG_AND_MOD_EXTRACT.UN_PLN_EXTRT_FL_CP.GBL?PAGE=UN_CRS_EXT4_FPG&CAMPUS=U&TYPE=Module&YEAR=2024&TITLE=Contemporary%20North%20American%20Fiction&MODULE=AMCS2056&CRSEID=030744&LINKA=&LINKB=&LINKC=UDD-ACS
Module 26 URL: https://campus.nottingham.ac.uk/psp/csprd_pub/EMPLOYEE/HRMS/c/UN_PROG_AND_MOD_EXTRACT.UN_PLN_EXTRT_FL_CP.GBL?PAGE=UN_CRS_EXT4_FPG&CAMPUS=U&TYPE=Module&YEAR=2024&TITLE=US%20Foreign%20Policy,%201989-Present%20(UG%20-%2020%20credits)&MODULE=AMCS3025&CRSEID=013584&LINKA=&LINKB=&LINKC=UDD-ACS
Module 6 URL: https://campus.nottingham.ac.uk/psp/csprd_pub/EMPLOYEE/HRMS/c/UN_PROG_AND_MOD_EXTRACT.UN_PLN_EXTRT_FL_CP.GBL?PAGE=UN_CRS_EXT4_FPG&CAMPUS=U&TYPE=Module&YEAR=2024&TITLE=American%20Literature%20and%20Culture%201:%201830-1940&MODULE=AMCS1005&CRSEID=010246&LINKA=&LINKB=&LINKC=UDD-ACS
Module 15 URL: https://campus.nottingham.ac.uk/psp/csprd_pub/EMPLOYEE/HRMS/c/UN_PROG_AND_MOD_EXTRACT.UN_PLN_EXTRT_FL_CP.GBL?PAGE=UN_CRS_EXT4_FPG&CAMPUS=U&TYPE=Module&YEAR=2024&TITLE=Sexuality%20in%20American%20History%20(Level%203)&MODULE=AMCS3061&CRSEID=022814&LINKA=&LINKB=&LINKC=UDD-ACS
Module 13 URL: https://campus.nottingham.ac.uk/psp/csprd_pub/EMPLOYEE/HRMS/c/UN_PROG_AND_MOD_EXTRACT.UN_PLN_EXTRT_FL_CP.GBL?PAGE=UN_CRS_EXT4_FPG&CAMPUS=U&TYPE=Module&YEAR=2024&TITLE=North%20American%20Regions&MODULE=AMCS2054&CRSEID=022802&LINKA=&LINKB=&LINKC=UDD-ACS
Module 19 URL: https://campus.nottingham.ac.uk/psp/csprd_pub/EMPLOYEE/HRMS/c/UN_PROG_AND_MOD_EXTRACT.UN_PLN_EXTRT_FL_CP.GBL?PAGE=UN_CRS_EXT4_FPG&CAMPUS=U&TYPE=Module&YEAR=2024&TITLE=Immigration%20and%20Ethnicity%20in%20the%20United%20States&MODULE=AMCS2007&CRSEID=011262&LINKA=&LINKB=&LINKC=UDD-ACS
Module 18 URL: https://campus.nottingham.ac.uk/psp/csprd_pub/EMPLOYEE/HRMS/c/UN_PROG_AND_MOD_EXTRACT.UN_PLN_EXTRT_FL_CP.GBL?PAGE=UN_CRS_EXT4_FPG&CAMPUS=U&TYPE=Module&YEAR=2024&TITLE=North%20American%20Film%20Adaptations%20(Level%203)&MODULE=AMCS3068&CRSEID=031902&LINKA=&LINKB=&LINKC=UDD-ACS
Module 25 URL: https://campus.nottingham.ac.uk/psp/csprd_pub/EMPLOYEE/HRMS/c/UN_PROG_AND_MOD_EXTRACT.UN_PLN_EXTRT_FL_CP.GBL?PAGE=UN_CRS_EXT4_FPG&CAMPUS=U&TYPE=Module&YEAR=2024&TITLE=The%20CIA%20and%20US%20Foreign%20Policy,%201945-2012&MODULE=AMCS2058&CRSEID=034740&LINKA=&LINKB=&LINKC=UDD-ACS
Module 9 URL: https://campus.nottingham.ac.uk/psp/csprd_pub/EMPLOYEE/HRMS/c/UN_PROG_AND_MOD_EXTRACT.UN_PLN_EXTRT_FL_CP.GBL?PAGE=UN_CRS_EXT4_FPG&CAMPUS=U&TYPE=Module&YEAR=2024&TITLE=The%20US%20&%20the%20World%20in%20the%20American%20Century:%20US%20Foreign%20Policy,%201898-2008&MODULE=AMCS2048&CRSEID=017243&LINKA=&LINKB=&LINKC=UDD-ACS
Module 24 URL: https://campus.nottingham.ac.uk/psp/csprd_pub/EMPLOYEE/HRMS/c/UN_PROG_AND_MOD_EXTRACT.UN_PLN_EXTRT_FL_CP.GBL?PAGE=UN_CRS_EXT4_FPG&CAMPUS=U&TYPE=Module&YEAR=2024&TITLE=Troubled%20Empire:%20The%20Projection%20of%20American%20Global%20Power%20from%20Pearl%20Harbor%20to%20Covid-19&MODULE=AMCS3074&CRSEID=034232&LINKA=&LINKB=&LINKC=UDD-ACS
Module 14 URL: https://campus.nottingham.ac.uk/psp/csprd_pub/EMPLOYEE/HRMS/c/UN_PROG_AND_MOD_EXTRACT.UN_PLN_EXTRT_FL_CP.GBL?PAGE=UN_CRS_EXT4_FPG&CAMPUS=U&TYPE=Module&YEAR=2024&TITLE=Key%20Texts%20in%20American%20Social%20and%20Political%20Thought&MODULE=AMCS2055&CRSEID=022803&LINKA=&LINKB=&LINKC=UDD-ACS
Module 21 URL: https://campus.nottingham.ac.uk/psp/csprd_pub/EMPLOYEE/HRMS/c/UN_PROG_AND_MOD_EXTRACT.UN_PLN_EXTRT_FL_CP.GBL?PAGE=UN_CRS_EXT4_FPG&CAMPUS=U&TYPE=Module&YEAR=2024&TITLE=From%20Landscapes%20to%20Mixtapes:%20Canadian%20Literature,%20Film%20and%20Culture&MODULE=AMCS1008&CRSEID=011286&LINKA=&LINKB=&LINKC=UDD-ACS
Module 22 URL: https://campus.nottingham.ac.uk/psp/csprd_pub/EMPLOYEE/HRMS/c/UN_PROG_AND_MOD_EXTRACT.UN_PLN_EXTRT_FL_CP.GBL?PAGE=UN_CRS_EXT4_FPG&CAMPUS=U&TYPE=Module&YEAR=2024&TITLE=Varieties%20of%20Classic%20American%20Film,%20Television%20and%20Literature%20Since%201950&MODULE=AMCS3071&CRSEID=033030&LINKA=&LINKB=&LINKC=UDD-ACS
Time taken: 8.07s
Temporary file module_list.txt
(correct order):
https://campus.nottingham.ac.uk/psp/csprd_pub/EMPLOYEE/HRMS/c/UN_PROG_AND_MOD_EXTRACT.UN_PLN_EXTRT_FL_CP.GBL?PAGE=UN_CRS_EXT4_FPG&CAMPUS=U&TYPE=Module&YEAR=2024&TITLE=Prohibition%20America%20(UG%2020%20credits)&MODULE=AMCS3024&CRSEID=013585&LINKA=&LINKB=&LINKC=UDD-ACS
https://campus.nottingham.ac.uk/psp/csprd_pub/EMPLOYEE/HRMS/c/UN_PROG_AND_MOD_EXTRACT.UN_PLN_EXTRT_FL_CP.GBL?PAGE=UN_CRS_EXT4_FPG&CAMPUS=U&TYPE=Module&YEAR=2024&TITLE=Dissertation%20in%20American%20and%20Canadian%20Studies&MODULE=AMCS3004&CRSEID=008334&LINKA=&LINKB=&LINKC=UDD-ACS
https://campus.nottingham.ac.uk/psp/csprd_pub/EMPLOYEE/HRMS/c/UN_PROG_AND_MOD_EXTRACT.UN_PLN_EXTRT_FL_CP.GBL?PAGE=UN_CRS_EXT4_FPG&CAMPUS=U&TYPE=Module&YEAR=2024&TITLE=Dissertation%20in%20American%20and%20Canadian%20Studies&MODULE=AMCS3006&CRSEID=008532&LINKA=&LINKB=&LINKC=UDD-ACS
https://campus.nottingham.ac.uk/psp/csprd_pub/EMPLOYEE/HRMS/c/UN_PROG_AND_MOD_EXTRACT.UN_PLN_EXTRT_FL_CP.GBL?PAGE=UN_CRS_EXT4_FPG&CAMPUS=U&TYPE=Module&YEAR=2024&TITLE=Popular%20Music%20Cultures%20and%20Countercultures&MODULE=AMCS3045&CRSEID=015501&LINKA=&LINKB=&LINKC=UDD-ACS
https://campus.nottingham.ac.uk/psp/csprd_pub/EMPLOYEE/HRMS/c/UN_PROG_AND_MOD_EXTRACT.UN_PLN_EXTRT_FL_CP.GBL?PAGE=UN_CRS_EXT4_FPG&CAMPUS=U&TYPE=Module&YEAR=2024&TITLE=American%20Radicalism&MODULE=AMCS2033&CRSEID=016489&LINKA=&LINKB=&LINKC=UDD-ACS
https://campus.nottingham.ac.uk/psp/csprd_pub/EMPLOYEE/HRMS/c/UN_PROG_AND_MOD_EXTRACT.UN_PLN_EXTRT_FL_CP.GBL?PAGE=UN_CRS_EXT4_FPG&CAMPUS=U&TYPE=Module&YEAR=2024&TITLE=Race,%20Power,%20Money%20and%20the%20Making%20of%20North%20America%201607%20-%201900&MODULE=AMCS1001&CRSEID=010244&LINKA=&LINKB=&LINKC=UDD-ACS
https://campus.nottingham.ac.uk/psp/csprd_pub/EMPLOYEE/HRMS/c/UN_PROG_AND_MOD_EXTRACT.UN_PLN_EXTRT_FL_CP.GBL?PAGE=UN_CRS_EXT4_FPG&CAMPUS=U&TYPE=Module&YEAR=2024&TITLE=American%20Literature%20and%20Culture%201:%201830-1940&MODULE=AMCS1005&CRSEID=010246&LINKA=&LINKB=&LINKC=UDD-ACS
https://campus.nottingham.ac.uk/psp/csprd_pub/EMPLOYEE/HRMS/c/UN_PROG_AND_MOD_EXTRACT.UN_PLN_EXTRT_FL_CP.GBL?PAGE=UN_CRS_EXT4_FPG&CAMPUS=U&TYPE=Module&YEAR=2024&TITLE=American%20Freedom%20Empire,%20Rights,%20and%20Capitalism%20in%20Modern%20US%20History,%201900-Present&MODULE=AMCS1009&CRSEID=010269&LINKA=&LINKB=&LINKC=UDD-ACS
https://campus.nottingham.ac.uk/psp/csprd_pub/EMPLOYEE/HRMS/c/UN_PROG_AND_MOD_EXTRACT.UN_PLN_EXTRT_FL_CP.GBL?PAGE=UN_CRS_EXT4_FPG&CAMPUS=U&TYPE=Module&YEAR=2024&TITLE=Approaches%20to%20Contemporary%20American%20Culture%202:%20Developing%20Themes%20and%20Perspectives&MODULE=AMCS1031&CRSEID=017127&LINKA=&LINKB=&LINKC=UDD-ACS
https://campus.nottingham.ac.uk/psp/csprd_pub/EMPLOYEE/HRMS/c/UN_PROG_AND_MOD_EXTRACT.UN_PLN_EXTRT_FL_CP.GBL?PAGE=UN_CRS_EXT4_FPG&CAMPUS=U&TYPE=Module&YEAR=2024&TITLE=The%20US%20&%20the%20World%20in%20the%20American%20Century:%20US%20Foreign%20Policy,%201898-2008&MODULE=AMCS2048&CRSEID=017243&LINKA=&LINKB=&LINKC=UDD-ACS
https://campus.nottingham.ac.uk/psp/csprd_pub/EMPLOYEE/HRMS/c/UN_PROG_AND_MOD_EXTRACT.UN_PLN_EXTRT_FL_CP.GBL?PAGE=UN_CRS_EXT4_FPG&CAMPUS=U&TYPE=Module&YEAR=2024&TITLE=American%20Literature%20and%20Culture%202:%20Since%201940&MODULE=AMCS1011&CRSEID=010270&LINKA=&LINKB=&LINKC=UDD-ACS
https://campus.nottingham.ac.uk/psp/csprd_pub/EMPLOYEE/HRMS/c/UN_PROG_AND_MOD_EXTRACT.UN_PLN_EXTRT_FL_CP.GBL?PAGE=UN_CRS_EXT4_FPG&CAMPUS=U&TYPE=Module&YEAR=2024&TITLE=Approaches%20to%20Contemporary%20American%20Culture%201:%20An%20Introduction&MODULE=AMCS1030&CRSEID=018020&LINKA=&LINKB=&LINKC=UDD-ACS
https://campus.nottingham.ac.uk/psp/csprd_pub/EMPLOYEE/HRMS/c/UN_PROG_AND_MOD_EXTRACT.UN_PLN_EXTRT_FL_CP.GBL?PAGE=UN_CRS_EXT4_FPG&CAMPUS=U&TYPE=Module&YEAR=2024&TITLE=African%20American%20History%20and%20Culture&MODULE=AMCS2052&CRSEID=018148&LINKA=&LINKB=&LINKC=UDD-ACS
https://campus.nottingham.ac.uk/psp/csprd_pub/EMPLOYEE/HRMS/c/UN_PROG_AND_MOD_EXTRACT.UN_PLN_EXTRT_FL_CP.GBL?PAGE=UN_CRS_EXT4_FPG&CAMPUS=U&TYPE=Module&YEAR=2024&TITLE=North%20American%20Regions&MODULE=AMCS2054&CRSEID=022802&LINKA=&LINKB=&LINKC=UDD-ACS
https://campus.nottingham.ac.uk/psp/csprd_pub/EMPLOYEE/HRMS/c/UN_PROG_AND_MOD_EXTRACT.UN_PLN_EXTRT_FL_CP.GBL?PAGE=UN_CRS_EXT4_FPG&CAMPUS=U&TYPE=Module&YEAR=2024&TITLE=Key%20Texts%20in%20American%20Social%20and%20Political%20Thought&MODULE=AMCS2055&CRSEID=022803&LINKA=&LINKB=&LINKC=UDD-ACS
https://campus.nottingham.ac.uk/psp/csprd_pub/EMPLOYEE/HRMS/c/UN_PROG_AND_MOD_EXTRACT.UN_PLN_EXTRT_FL_CP.GBL?PAGE=UN_CRS_EXT4_FPG&CAMPUS=U&TYPE=Module&YEAR=2024&TITLE=Sexuality%20in%20American%20History%20(Level%203)&MODULE=AMCS3061&CRSEID=022814&LINKA=&LINKB=&LINKC=UDD-ACS
https://campus.nottingham.ac.uk/psp/csprd_pub/EMPLOYEE/HRMS/c/UN_PROG_AND_MOD_EXTRACT.UN_PLN_EXTRT_FL_CP.GBL?PAGE=UN_CRS_EXT4_FPG&CAMPUS=U&TYPE=Module&YEAR=2024&TITLE=Contemporary%20North%20American%20Fiction&MODULE=AMCS2056&CRSEID=030744&LINKA=&LINKB=&LINKC=UDD-ACS
https://campus.nottingham.ac.uk/psp/csprd_pub/EMPLOYEE/HRMS/c/UN_PROG_AND_MOD_EXTRACT.UN_PLN_EXTRT_FL_CP.GBL?PAGE=UN_CRS_EXT4_FPG&CAMPUS=U&TYPE=Module&YEAR=2024&TITLE=American%20Magazine%20Culture:%20Journalism,%20Advertising%20and%20Fiction%20from%20Independence%20to%20the%20Internet%20Age&MODULE=AMCS3069&CRSEID=031751&LINKA=&LINKB=&LINKC=UDD-ACS
https://campus.nottingham.ac.uk/psp/csprd_pub/EMPLOYEE/HRMS/c/UN_PROG_AND_MOD_EXTRACT.UN_PLN_EXTRT_FL_CP.GBL?PAGE=UN_CRS_EXT4_FPG&CAMPUS=U&TYPE=Module&YEAR=2024&TITLE=North%20American%20Film%20Adaptations%20(Level%203)&MODULE=AMCS3068&CRSEID=031902&LINKA=&LINKB=&LINKC=UDD-ACS
https://campus.nottingham.ac.uk/psp/csprd_pub/EMPLOYEE/HRMS/c/UN_PROG_AND_MOD_EXTRACT.UN_PLN_EXTRT_FL_CP.GBL?PAGE=UN_CRS_EXT4_FPG&CAMPUS=U&TYPE=Module&YEAR=2024&TITLE=Immigration%20and%20Ethnicity%20in%20the%20United%20States&MODULE=AMCS2007&CRSEID=011262&LINKA=&LINKB=&LINKC=UDD-ACS
https://campus.nottingham.ac.uk/psp/csprd_pub/EMPLOYEE/HRMS/c/UN_PROG_AND_MOD_EXTRACT.UN_PLN_EXTRT_FL_CP.GBL?PAGE=UN_CRS_EXT4_FPG&CAMPUS=U&TYPE=Module&YEAR=2024&TITLE=Popular%20Music%20Cultures%20&%20Countercultures%20(PGT%2020)&MODULE=AMCS4070&CRSEID=031971&LINKA=&LINKB=&LINKC=UDD-ACS
https://campus.nottingham.ac.uk/psp/csprd_pub/EMPLOYEE/HRMS/c/UN_PROG_AND_MOD_EXTRACT.UN_PLN_EXTRT_FL_CP.GBL?PAGE=UN_CRS_EXT4_FPG&CAMPUS=U&TYPE=Module&YEAR=2024&TITLE=From%20Landscapes%20to%20Mixtapes:%20Canadian%20Literature,%20Film%20and%20Culture&MODULE=AMCS1008&CRSEID=011286&LINKA=&LINKB=&LINKC=UDD-ACS
https://campus.nottingham.ac.uk/psp/csprd_pub/EMPLOYEE/HRMS/c/UN_PROG_AND_MOD_EXTRACT.UN_PLN_EXTRT_FL_CP.GBL?PAGE=UN_CRS_EXT4_FPG&CAMPUS=U&TYPE=Module&YEAR=2024&TITLE=Varieties%20of%20Classic%20American%20Film,%20Television%20and%20Literature%20Since%201950&MODULE=AMCS3071&CRSEID=033030&LINKA=&LINKB=&LINKC=UDD-ACS
https://campus.nottingham.ac.uk/psp/csprd_pub/EMPLOYEE/HRMS/c/UN_PROG_AND_MOD_EXTRACT.UN_PLN_EXTRT_FL_CP.GBL?PAGE=UN_CRS_EXT4_FPG&CAMPUS=U&TYPE=Module&YEAR=2024&TITLE=MRes%20Research%20Skills%201&MODULE=AMCS4082&CRSEID=033599&LINKA=&LINKB=&LINKC=UDD-ACS
https://campus.nottingham.ac.uk/psp/csprd_pub/EMPLOYEE/HRMS/c/UN_PROG_AND_MOD_EXTRACT.UN_PLN_EXTRT_FL_CP.GBL?PAGE=UN_CRS_EXT4_FPG&CAMPUS=U&TYPE=Module&YEAR=2024&TITLE=Troubled%20Empire:%20The%20Projection%20of%20American%20Global%20Power%20from%20Pearl%20Harbor%20to%20Covid-19&MODULE=AMCS3074&CRSEID=034232&LINKA=&LINKB=&LINKC=UDD-ACS
https://campus.nottingham.ac.uk/psp/csprd_pub/EMPLOYEE/HRMS/c/UN_PROG_AND_MOD_EXTRACT.UN_PLN_EXTRT_FL_CP.GBL?PAGE=UN_CRS_EXT4_FPG&CAMPUS=U&TYPE=Module&YEAR=2024&TITLE=The%20CIA%20and%20US%20Foreign%20Policy,%201945-2012&MODULE=AMCS2058&CRSEID=034740&LINKA=&LINKB=&LINKC=UDD-ACS
https://campus.nottingham.ac.uk/psp/csprd_pub/EMPLOYEE/HRMS/c/UN_PROG_AND_MOD_EXTRACT.UN_PLN_EXTRT_FL_CP.GBL?PAGE=UN_CRS_EXT4_FPG&CAMPUS=U&TYPE=Module&YEAR=2024&TITLE=US%20Foreign%20Policy,%201989-Present%20(UG%20-%2020%20credits)&MODULE=AMCS3025&CRSEID=013584&LINKA=&LINKB=&LINKC=UDD-ACS
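The printed output above is in completion order, while module_list.txt is in page order because each result is stored at its own index. If the per-module progress lines are not needed, a shorter variant is to let executor.map return the results in submission order. The sketch below assumes a one-argument fetch callable (for instance a functools.partial around get_module_url with the session and ICSID bound); fetch_in_order and max_workers are illustrative names and values, not part of the crawler.

```python
import concurrent.futures
from typing import Callable, List


def fetch_in_order(fetch: Callable[[int], str], total: int, max_workers: int = 8) -> List[str]:
    """Call fetch(0) .. fetch(total - 1) in worker threads.

    executor.map yields results in the order the inputs were given, so the
    returned list is already sorted by index regardless of which request
    finishes first.
    """
    with concurrent.futures.ThreadPoolExecutor(max_workers=max_workers) as executor:
        return list(executor.map(fetch, range(total)))
```

The trade-off is that an exception raised by one call surfaces when its result is reached during iteration, so the empty-string fallback in get_module_url remains useful.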
Thanks for those brilliant ideas, @lucienshawls! The multi-threaded structure is now used in the new module crawler, which is much faster and more stable than the Selenium-based one.
I wonder if module/fetch_modules.py could use requests rather than selenium, because the latter significantly reduces efficiency and requires complicated configuration.