darthnithin closed this issue 3 years ago.
`print(work.comments)` works, though; it returns the number of comments.
Could you better format your pasted code for readability please?
The code is the same as the example here, but it fails even with a different work ID.
```python
from time import time
import bs4
import requests
import AO3

url = "https://archiveofourown.org/works/20125552/chapters/47677465"
workid = AO3.utils.workid_from_url(url)
work = AO3.Work(workid)
work.load_chapters()
start = time()
comments = work.get_comments(1, 5)
print(f"Loaded {len(comments)} comment threads in {round(time()-start, 1)} seconds\n")
for comment in comments:
    print(f"Comment ID: {comment.comment_id}\nReplies: {len(comment.get_thread())}")
```
Thanks. I'm getting the same error, with the same message, on my machine.
This probably needs @ArmindoFlores to take a look.
I'm pretty sure it's erroring here:
```python
string = "work_id" if self.oneshot else "chapter_id"
url = f"https://archiveofourown.org/comments/show_comments?page=%d&{string}={chapter_id}"
soup = self.request(url%1)

pages = 0
div = soup.find("div", {"id": "comments_placeholder"})
ol = div.find("ol", {"class": "pagination actions"})
if ol is None:
    pages = 1
else:
    for li in ol.findAll("li"):
        if li.getText().isdigit():
            pages = int(li.getText())

comments = []
```
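The crash happens because `soup.find(...)` returns `None` when the page has no `comments_placeholder` div, and the next line calls `.find()` on that `None`. A minimal sketch of a guarded version follows. `count_comment_pages` is a hypothetical helper, not part of ao3_api; it expects a BeautifulSoup document (any object with bs4-style `find`/`find_all`/`get_text` methods works) and returns 0 instead of raising `AttributeError` when the container is missing.

```python
def count_comment_pages(soup):
    """Return the number of comment pages in a parsed AO3 page.

    Hypothetical helper. `soup` is assumed to be a BeautifulSoup document.
    Returns 0 if the comments container is missing, 1 if there is no
    pagination bar, otherwise the highest page number in the bar.
    """
    div = soup.find("div", {"id": "comments_placeholder"})
    if div is None:
        # No comments container at all (e.g. an error page or a rate-limited request)
        return 0
    ol = div.find("ol", {"class": "pagination actions"})
    if ol is None:
        # Comments fit on a single page, so there is no pagination bar
        return 1
    pages = 1
    for li in ol.find_all("li"):
        text = li.get_text()
        if text.isdigit():
            # Keep the largest page number; "Previous"/"Next" entries are skipped
            pages = max(pages, int(text))
    return pages
```

Returning 0 lets a caller skip chapters without a comment section. If a missing div usually means the request was rate-limited, raising a descriptive exception here would be the better design.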
I have confirmed this is an issue. I suspect something might have changed on AO3, but I'll look through this project's commit history in case I accidentally changed anything that broke this.
Well, I just made an AO3 fic comment scraper myself... I learned Python for this, lol. I'm getting the same error too. I think it errors when a page has no comments, because then the `comments_placeholder` div doesn't exist. It's also possible that I'm being rate-limited.
```python
import json
import re

import requests
from bs4 import BeautifulSoup

workid = 20125552
data = {'comments': []}
comarr = []

# Collect chapter IDs from the work's chapter index ("navigate") page
nav = f'https://archiveofourown.org/works/{workid}/navigate'
soup = BeautifulSoup(requests.get(nav).text, 'html.parser')
reg = re.compile(r'/works/\d+/chapters/(\d+)')
chapterids = []
for link in soup.find_all('a'):
    # Some <a> tags have no href; treat those as an empty string
    cid = reg.findall(link.get('href') or '')
    if cid:
        chapterids.append(int(cid[0]))

for x in chapterids:
    pagenumber = 1  # restart pagination for every chapter
    while True:
        url = f'https://archiveofourown.org/chapters/{x}?page={pagenumber}&show_comments=true&view_adult=true#comments'
        print(url)
        soup = BeautifulSoup(requests.get(url).text, 'html.parser')
        div = soup.find("div", {"id": "comments_placeholder"})
        if div is None:
            # No comment section: the chapter has no comments, or the request was rate-limited
            print("DIV is nonetype")
            break
        for each in div.find_all("blockquote", {"class": "userstuff"}):
            ab = each.get_text()
            comarr.append(ab)
            data['comments'].append({'chapterid': x, 'commenttext': ab})
        # Follow the "next" link in the pagination bar, if there is one
        ol = div.find("ol", {"class": "pagination actions"})
        if ol and ol.find('a', {'rel': 'next'}):
            pagenumber += 1
        else:
            break
    print(pagenumber)

with open('data.txt', 'w') as outfile:
    json.dump(data, outfile)

for y in comarr:
    print(y)
```
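To tell the two explanations apart (a page without a comment section vs. a rate-limited request), it can help to inspect the raw response before parsing. This is a hypothetical, stdlib-only sketch: `diagnose_comment_page` is an illustrative name, not part of ao3_api, and it assumes that AO3 signals rate limiting with HTTP status 429 and that a normal comments page contains an element with `id="comments_placeholder"`.

```python
from html.parser import HTMLParser


class _IdFinder(HTMLParser):
    """Records whether any start tag carries the given id attribute."""

    def __init__(self, target_id):
        super().__init__()
        self.target_id = target_id
        self.found = False

    def handle_starttag(self, tag, attrs):
        if dict(attrs).get("id") == self.target_id:
            self.found = True


def diagnose_comment_page(status, html):
    """Classify a fetched chapter page (hypothetical helper).

    Assumes HTTP 429 means rate-limited; otherwise looks for the
    comments_placeholder element in the HTML.
    """
    if status == 429:
        return "rate-limited"
    if status != 200:
        return f"HTTP {status}"
    finder = _IdFinder("comments_placeholder")
    finder.feed(html)
    finder.close()
    return "ok" if finder.found else "no comments_placeholder div"
```

With requests, this could be called as `diagnose_comment_page(resp.status_code, resp.text)` on each response before handing the HTML to BeautifulSoup.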
Yup, I'm being rate-limited.
AO3 will raise an HTTPError if it gets rate-limited. Also, these changes are only in the new 2.0.4 version.
Indeed it did. I was just ignoring it, lol...
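Rather than ignoring the HTTPError, one option is to retry with exponential backoff. This is a minimal sketch, not part of ao3_api: `with_backoff` and its parameters are hypothetical, and the default delays are guesses, not AO3's documented limits.

```python
import time


def with_backoff(fetch, retry_on=(Exception,), max_retries=4, base_delay=5.0, sleep=time.sleep):
    """Call fetch() and return its result, retrying with exponential backoff.

    If fetch() raises one of the exception types in retry_on, wait
    base_delay * 2**attempt seconds and try again. Once max_retries
    retries are used up, the last exception is re-raised.
    """
    for attempt in range(max_retries + 1):
        try:
            return fetch()
        except retry_on:
            if attempt == max_retries:
                raise
            sleep(base_delay * 2 ** attempt)
```

With ao3_api this might be used as `with_backoff(lambda: work.get_comments(1, 5), retry_on=(AO3.utils.HTTPError,))`, assuming the exception is exposed as `AO3.utils.HTTPError` in your installed version. The `sleep` parameter is only there so the delays can be replaced in tests.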
No matter what I do, I can't get `work.get_comments()` to work.

Code:

```python
from time import time
import bs4
import requests
import AO3
work = AO3.Work(24560008)
work.load_chapters()
start = time()
comments = work.get_comments(1, 5)
print(f"Loaded {len(comments)} comment threads in {round(time()-start, 1)} seconds\n")
for comment in comments:
    print(f"Comment ID: {comment.comment_id}\nReplies: {len(comment.get_thread())}")
```

Error:

```
Traceback (most recent call last):
  File ".\ao3.py", line 8, in <module>
    comments = work.get_comments(1, 5)
  File "C:\Users\nithi\AppData\Local\Programs\Python\Python38-32\lib\site-packages\AO3\works.py", line 272, in get_comments
    ol = div.find("ol", {"class": "pagination actions"})
AttributeError: 'NoneType' object has no attribute 'find'
```