Closed Benjamin-Loison closed 1 year ago
As I have more experience now, this feature should be doable.
This endpoint could be cleaned up, but it now works, which is the most important point. Notable items to improve:
- `snippet/topLevelComment/snippet` does not make sense when scraping answers (should check how CommentThreads: list handles this).
- `resultsPerPage` is incorrect when scraping answers (it should be 10).
- `totalResults` isn't available when using `pageToken`: only the first page of `relevance` results contains `totalResults`. The first page of `time` results does not, and neither do the following pages of `relevance` or `time` results.
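Since `totalResults` only shows up on some pages, a client could capture it from whichever page carries it and reuse it afterwards. The following is only a minimal sketch: the helper name `first_total_results` is hypothetical, the sample pages are made up for illustration, and it assumes `totalResults` sits under a `pageInfo` object as in the YouTube Data API v3 response layout.

```python
from typing import Iterable, Optional


def first_total_results(pages: Iterable[dict]) -> Optional[int]:
    """Return the first totalResults found in a sequence of response pages.

    Assumption: totalResults lives under 'pageInfo', as in YouTube Data API v3.
    """
    for page in pages:
        total = page.get('pageInfo', {}).get('totalResults')
        if total is not None:
            return total
    # No page carried the field (e.g. every request used a pageToken).
    return None


# Illustrative, made-up pages: only the second one carries totalResults.
pages = [
    {'items': [], 'pageInfo': {'resultsPerPage': 20}},
    {'items': [], 'pageInfo': {'totalResults': 593, 'resultsPerPage': 20}},
]
print(first_total_results(pages))
```

Returning `None` instead of raising keeps the caller in charge of deciding what to do when no page reports a total.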
I used the following Python script to verify the correctness of this endpoint against the 593 claimed comments:
- with `order=time` we retrieve 592 comments
- with `order=relevance` we retrieve 505 comments

    import requests, json

    VIDEO_ID = 'nxaNgiu3ob0'
    ORDER = 'time'

    # Unique ids of every top-level comment and reply seen so far.
    commentIds = set()
    commonUrl = f'https://localhost/YouTube-operational-API/commentThreads?part=snippet,replies&videoId={VIDEO_ID}&order={ORDER}'
    pageToken = ''
    while True:
        # Request the next page of top-level comments.
        url = commonUrl
        if pageToken != '':
            url += f'&pageToken={pageToken}'
        # verify=False skips TLS certificate verification for the local instance.
        content = requests.get(url, verify=False).text
        data = json.loads(content)
        for item in data['items']:
            commentIds.add(item['id'])
            # An empty token means there are no replies to paginate through.
            repliesPageToken = item['snippet']['topLevelComment']['snippet']['nextPageToken']
            while repliesPageToken != '':
                repliesUrl = commonUrl + f'&pageToken={repliesPageToken}'
                repliesContent = requests.get(repliesUrl, verify=False).text
                repliesData = json.loads(repliesContent)
                # Use a distinct name to avoid shadowing the outer loop variable.
                for reply in repliesData['items']:
                    commentIds.add(reply['id'])
                print(len(commentIds))
                if 'nextPageToken' in repliesData:
                    repliesPageToken = repliesData['nextPageToken']
                else:
                    break
        print(len(commentIds))
        if 'nextPageToken' in data:
            pageToken = data['nextPageToken']
        else:
            break
    print(len(commentIds))
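To narrow down why `order=time` and `order=relevance` return different counts (592 versus 505), one option is to run the script once per order, keep both id sets, and compare them. The sketch below is only an illustration: the helper `compare_id_sets` and the sample ids are hypothetical and not part of the API.

```python
def compare_id_sets(time_ids: set, relevance_ids: set) -> dict:
    """Summarize which comment ids each ordering returned exclusively."""
    return {
        'only_in_time': sorted(time_ids - relevance_ids),
        'only_in_relevance': sorted(relevance_ids - time_ids),
        'in_both': len(time_ids & relevance_ids),
    }


# Made-up ids standing in for the sets collected with each ORDER value.
report = compare_id_sets({'a', 'b', 'c'}, {'b', 'c', 'd'})
print(report)
```

Looking at the ids listed under `only_in_time` would show whether the missing comments share a property, for example all being replies.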
Asked on Stack Overflow: https://stackoverflow.com/q/71186488
Are such comments pinned? If so, could we stop our search as soon as we see a first-level comment that doesn't meet the criteria?