gojiplus / tuber

🍠 Access YouTube from R
http://gojiplus.github.io/tuber

get video id on query and how to see multiple pages? #11

Closed ElianoMarques closed 7 years ago

ElianoMarques commented 8 years ago

hey,

Trying to search for x search terms via yt_search and then for each video_id get some stats.

I know I have to use simplify = FALSE to get the video id, and with some parsing I don't think it's difficult. However, I can't see an option to get more than 50 results and/or go to other pages. You mention in the documentation that we can go up to 500, but that doesn't seem to be working at all.

Can you help?

soodoku commented 8 years ago

righto --- i don't have a single shot option for getting all. currently painful with 'page_tokens'. It is on my to do list. Give me a couple of days and I should have something for you.

ElianoMarques commented 8 years ago

That would be great. In addition, published_before and published_after don't seem to work when we use them together. I searched for videos between Jan 01 and Jan 02 and got all the videos up to the 02, as opposed to just the ones between those days. Is this a bug?

soodoku commented 8 years ago

will look into dates issue too.

soodoku commented 7 years ago

So published_after/before work. I tried:

 a <- yt_search(term="Barack Obama", published_after="2016-10-01T00:00:00Z")
 a$publishedAt
 [1] "2016-11-12T15:40:08.000Z" "2016-11-18T21:31:48.000Z" "2016-11-12T05:55:24.000Z" "2016-10-14T03:33:33.000Z" "2016-11-10T18:42:05.000Z" "2016-11-15T03:43:56.000Z" "2016-11-10T23:06:09.000Z"
 [8] "2016-10-31T02:20:22.000Z" "2016-11-02T02:22:32.000Z" "2016-10-26T02:16:22.000Z" "2016-11-11T13:12:51.000Z" "2016-10-28T02:18:35.000Z" "2016-11-01T02:22:40.000Z" "2016-11-03T02:27:18.000Z"
[15] "2016-10-27T02:16:49.000Z" "2016-10-29T02:17:38.000Z" "2016-10-30T02:18:02.000Z" "2016-11-08T16:54:27.000Z" "2016-10-17T01:05:50.000Z" "2016-10-18T03:50:24.000Z" "2016-10-23T02:15:48.000Z"
[22] "2016-11-08T17:02:34.000Z" "2016-10-20T01:26:54.000Z" "2016-11-08T17:35:35.000Z" "2016-10-22T02:15:22.000Z" "2016-10-19T04:02:25.000Z" "2016-10-25T02:14:19.000Z" "2016-10-21T01:01:16.000Z"
[29] "2016-11-10T14:53:16.000Z" "2016-11-08T16:58:25.000Z" "2016-11-08T16:56:21.000Z" "2016-11-08T17:01:31.000Z" "2016-10-24T02:14:11.000Z" "2016-11-09T16:28:22.000Z" "2016-11-09T16:36:40.000Z"
[36] "2016-11-08T17:04:17.000Z" "2016-11-09T15:54:51.000Z" "2016-10-16T00:43:30.000Z" "2016-11-10T16:16:34.000Z" "2016-11-08T16:59:25.000Z" "2016-11-10T16:21:23.000Z" "2016-11-09T16:35:03.000Z"
[43] "2016-11-11T17:33:03.000Z" "2016-11-09T16:31:03.000Z" "2016-11-08T17:34:14.000Z" "2016-11-09T16:46:55.000Z" "2016-10-24T20:14:13.000Z" "2016-11-09T16:29:17.000Z" "2016-11-08T16:53:33.000Z"
[50] "2016-11-09T16:32:55.000Z"
#Total Results 7 
a$publishedAt

#[1] "2016-01-14T21:06:45.000Z" "2016-01-28T02:09:47.000Z" "2016-01-05T15:04:11.000Z"

soodoku commented 7 years ago

a <- yt_search(term="Barack Obama", published_before = "2016-02-10T00:00:00Z", published_after="2016-01-01T00:00:00Z")
#Total Results 162 
a$publishedAt
[1] "2016-01-19T14:00:00.000Z" "2016-02-02T23:55:14.000Z" "2016-02-03T01:02:15.000Z" "2016-02-02T04:27:33.000Z" "2016-01-13T20:56:09.000Z"
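
To confirm that a combined published_after/published_before query only returned videos inside the window, the timestamps can be checked on the client side. A minimal sketch: `in_window` is a hypothetical helper (not part of tuber), and treating both bounds as inclusive is an assumption about how the API applies them.

```r
# Hypothetical helper (not part of tuber): TRUE for each timestamp that
# falls between `after` and `before`. Both bounds are treated as inclusive,
# which is an assumption about how the API applies them.
in_window <- function(published_at, after, before) {
  parse_utc <- function(x) {
    # Keep "YYYY-MM-DDTHH:MM:SS" and drop fractional seconds and the "Z".
    as.POSIXct(substr(x, 1, 19), format = "%Y-%m-%dT%H:%M:%S", tz = "UTC")
  }
  ts <- parse_utc(published_at)
  ts >= parse_utc(after) & ts <= parse_utc(before)
}

# Timestamps copied from the output above.
published <- c("2016-01-19T14:00:00.000Z", "2016-02-02T23:55:14.000Z",
               "2016-02-03T01:02:15.000Z", "2016-02-02T04:27:33.000Z",
               "2016-01-13T20:56:09.000Z")

ok <- in_window(published,
                after  = "2016-01-01T00:00:00Z",
                before = "2016-02-10T00:00:00Z")
print(all(ok))
```

The same check on a$publishedAt from a live query would flag any result that slipped outside the requested range.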

soodoku commented 7 years ago

Until that is automated, here is a sketch of how to get all the results, so that you have it:

For getting the page tokens you need:

a <- yt_search(term="Barack Obama", simplify=FALSE)
a$nextPageToken
b <- yt_search(term="Barack Obama", page_token = "CAUQAA", simplify=FALSE)
b$nextPageToken

And that allows you to sift through pages of results.

Will automate getting all in the next release but hth for right this moment.
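
In the meantime, the manual steps above could be wrapped in a loop along these lines. This is only a sketch under stated assumptions: `collect_video_ids`, `fetch_page` and `max_pages` are hypothetical names, not part of tuber, and the `item$id$videoId` path assumes the unsimplified result mirrors the raw search.list response. A mock fetcher stands in for yt_search so the example runs without credentials.

```r
# Hypothetical helper (not part of tuber): follow nextPageToken until it is
# absent. fetch_page is any function(page_token) returning a list shaped like
# the raw search.list response: list(items = ..., nextPageToken = ...).
collect_video_ids <- function(fetch_page, max_pages = 20) {
  ids <- character(0)
  token <- NULL
  for (i in seq_len(max_pages)) {
    res <- fetch_page(token)
    # Assumed layout: each item carries its id under item$id$videoId.
    # Channel and playlist hits have no videoId, so unlist() drops them.
    page_ids <- unlist(lapply(res$items, function(item) item$id$videoId))
    ids <- c(ids, page_ids)
    token <- res$nextPageToken
    if (is.null(token)) break  # no further page
  }
  unique(ids)
}

# Mock pages so the sketch runs offline.
mock_pages <- list(
  first  = list(items = list(list(id = list(videoId = "vid1")),
                             list(id = list(videoId = "vid2"))),
                nextPageToken = "CAUQAA"),
  CAUQAA = list(items = list(list(id = list(videoId = "vid3")),
                             list(id = list(channelId = "chan1"))))
)
mock_fetch <- function(page_token) {
  if (is.null(page_token)) mock_pages[["first"]] else mock_pages[[page_token]]
}

ids <- collect_video_ids(mock_fetch)
print(ids)
```

With tuber, fetch_page could be something like function(tok) yt_search(term = "Barack Obama", page_token = tok, simplify = FALSE), assuming a NULL page_token is accepted for the first page. The max_pages cap is there so a misbehaving token chain cannot loop forever or burn through API quota.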

ElianoMarques commented 7 years ago

thanks. giving it a try.

soodoku commented 7 years ago

Use:

install.packages("devtools")
devtools::install_github("soodoku/tuber", build_vignettes = TRUE)

before you do that.

soodoku commented 7 years ago

This happens now:

a <- yt_search(term="Barack Obama")
nrow(a)
# [1] 556

The total number of results is initially estimated at nearly 3k, but it appears YT doesn't give more than ~500?

From: https://developers.google.com/youtube/v3/docs/search/list

Note: Search results are constrained to a maximum of 500 videos if your request specifies a value for the channelId parameter and sets the type parameter value to video, but it does not also set one of the forContentOwner, forDeveloper, or forMine filters.

Not sure really --- I iterate till basically no nextPageToken is there.