bellingcat / auto-archiver

Automatically archive links to videos, images, and social media content from Google Sheets (and more).
https://pypi.org/project/auto-archiver/
MIT License

Youtube videos only contain upload date, not time #6

Closed — loganwilliams closed this issue 1 year ago

loganwilliams commented 3 years ago

Is there another method of extracting this data that could be used to fill in the Upload timestamp column more completely?

jamesarnall commented 3 years ago

AFAIK youtube-dl doesn't provide the full time and date in the info it returns, just the date. Often, though, the file itself will contain that info in its metadata. Extracting the date and time from the file metadata might be an option. I'll do a bit of looking.
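Pulling a creation timestamp out of the file's own metadata could be sketched like this, assuming `ffprobe` (from FFmpeg) is installed and that the container carries a `creation_time` tag — which is optional and often missing or unreliable:

```python
import json
import subprocess


def creation_time_from_ffprobe(probe_json: str):
    """Return the container-level creation_time tag from ffprobe JSON output, or None."""
    data = json.loads(probe_json)
    return data.get("format", {}).get("tags", {}).get("creation_time")


def probe_file(path: str):
    """Run ffprobe on a media file and return its creation_time tag, if any."""
    out = subprocess.run(
        ["ffprobe", "-v", "quiet", "-print_format", "json", "-show_format", path],
        capture_output=True, text=True, check=True,
    ).stdout
    return creation_time_from_ffprobe(out)
```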

zbrasseaux commented 3 years ago

It's being done by this tool from Amnesty International: https://citizenevidence.amnestyusa.org/

I think the relevant code is the following snippet:

```javascript
$('#formInputButton').click(function(){
    if(($('#formInput').val() == $('#formInput').attr('title'))||($('#formInput').val() == '')||($('#formInput').val() == ' ')) {
        alert('Please input a url');
    } else {
        var formUrl = $('#formInput').val();
        vars = getVars(formUrl);
        var urlV = vars['v'];

        var theUrl = 'https://www.googleapis.com/youtube/v3/videos?id='+urlV+'&part=snippet,statistics,recordingDetails&key=AIzaSyBmQcXmAHD2h5ZurlNKHvHRwMVHbBQqbvc';
        $.getJSON(theUrl, function(data) {
            var shortString = processShort(data,formUrl);
            $('#shortOutput').html(shortString);
        });
    }
});
```

It seems to be just an API call plus JSON parsing. I'll play with it and post an update.

zbrasseaux commented 3 years ago

Okay, so this is it. It's pretty simple: `getVars` strips the URL down to the video id, `urlV`, which is then concatenated into `theUrl`, and `getJSON` does a GET request and parses the JSON, as the function name says. The upload datetime is there in `items->snippet->publishedAt`.

edit: this is only for YouTube videos; I will look into the API requests for the other supported formats
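The same lookup in Python (auto-archiver's language) might look roughly like this. The API key is a placeholder you'd supply yourself, and `extract_video_id` assumes a standard watch URL with a `v=` query parameter:

```python
import json
import urllib.parse
import urllib.request

API_KEY = "YOUR_API_KEY"  # placeholder; use your own YouTube Data API v3 key


def extract_video_id(url: str):
    """Pull the v= query parameter out of a YouTube watch URL (what getVars does)."""
    query = urllib.parse.urlparse(url).query
    return urllib.parse.parse_qs(query).get("v", [None])[0]


def get_published_at(url: str):
    """Fetch snippet.publishedAt for a video via the YouTube Data API v3."""
    video_id = extract_video_id(url)
    api_url = (
        "https://www.googleapis.com/youtube/v3/videos"
        f"?id={video_id}&part=snippet&key={API_KEY}"
    )
    with urllib.request.urlopen(api_url) as resp:
        data = json.load(resp)
    # publishedAt is an ISO-8601 UTC datetime, e.g. "2021-03-01T12:34:56Z"
    return data["items"][0]["snippet"]["publishedAt"]
```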

zbrasseaux commented 3 years ago

I am seeing that youtube_dl only returns the date. Fixing that may require a change on their end, but it's definitely possible. Alternatively, this API can be used as a band-aid.

loganwilliams commented 3 years ago

@zbrasseaux Thank you, this is a nice find! There are two approaches here, we could either:

1. fix youtube_dl itself so it returns the full upload timestamp, or
2. call the YouTube Data API v3 directly from auto-archiver as a stopgap.

The first option is definitely preferable from my perspective, especially since there will likely be a large refactor of auto-archiver soon. If you have interest in tackling this, that would be very welcome!

zbrasseaux commented 3 years ago

@loganwilliams I agree that the first option is better. I have a fork of youtube_dl and am working on it locally. I'm curious how auto-archiver would use a fork of youtube_dl, or would this be contingent on the youtube_dl devs approving a PR?

I just saw that with the CLI version of youtube_dl, you can get a timestamp value. I posted a question in their issues section asking where they get this value and how to access it, so hopefully that is pretty straightforward. In the meantime, I will keep trying to implement API v3.
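For reference, reading that field through youtube_dl's Python API (rather than the CLI) would look roughly like this sketch; the info dict's `timestamp` field, when present, is Unix seconds, and the helper below converts it to a UTC ISO string:

```python
from datetime import datetime, timezone


def unix_to_iso(ts: int) -> str:
    """Convert a Unix timestamp (seconds) to an ISO-8601 UTC string."""
    return datetime.fromtimestamp(ts, tz=timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


def upload_datetime(url: str):
    """Sketch: ask youtube_dl for metadata, preferring 'timestamp' over 'upload_date'."""
    import youtube_dl  # assumes youtube_dl is installed

    with youtube_dl.YoutubeDL({"quiet": True}) as ydl:
        info = ydl.extract_info(url, download=False)
    ts = info.get("timestamp")
    # fall back to the date-only upload_date (YYYYMMDD) when no timestamp exists
    return unix_to_iso(ts) if ts is not None else info.get("upload_date")
```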

loganwilliams commented 3 years ago

@zbrasseaux That's great to hear! I think it might take some time for a youtube_dl PR to be approved, as there are over 800 pending on the project at the moment. I could just modify this project's pipenv requirements to install youtube_dl from a fork. There are some downsides to this (mainly remaining diligent about rebasing the fork off of master to make sure it does not fall far behind), but I think it's the best option.
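Pointing pipenv at a fork is a one-line change in the Pipfile; the fork URL and ref below are placeholders, not a real repository:

```toml
[packages]
# hypothetical fork; replace with the actual fork URL and branch
youtube_dl = {git = "https://github.com/<your-fork>/youtube-dl.git", ref = "upload-timestamp"}
```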

zbrasseaux commented 2 years ago

Apparently they have a separate function for getting timestamps (in Unix format, I think):

https://github.com/ytdl-org/youtube-dl/issues/30263

I will implement this tonight and make a PR

edit: never mind, this just returns 18:00:00 every time once you convert it; I've tried it with 4 different links now
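A constant 18:00:00 would be consistent with a date-only value being treated as midnight UTC and then rendered in a UTC-6 local timezone — a guess on my part, not something confirmed from the youtube_dl source:

```python
from datetime import datetime, timedelta, timezone

# upload_date is date-only, so parsing it yields midnight UTC
dt = datetime(2021, 11, 20, tzinfo=timezone.utc)

# viewed from UTC-6 (e.g. US Central Standard Time), midnight UTC
# becomes 18:00:00 on the previous day -- every time, for every video
local = dt.astimezone(timezone(timedelta(hours=-6)))
print(local.strftime("%Y-%m-%d %H:%M:%S"))  # 2021-11-19 18:00:00
```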

loganwilliams commented 2 years ago

🤦‍♂️

The API v3 approach still seems promising though, thank you for your effort on this.