coderefinery / video-processing

Processed videos from CodeRefinery (and the workspace while creating them)

Video processing files

This repo holds the files we use for video processing. Big files are managed with git-annex, and everything else is committed directly to git. The repo provides non-YouTube public access to our videos and is also our workspace for releasing videos, so many of the instructions below are for those who help process them.

We also made a description of git-annex for data management, targeted to scientists and researchers, if you want to know what's going on behind the scenes.

What is available here?

Browse the repo - course links are below. More can be added later depending on demand.

Getting public copies of videos from git-annex

Video data is stored using git-annex and synced between several places: our HPC cluster, the computers that process the videos, and the object store Allas provided by CSC. You can download the public videos from Allas:

$ git clone https://github.com/coderefinery/video-processing/
$ cd video-processing
$ git annex get python-for-scicomp-2023/out/day1.1-icebreaker.mkv
get python-for-scicomp-2023/out/day1.1-icebreaker.mkv (from allas...)

Only processed videos are available to the general public. The raw, private videos are tracked by git-annex in this repo but are not available for download. Also, this is a test setup and everything may be subject to change or deprecation.

(How was this set up? First, get the environment variables needed for the git-annex S3 special remote; I did this by running allas_conf on one of the CSC computers. Then run the command below. It caches the authentication locally on that computer only; the credentials don't spread anywhere else.)

$ git annex initremote allas type=S3 encryption=none chunk=50MiB embedcreds=no host=a3s.fi protocol=https bucket=aaltoscicomp-video publicurl=https://aaltoscicomp-video.a3s.fi/ fileprefix=1- public=yes autoenable=true
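The `chunk=50MiB` option in the `git annex initremote` command makes git-annex split each file into pieces of at most 50 MiB when storing it in Allas, so one long recording becomes several objects in the bucket. The sketch below only illustrates that arithmetic (ceiling division, with MiB taken as 1024 × 1024 bytes). The function `annex_chunk_count` is a hypothetical helper for illustration; it is not part of git-annex.

```shell
#!/bin/sh
# Hypothetical helper (not part of git-annex): how many chunks the
# chunk=50MiB setting produces for a non-empty file of a given size in bytes.
CHUNK_BYTES=$((50 * 1024 * 1024))   # 50 MiB = 52428800 bytes

annex_chunk_count() {
    size=$1
    # Ceiling division: a final partial chunk still counts as one object.
    echo $(( (size + CHUNK_BYTES - 1) / CHUNK_BYTES ))
}

# Example: a 1 GiB recording is stored as 21 chunks (20 full + 1 partial).
annex_chunk_count $((1024 * 1024 * 1024))
```

Chunking keeps each upload and download small, so an interrupted transfer of a large video only has to redo the chunk that was in progress.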

How it works

This repository stores the files used to process videos for CodeRefinery, Aalto Scientific Computing, and possibly other projects. Here's how it works in general:

Subtitle editing

If you are helping with subtitle editing:

Slicing the videos

If you are volunteering to help generate the edit list:

git-annex setup for private video files

Raw video files are private and only synced via our cluster.

Only follow these steps if you need to pull the large private (raw) video files to your own computer to view them. Otherwise, you can use git normally, and the video files will appear as broken symbolic links. The final videos can be downloaded from the public copy described above.
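Those "broken symbolic links" are how git-annex represents files in its default (locked) mode: each big file is a symlink into `.git/annex/objects/`, and the link dangles until the content is fetched. The built-in way to list such files is `git annex find --not --in=here`. The sketch below shows the underlying idea in plain POSIX shell, assuming locked mode (not unlocked files or adjusted branches). The function name `list_missing_content` is hypothetical and only for illustration.

```shell
#!/bin/sh
# Hypothetical helper: list files whose annexed content is not present locally.
# Assumes git-annex's default "locked" mode, where each big file is a symlink
# into .git/annex/objects/ that dangles until the content is fetched.
list_missing_content() {
    dir=${1:-.}
    # Only look at symbolic links, and skip the .git directory itself.
    find "$dir" -path "$dir/.git" -prune -o -type l -print | while IFS= read -r f; do
        # [ -e ] follows the link, so it is false for a dangling link.
        if [ ! -e "$f" ]; then
            printf '%s\n' "$f"
        fi
    done
}

# Example: list_missing_content python-for-scicomp-2023
```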

Privacy notice: the git-annex information about which computers have which files is distributed publicly through the repository (including through GitHub). The information recorded about your computer is its git-annex UUID and the MY-COMPUTER-NAME description you give it.

To set up this repo to connect to the Triton cluster:

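git clone https://github.com/coderefinery/video-processing/   # get the repo from GitHub
cd video-processing
git remote add triton triton.aalto.fi:/scratch/scicomp/video-processing/   # cluster copy that holds the raw files
git config remote.triton.annex-shell /share/apps/git-annex/10.20230228.path/git-annex-shell   # git-annex-shell location on Triton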
git annex init MY-COMPUTER-NAME  # set up git-annex
git annex wanted . present       # don't download everything, but keep what is here
git annex sync
git annex get python-for-scicomp-2023/raw/FILE.mkv   # fetch one raw file from Triton