Signbank / Global-signbank

An online sign dictionary and sign database management system for research purposes. Developed originally by Steve Cassidy/ This repo is a fork for the Dutch version, previously called 'NGT-Signbank'.
http://signbank.cls.ru.nl
BSD 3-Clause "New" or "Revised" License
19 stars 12 forks source link

Implement video tests #1129

Open susanodd opened 7 months ago

susanodd commented 7 months ago

include set of sample videos in test_data/videos

Implement tests for all the different cases where a user can upload a video

susanodd commented 7 months ago

It needs to be possible to upload a file that somebody recorded on an Apple device. Especially since it doesn't work in the browsers on Apple. (No access to the camera.) I have a mov file recorded on my macbook using the camera. But this only works partially to upload it. It's not possible to make a small video without lines in it.

We need tests on actual video files to help debug the code.

susanodd commented 6 months ago

@Jetske while working on the new Dataset Media view (#1172 ) I'm looking at the already existing code for importing videos.

The really old method save_media in tools.py actually just moves the uploaded video file to the gloss's videos. It does not do anything with GlossVideo. So I guess it would just replace a video file. (It is allowed to overwrite.) (This is a not very well obscure undocumented command for importing a bunch of videos by putting them in a particular folder. It was used in the past to batch upload videos via the file system.)

I'm busy to revise this code so it can be integrated in the new view to manage media. There will also be corresponding api functions.

So I'm wondering what to do about just "putting" the videos in the gloss, versus invoking the actual methods that make backups and version management.

It's possible to just put the video there to replace an existing one. And if there is no video yet, use the proper methods as for via the Gloss Edit.

I'm wondering if in GlossVideo, the "file" is an actual link to a file on disk. Would Django "know" if the file is merely overwritten? Can your tests figure this out? :)

susanodd commented 6 months ago

Okay, I see the code actually does use the add_video method. It's just really hard to follow the code because the same method is also used for images!

susanodd commented 6 months ago

@Jetske for some reason the gloss model method get_video_url is returning a path with a backup suffix in it.

What would be the reason for that?

This is after an add_video operation.

susanodd commented 6 months ago

It seems to have something to do with the permissions settings, that signbank isn't able to move the existing file, nor create a new file. Probably it needs to adjust the permissions for a gloss XX folder under the dataset.

Jetske commented 6 months ago

@susanodd Do you mean the random suffix _XXXXX or a .bak suffix? If it's the first, that is not a backup suffix but happens when trying to save a file with a name that already exists as a file in the folder. This shouldn't happen if the existing files are first converted into backup files (which is a step in add_video). Maybe something goes wrong there? Possibly some existing video's weren't saved correctly so it can't find them for backup (e.g. a file was added to a folder but not saved as GlossVideo object)? I'm just guessing here.

susanodd commented 6 months ago

Yes, the first kind of suffix, before the mp4. It's on my local server. It's some really old files. They are in GlossVideo but with that wrong suffix. I removed them from disc so that I could then remove them from GlossVideo. But it made more of the same kind. This was on Ubuntu, so it would also do that on the real server.

Jetske commented 6 months ago

The test videos pass the tests, even though they are not all converted correctly. I just can't get webm recorded in firefox to convert right. That's why it was disabled in firefox to record videos. For some reason the duration of the video seems to be messed up.

Also the file with codec "prores" (.mp4 that's actually a .mov/apple type?) doesn't work.

susanodd commented 6 months ago

I'll check those. It's weird they play on Ubuntu.

I need to bring some more Apple mov videos from my macbook, recorded with just the camera, then saved using different export settings in QuickTime. Those it just worked to "-copy" in ffmpeg. The umbrella video is really old from signbank. It may have been interlaced. That messes up the number of frames. Apple used to use 24 fps instead of 30. (Sorry to babble, there was once a difference before people started wanting 30fps for video games.) It's probably okay to have a slightly different frame rate. Some software used to drop frames because of the different frame rates. PAL was 25 but film was 24. We don't actually know how the old videos were made.

susanodd commented 6 months ago

I made GlossVideoHistory be included in the Video Admin. To my surprise, there are full paths stored.

I made it visible to check what the import code is doing when it uses the add_video method.

I revised to code to merely copy the file to the location instead. I'm going to update it so it does the backups if necessary and puts it in the history that it was via an import.

I kind of don't think these full paths should be stored there. Especially since signbank has moved to different servers many times. Maybe weird stuff happens with filenames because of too many versions of old videos.

susanodd commented 6 months ago

I decided to use shutils to put the imported video in the gloss. So by hand first revert the videos then copy, then if copy was unsuccessful, revert back. On success a new entry in GlossVideoHistory and a new GlossVideo, version 0. But I'm a bit stuck on the videofile field. The places it is called it is very obscure what it actually does. I made a File object from the new video and saved it that way. But obviously, there is no "uploaded_to" or "instance" information for it, because it does not come from a Form.

susanodd commented 6 months ago

@Jetske I still have trouble getting those weird suffix things before the .mp4.

Even with the system first using reversion on the already existing videos, it still makes a weird filename.

It also seems that the atomic does not necessarily work. I looked it up, for sqlite, the entire database is locked. But maybe that is separate from operations at the file system level.

I haven't figured out yet how to "not" get those weird filenames. It's just doing that.

I think that's why the original import videos just overwrote the videos and didn't bother with saving any old videos.

This only gets worse when there is a long list of video files to process. I put a timeout between them. But still, it seems it's choking on moving around old videos.

susanodd commented 6 months ago

Perhaps the path stored in GlossVideo ends up different than the one on disk the first time it makes one of those weird suffixes.

I'm kind of thinking we should ditch the stupid reversion stuff.

susanodd commented 6 months ago

@Jetske I made it so the import videos checks if there are any videos and just overwrites an existing file. Otherwise, it creates a new one. For creation, it makes a new GlossVideo entry. In both cases, it updates GlossVideoHistory.

It's too complicated and crash prone if the import needs to move a bunch of old (possibly non-existent) files referred to in GlossVideo. (There is a bunch of old non-existent stuff in there, and some weird pathnames, also ones with that extra suffix before the mp4, also lots of ".bak.bak.bak.bak...." stuff.

I think we need a "clean up" command. I revised the admin for videos so it's possible to inspect what's in GlossVideo and GlossVideoHistory. Also delete stuff from GlossVideo if there is no file. Except that is dangerous because of the version field.

The problem importing videos is that there exists lots of stuff in GlossVideo. Even broken links. So it's only safe to just replace the version 0 file. That works.

There are lots of signalled commands linked to updating glosses, etc. So doing the reversion also causes lots of other stuff to happen because it wants to physically move files behind the scenes. But the file system is not locked, just the database. We need to change to a different database platform to do it properly.

susanodd commented 6 months ago

@Jetske I may have accidentally discovered why some video things don't work. It seems it does not work to use some of the os.path.isdir(), os.mkdir(), and os.path.exists() as expected. They are crashing for some reason on signbank-dev.

After doing a mkdir in one function, another that tests os.path.isdir() OR os.path.exists() fails and it tries to make the same directory again, then complains that it already exists. This is totally unexpected. It might be because signbank on the real server uses Apache rather than runserver from the virtual server.

The things that go wrong with making the small videos are related to permissions. It also has something to do with those weird suffixes before the mp4. But now it's complaining about the folders. I first made the folders before running the code. (So the folder existed.) But the code testing whether it exists returns False, in spite of it existing. Then it tries to mkdir and that fails because it exists. Maybe this is some server setting? Or it's getting group or permissions mismatches. Signbank virtual environment is run as root. But there is also a user ubuntu on some things, plus the groups wwwsignbank and www-data. (I've tried all combinations on the TEMP folder it is complaining about making it whether it exists or does not exist, it tries to make it.) We had trouble with the groups last year when the videos could not be viewed.

susanodd commented 6 months ago

Maybe it's somehow interfering with Django "protected media" ?