Signbank / Global-signbank

An online sign dictionary and sign database management system for research purposes. Developed originally by Steve Cassidy. This repo is a fork for the Dutch version, previously called 'NGT-Signbank'.
http://signbank.cls.ru.nl
BSD 3-Clause "New" or "Revised" License

Videos with bak bak paths #1373

Open susanodd opened 2 weeks ago

susanodd commented 2 weeks ago

I've implemented a "renaming" procedure that changes backup filenames from the wrong format to the correct format.

The new format keeps the "mp4" in the filename, so the old-format files with "bak bak" sequences are missing the video extension. I assume it is always "mp4", since we used to run "ensure_mp4" on Signbank uploads (not on API uploads). But we can't be sure of that, because there is still code that uses the (version * ".bak") suffix. (See #1374)
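A minimal sketch of that renaming, assuming a legacy backup name is the original stem followed by `version * ".bak"`, and that the new name is `<stem>.<ext>.bak<N>`. The function names are hypothetical (not the ones in the Signbank command), the backup number `N` is whatever the caller chooses, and `ext` only defaults to "mp4" because of the assumption above; as the listing further down shows, the real format should be checked per file.

```python
import re
from typing import Tuple

# Legacy backup names: a stem followed by one or more ".bak" suffixes,
# assumed to come from the old (version * ".bak") naming.
LEGACY_BACKUP = re.compile(r"^(?P<stem>.+?)(?P<baks>(?:\.bak)+)$")


def split_legacy_backup(filename: str) -> Tuple[str, int]:
    """Return (stem, number of ".bak" suffixes) for a legacy backup name."""
    match = LEGACY_BACKUP.match(filename)
    if match is None:
        raise ValueError(f"not a legacy backup name: {filename!r}")
    return match.group("stem"), match.group("baks").count(".bak")


def new_backup_name(stem: str, backup_number: int, ext: str = "mp4") -> str:
    """Build a new-style name, keeping the video extension before ".bakN".

    If the stem already ends in ".<ext>" (e.g. "X.mp4.bak"), the extension
    is not added a second time.
    """
    suffix = f".{ext}"
    base = stem if stem.endswith(suffix) else stem + suffix
    return f"{base}.bak{backup_number}"


if __name__ == "__main__":
    stem, version = split_legacy_backup("BLIKJE-A-36667.bak.bak")
    print(stem, version)                   # BLIKJE-A-36667 2
    print(new_backup_name(stem, version))  # BLIKJE-A-36667.mp4.bak2
```

The non-greedy stem makes the regex take all trailing ".bak" suffixes as the version count, so "X.bak.bak" gives version 2 rather than stem "X.bak" with version 1.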

Incidentally, "create poster image" does not work on videos that are NOT in mp4 format, and the format is no longer checked because the API did not want that. So if a video is in the wrong format, that could be why poster images are sometimes not created.

susanodd commented 2 weeks ago

@vanlummelhuizen about "reverse renaming the bak bak files". Do we just assume they are mp4?

vanlummelhuizen commented 1 week ago

> @vanlummelhuizen about "reverse renaming the bak bak files". Do we just assume they are mp4?

There are files that are not MP4. The listing below shows the files in glossvideo that end in .bak and do not have the string 'MP4' in the file type.

root@signbank-new:/var/www/writable/glossvideo# find . -type f | grep -P '\.bak$' | xargs -i file {} | grep -v MP4 | less
./NGT/ON/ONE-AND-A-HALF-B-40012.bak17345.bak.bak: ISO Media, Apple iTunes Video (.M4V) Video
./NGT/ON/ONE-AND-A-HALF-B-40012.bak17346.bak: ISO Media, Apple iTunes Video (.M4V) Video
./NGT/ON/ONE-AND-A-HALF-B-40012.bak.bak.bak.bak.bak: ISO Media, Apple iTunes Video (.M4V) Video
./NGT/ON/ONE-AND-A-HALF-B-40012.bak17344.bak.bak.bak: ISO Media, Apple iTunes Video (.M4V) Video
./NGT/BL/BLIKJE-A-36667.bak.bak: ISO Media, Apple iTunes Video (.M4V) Video
./NGT/BA/BACTERIE-A-40006.bak13572.bak: ISO Media, Apple iTunes Video (.M4V) Video
./NGT/te/testlemmaidglosstranslation6-3729.bak.bak: ISO Media, Apple iTunes Video (.M4V) Video
./NGT/te/testlemmaidglosstranslation74-2793.bak.bak: ISO Media, Apple iTunes Video (.M4V) Video
./CSL_Shanghai/LA/LAUNDRY-MACHINE-A-6153.mp4.bak: ISO Media, Apple iTunes Video (.M4V) Video
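For ISO base media files, `file` reports the major brand stored in the leading `ftyp` box, and "Apple iTunes Video (.M4V)" corresponds to the brand `M4V `. So these backups are still MP4-family containers, just with a different brand. A minimal Python sketch of the same check, which might be handier inside a management command than calling `file`; the function name is hypothetical, and it assumes `ftyp` is the first box, which is the usual case for MP4/M4V.

```python
from typing import Optional


def major_brand(path: str) -> Optional[str]:
    """Return the major brand from a file's leading ftyp box, or None.

    ISO base media files (MP4, M4V, most QuickTime files) normally start
    with an "ftyp" box: 4 bytes size, 4 bytes type b"ftyp", 4 bytes brand.
    """
    with open(path, "rb") as f:
        header = f.read(12)
    if len(header) < 12 or header[4:8] != b"ftyp":
        # Not ISO base media (e.g. WebM), or ftyp is not the first box.
        return None
    # Brands are 4 ASCII characters, e.g. "isom", "mp42" or "M4V ".
    return header[8:12].decode("latin-1")
```

Running it over the `.bak` files would print `M4V ` for the files above, and `None` for anything that is not ISO base media at all (such as a WebM webcam recording).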

However, when I search for them in the database, they don't seem to belong to a GlossVideo object:

>>> files = [
... "glossvideo/NGT/ON/ONE-AND-A-HALF-B-40012.bak17345.bak.bak",
... "glossvideo/NGT/ON/ONE-AND-A-HALF-B-40012.bak17346.bak",
... "glossvideo/NGT/ON/ONE-AND-A-HALF-B-40012.bak.bak.bak.bak.bak",
... "glossvideo/NGT/ON/ONE-AND-A-HALF-B-40012.bak17344.bak.bak.bak",
... "glossvideo/NGT/BL/BLIKJE-A-36667.bak.bak",
... "glossvideo/NGT/BA/BACTERIE-A-40006.bak13572.bak",
... "glossvideo/NGT/te/testlemmaidglosstranslation6-3729.bak.bak",
... "glossvideo/NGT/te/testlemmaidglosstranslation74-2793.bak.bak",
... "glossvideo/CSL_Shanghai/LA/LAUNDRY-MACHINE-A-6153.mp4.bak"
... ]
>>> print(", ".join([str(GlossVideo.objects.filter(videofile=file).count()) for file in files]))
0, 0, 0, 0, 0, 0, 0, 0, 0

So, the current state is that all files in glossvideo for which a GlossVideo object exists are MP4. But I don't think it is guaranteed that it will always be that way.
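The per-file check above can be turned around to list every file on disk that has no GlossVideo object. A sketch with hypothetical helper names: it compares relative paths, so both sides must be relative to the same root (the media root that contains `glossvideo/`). In the Django shell the stored side could come from `GlossVideo.objects.values_list("videofile", flat=True)`, and the root `/var/www/writable` is taken from the prompt in the listing above.

```python
import os
from pathlib import PurePosixPath
from typing import Iterable, Iterator, List


def normalise(path: str) -> str:
    """Normalise a relative path so that './a//b' and 'a/b' compare equal."""
    return str(PurePosixPath(path))


def files_under(root: str, subdir: str = "glossvideo") -> Iterator[str]:
    """Yield all files below root/subdir as '/'-separated paths relative to root."""
    for dirpath, _dirnames, filenames in os.walk(os.path.join(root, subdir)):
        for filename in filenames:
            full = os.path.join(dirpath, filename)
            yield os.path.relpath(full, root).replace(os.sep, "/")


def orphan_files(disk_paths: Iterable[str], stored_paths: Iterable[str]) -> List[str]:
    """Return the disk paths that no stored videofile value refers to."""
    known = {normalise(p) for p in stored_paths}
    return sorted(p for p in disk_paths if normalise(p) not in known)
```

Usage in `python manage.py shell` would then be along the lines of `orphan_files(files_under("/var/www/writable"), GlossVideo.objects.values_list("videofile", flat=True))`.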

susanodd commented 1 week ago

Whoa! It made some really weird file names there!

extra bak baks after the new extension

There is video code that still uses "bak bak". But I thought it was being circumvented.
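To see how many of each kind of odd name exist before deciding what to do with them, backup filenames can be sorted into categories with regular expressions. A sketch; the category labels are illustrative, and it assumes new-format names look like `<stem>.mp4.bak<digits>`, based on the names in this thread.

```python
import re
from collections import Counter
from typing import Iterable

# Checked in order; the first matching pattern wins.
BACKUP_PATTERNS = [
    ("new", re.compile(r"\.mp4\.bak\d+$")),                 # X.mp4.bak123
    ("trailing_baks", re.compile(r"\.bak\d+(?:\.bak)+$")),  # X.bak17345.bak.bak
    ("numbered", re.compile(r"\.bak\d+$")),                 # X.bak17346
    ("repeated", re.compile(r"(?:\.bak)+$")),               # X.bak.bak, X.mp4.bak
]


def classify_backup(filename: str) -> str:
    """Return the label of the first pattern that matches, else "other"."""
    for label, pattern in BACKUP_PATTERNS:
        if pattern.search(filename):
            return label
    return "other"


def tally_backups(filenames: Iterable[str]) -> Counter:
    """Count how many filenames fall into each category."""
    return Counter(classify_backup(name) for name in filenames)
```

On the server it could be fed with something like `p.name for p in Path("/var/www/writable/glossvideo").rglob("*.bak*")`. The "trailing_baks" category is the one with extra bak sequences after an otherwise good numbered extension.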

non-mp4

Okay, that is what I was afraid of. That some of the bak bak files might be totally different extensions.

I tried converting some off-line and that works. So probably a command is needed to check the format of the files and convert them if necessary.

It's possible that many of the backup files are the wrong format. That would be a normal reason for users to upload again.
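A sketch of such a conversion step, assuming ffmpeg (built with libx264) is installed on the server. The encoder settings are one common browser-friendly choice, not necessarily what Signbank or "ensure_mp4" used, and the function names are hypothetical. It defaults to a dry run so the commands can be reviewed before anything is written.

```python
import shlex
import subprocess
from typing import List


def ffmpeg_to_mp4_args(src: str, dst: str) -> List[str]:
    """Build an ffmpeg command that re-encodes src into an H.264/AAC MP4."""
    return [
        "ffmpeg",
        "-n",                       # never overwrite an existing output file
        "-i", src,
        "-c:v", "libx264",          # H.264 video
        "-pix_fmt", "yuv420p",      # pixel format that browsers can decode
        "-c:a", "aac",              # AAC audio, if the input has an audio stream
        "-movflags", "+faststart",  # move the index to the front for web playback
        dst,
    ]


def convert_to_mp4(src: str, dst: str, dry_run: bool = True) -> None:
    """Print the command (dry run) or run it, raising if ffmpeg fails."""
    args = ffmpeg_to_mp4_args(src, dst)
    if dry_run:
        print(shlex.join(args))
    else:
        subprocess.run(args, check=True)
```

Writing to a new path and only replacing the original after checking the result keeps the original file intact if a conversion goes wrong.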

vanlummelhuizen commented 1 week ago

> I tried converting some off-line and that works. So probably a command is needed to check the format of the files and convert them if necessary.

Converting files currently in glossvideo? As I said, all files that are not MP4 don't have a corresponding GlossVideo object, so converting is not necessary.

vanlummelhuizen commented 1 week ago

@susanodd Why are there two very similar command scripts to rename backed-up glossvideo files?

And what does https://github.com/Signbank/Global-signbank/blob/master/signbank/dictionary/management/commands/rename_non_mp4_extensions.py do?

Have they been tested and reviewed? Have you already used them on the server?

susanodd commented 1 week ago

> @susanodd Why are there two very similar command scripts to rename backed-up glossvideo files?
>
> And what does https://github.com/Signbank/Global-signbank/blob/master/signbank/dictionary/management/commands/rename_non_mp4_extensions.py do?
>
> Have they been tested and reviewed? Have you already used them on the server?

[THIS GOT A BIT LONG]

They are tested. But only locally. We don't have video files on the development servers.

The paths were going wrong. I first ran tests to see what the "move" command would do. (The new format still has "mp4" before the "bakNNN", so the file actually has two extensions. I did not test that properly at first and didn't notice the extra "mp4" inside the path. I wasn't sure whether "split" on the path was getting the right parts, so I changed the code to construct what the path should be instead of manipulating the existing stored path. It seems it was sometimes getting just the filename instead of a relative path. That might behave differently on the Apache server than on the PyCharm runserver, since files are not actually served locally.)

The extension renaming only touched a handful of files, and I have the log script. Those files need to be converted. The display problem seems to be browser-specific: the files do display if you type in their URL using protected_media, even after the extension was changed (at least on Apple and on Ubuntu). The JavaScript drag-and-drop code restricts the video file types, so they don't display in Gloss Detail. There were problems before because the webcam format on Ubuntu/Apple/Dell (the computer from @Jetske) does not work on the "other" systems, so some formats were excluded from video display. We had conversion for a while, but the API did not want that anymore.

The "image" display was fixed: it did not include 'png' before, which is why the images were not showing.

The files are all in the right place with the right name now. But files with a non-mp4 format inside an mp4-named file still need to be converted. The weird files with extra bak sequences after the good bakNNNN extension need to be removed, or renamed and given objects. (You wrote that they should be removed.) None of the commands add "bak bak" to the videos, so those files already existed; the commands only renamed the backup files. I'm working on the conversion part, which needs to be done with ffmpeg.

I don't know why the original "ensure_mp4" stopped working. It was commented out, which was requested for the API. Some of those videos also have the wrong format. Oh, I remember now: it did not work on webcam videos. The "input" video ended up with a different frame rate than the "output" video, so its length changed, and then it failed for some reason. (@Jetske will know the details.)

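A sketch of building a backup path from its parts instead of splitting the stored one. The layout `glossvideo/<dataset>/<first two characters of filename>/<idgloss>-<id>.mp4.bak<N>` is inferred from the file listing earlier in this issue; the function and parameter names are hypothetical, and the real command may take these values from different fields.

```python
import os
from pathlib import PurePosixPath


def expected_backup_path(dataset_acronym: str, idgloss: str, gloss_id: int,
                         backup_number: int, ext: str = "mp4") -> str:
    """Construct the relative path a backup should have, from its parts.

    Assumed layout: glossvideo/<dataset>/<first 2 chars of filename>/<filename>
    """
    filename = f"{idgloss}-{gloss_id}.{ext}.bak{backup_number}"
    return str(PurePosixPath("glossvideo", dataset_acronym, filename[:2], filename))


if __name__ == "__main__":
    print(expected_backup_path("NGT", "BLIKJE-A", 36667, 3))
    # glossvideo/NGT/BL/BLIKJE-A-36667.mp4.bak3

    # Why splitting the stored name is fragile: splitext only removes the
    # last suffix, so ".mp4" stays attached to the stem.
    print(os.path.splitext("BLIKJE-A-36667.mp4.bak3"))
    # ('BLIKJE-A-36667.mp4', '.bak3')
```

Constructing the path also sidesteps the case where the stored value is only a filename: the directory part never depends on what was stored.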
susanodd commented 1 week ago

@vanlummelhuizen all of the "renamed" non-mp4 files have been converted to real mp4 files (offline, using ffmpeg).
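Since the earlier "ensure_mp4" problem was that webcam videos came out with a different length, one possible sanity check after an offline conversion is to compare the durations ffprobe reports for the source and the converted file. A sketch, assuming ffprobe is installed; the function names and the 0.5 s tolerance are arbitrary illustrations. Browser-recorded WebM files sometimes lack a container duration, so a missing value is treated as "cannot verify" rather than as a match.

```python
import json
import subprocess
from typing import List, Optional


def ffprobe_duration_args(path: str) -> List[str]:
    """ffprobe command that prints only the container duration, as JSON."""
    return ["ffprobe", "-v", "error", "-show_entries", "format=duration",
            "-of", "json", path]


def parse_duration(ffprobe_json: str) -> Optional[float]:
    """Extract the duration in seconds, or None if ffprobe reported none."""
    fmt = json.loads(ffprobe_json).get("format", {})
    try:
        return float(fmt["duration"])
    except (KeyError, ValueError):
        return None  # duration missing, or reported as "N/A"


def probe_duration(path: str) -> Optional[float]:
    """Run ffprobe on path and return its duration (raises if ffprobe fails)."""
    result = subprocess.run(ffprobe_duration_args(path),
                            capture_output=True, text=True, check=True)
    return parse_duration(result.stdout)


def durations_match(src: str, dst: str, tolerance: float = 0.5) -> bool:
    """True if both durations are known and differ by at most `tolerance` s."""
    a, b = probe_duration(src), probe_duration(dst)
    return a is not None and b is not None and abs(a - b) <= tolerance
```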

susanodd commented 3 days ago

TO DO: Convert format of non-mp4 files. Those that used to have "bak bak" sequences did not have any video extensions on them. Apparently it was assumed everything was converted using "ensure_mp4".