Signbank / Global-signbank

An online sign dictionary and sign database management system for research purposes. Developed originally by Steve Cassidy/ This repo is a fork for the Dutch version, previously called 'NGT-Signbank'.
http://signbank.cls.ru.nl
BSD 3-Clause "New" or "Revised" License
19 stars 12 forks source link

Missing still picture from video on irregular basis #1346

Open rem0g opened 2 weeks ago

rem0g commented 2 weeks ago

I am getting reports from colleagues there is sometimes picture missing from video still, for example:

https://signbank.cls.ru.nl/dictionary/gloss/49457/

susanodd commented 2 weeks ago

I noticed this error too, but I can't fix it. Sometimes the script that generates them does not work. I suspect it has to do with permissions on the file system. Another possibility is that the video coding is not as expected by the code that generates the image. This bug has existed for a long time.

vanlummelhuizen commented 2 weeks ago

I am getting reports from colleagues there is sometimes picture missing from video still, for example:

https://signbank.cls.ru.nl/dictionary/gloss/49457/

In the logs I found

Video file: /var/www/writable/FIETSEN-B-49457.mp4
ffmpeg -v quiet -i /var/www/writable/FIETSEN-B-49457.mp4 -filter:v scale='iw*max(1,sar)':'ih*max(1,1/sar)' /var/www/writable/tmp/signbank-ExtractMiddleFrame/FIETSEN-B-49457.mp4-frames/all/frame-%5d.png

There is no error. But I noticed that the path of the video file does not include 'glossvideo' and dataset acronym. Perhaps this is a problem. I will investigate further.

@rem0g how was this video uploaded: via the website of using the API with zip file? In the meantime, you can extract a still image from the video by clicking Create under Create Citation Form Image from Current Video in edit mode.

susanodd commented 2 weeks ago

I made this (hidden) url to show the video files that do not have a GlossVideo object. (There was already a command to do this that was hidden. I merely modified the code to look at the files.)

(It assumes the correct video path where it looks.)

https://signbank.cls.ru.nl/dictionary/missing_video_view/

vanlummelhuizen commented 2 weeks ago

I made this (hidden) url to show the video files that do not have a GlossVideo object. (There was already a command to do this that was hidden. I merely modified the code to look at the files.)

(It assumes the correct video path where it looks.)

https://signbank.cls.ru.nl/dictionary/missing_video_view/

This does not seem to be relevant. Gloss 49457 does have a GlossVideo. In a Django shell on the live server:

>>> GlossVideo.objects.filter(gloss_id=49457)
<QuerySet [<GlossVideo: glossvideo/NGT/FI/FIETSEN-B-49457.mp4>]>

This gloss does not have or did not get a still image.

Furthermore, I tried out uploading a video via the website both on my local instance and on the live server. Both seem to work fine.

I could not get uploading a zip file via the website to work. So there may lie a problem. @susanodd , you implemented a lot of the zip upload functionality. Could you look into this problem in the context of uploading a zip file, both via the website and using the API?

susanodd commented 2 weeks ago

[SKIP THIS OR JUST BROWSE]

I repaired the code several weeks ago because some videos were being uploaded directly to glossvideo instead of glossvideo/NGT/XX

I removed files that were in the wrong place because they were inaccessible. That was several weeks ago. I removed the GlossVideo objects that had wrong relative paths.

If the GlossVideo refers to it then it should be correct. (Tegenwoordig)

I had some local problems with path names where the videofile.path was not defined and PyCharm wanted this to be videofile.name instead. (PyCharm refuses to run the server if it finds type errors etc. It does not like videofile.path.) The name is the relative path.

I am aware of what you are writing about the paths. There is also something weird with protected_media. I think there is some inconsistency between the Apache server with a file server versus the local runserver without a file server. (I suspect that PyCharm does something with this.)

The zip upload is a zip ARCHIVE, not a single file.

I am in the process of revising the zip upload functionality here #1344 There are some permissions bugs on the Apache server that do not happen on the local PyCharm server. (Described in the -- draft -- pull request.)

susanodd commented 2 weeks ago

[CODE DETAILS GLOSSVIDEO PATHS] Some places in the (original) code use a string coercion on a GlossVideo file object to retrieve the path/name/what? (what does the string coercion retrieve?)

I will look for them. (As above, PyCharm complains about using path or url. So probably that is why code is using a string coercion. Once upon a time, there was code that was only using the file system, not the GlossVideo objects.) Some code also hard codes "protected_media" in the path. (I think the "package json" does this too, since we can't put "writable" in a visible link. )

I write this remembering code I have read, I will locate the code to refer to it. [DONE]

https://github.com/Signbank/Global-signbank/blob/a9a4110ebbff99a3fed4d6db65caeadc9919c8b1/signbank/video/models.py#L329

https://github.com/Signbank/Global-signbank/blob/a9a4110ebbff99a3fed4d6db65caeadc9919c8b1/signbank/video/models.py#L407-L410

https://github.com/Signbank/Global-signbank/blob/a9a4110ebbff99a3fed4d6db65caeadc9919c8b1/signbank/video/models.py#L461-L466

https://github.com/Signbank/Global-signbank/blob/a9a4110ebbff99a3fed4d6db65caeadc9919c8b1/signbank/video/models.py#L644-L649

https://github.com/Signbank/Global-signbank/blob/a9a4110ebbff99a3fed4d6db65caeadc9919c8b1/signbank/video/models.py#L819-L825

https://github.com/Signbank/Global-signbank/blob/a9a4110ebbff99a3fed4d6db65caeadc9919c8b1/signbank/video/admin.py#L104

(If you look at the Video Admin, you will see that the string coercion yeilds the relative path, not the file name.)

https://github.com/Signbank/Global-signbank/blob/a9a4110ebbff99a3fed4d6db65caeadc9919c8b1/signbank/dictionary/models.py#L1503

susanodd commented 2 weeks ago

[API ZIP ARCHIVE] [ONLY RELEVANT IF YOU ARE INTERESTED IN DISCUSSION AND HYPOTHESIS ABOUT CODE DEBUGGING]

Probably the Squiggle API tester is not correct.

[OLD] This api for the zip archive was before the API token, so it relies on the user being "logged in". The zip file itself is in a url as a GET parameter.

The zip file upload is separate (a different API call) from the import to gloss of the unzipped video files. This makes it so the zip file upload has not done anything to the database. It only puts files on the file system in a separate location from that where the "real" videos are. This was deliberate to protect the actual signbank data.

As the code excerpts show, the string coercion of a video object yeilds a relative path, not the file name. So in the GlossVideo ADMIN, you can see the paths with the "writable" before them. The video/admin.py code makes use of string coercion and displays the "writable" part of the path at the front.

As above you can see the code in all those functions shown above is totally inconsistent.

I don't actually know why the string coercion yields the relative path. The __str__ methods yield the filenames. Maybe it's making use of the upload_to embedded during creation?

susanodd commented 2 weeks ago

I found another code that could affect the path of the GlossVideo:

https://github.com/Signbank/Global-signbank/blob/a9a4110ebbff99a3fed4d6db65caeadc9919c8b1/signbank/video/models.py#L165-L167

Is it possible that this could set the (relative) path to empty?

susanodd commented 2 weeks ago

The zip imported files use the correct path when the still image is being created. I just got the new version working and it logs the create image paths as you show above. Those are correct. (It's code that I did not change, so it was correct before.