acl-org / acl-anthology

Data and software for building the ACL Anthology.
https://aclanthology.org
Apache License 2.0

techtalks.tv videos not loading #167

Open mjpost opened 5 years ago

mjpost commented 5 years ago

It seems there are video links missing from the DB. For example the NAACL 2016 videos are not linked:

http://techtalks.tv/naacl2016/

I expect there are many more. Furthermore I am not sure there is a formal ingest process for them.

May be related to #64.

mjpost commented 5 years ago

Update: here are all the videos that *ACL hosts at techtalks.tv. I was not able to watch any of the handful of videos I tried.

Separately, not all of them have been ingested (denoted with ✅❌):

knmnyn commented 5 years ago

We need a mapping from the URLs at TechTalks to the ACL IDs in order to ingest them. This was the responsibility of Weyond, the company that ACL used to record the videos. I had asked our contact there for the mapping (as is contractually obligated every year), but they didn't provide it to us, despite multiple reminders.

This is a good matter to take up with the Exec: if they want to keep contracting Weyond without it fulfilling its obligations, then the volunteer group will need to produce the mapping and insert the video URLs manually into the authoritative XML.

I can give you the contact I discussed these matters with offline.
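
For reference, the manual ingestion step described above (turning a URL-to-ID mapping into entries in the XML) could look roughly like the sketch below. It is only an illustration: the CSV column names (`paper_id`, `video_url`), the `<video href="...">` child element, and the `<paper id="...">` layout are assumptions, not a description of the actual Anthology schema or of what Weyond delivers.

```python
import csv
import io
import xml.etree.ElementTree as ET


def add_video_links(xml_text, mapping_csv):
    """Attach a <video> child to every paper listed in the mapping.

    xml_text:    XML for one volume (assumed layout: <paper id="..."> elements).
    mapping_csv: CSV text with the assumed columns paper_id and video_url.
    Returns the updated XML as a string and the number of links added.
    """
    root = ET.fromstring(xml_text)
    reader = csv.DictReader(io.StringIO(mapping_csv))
    mapping = {row["paper_id"]: row["video_url"] for row in reader}

    added = 0
    for paper in root.iter("paper"):
        url = mapping.get(paper.get("id"))
        if url is None:
            continue
        # Do not add the same link twice if the script is re-run.
        if any(v.get("href") == url for v in paper.findall("video")):
            continue
        ET.SubElement(paper, "video", href=url)
        added += 1
    return ET.tostring(root, encoding="unicode"), added
```

Running it a second time on its own output adds nothing, so the mapping can be applied incrementally as more rows arrive.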

mjpost commented 5 years ago

Thanks, Min. This is good to know. I am very happy to hear that we have a contract saying someone else has to do this work :) The exec is meeting on the 15th so I'll add this information to my report.

akoehn commented 5 years ago

FYI: There are lots of EMNLP videos on vimeo which are not ingested as well: https://vimeo.com/aclweb

2018 seems to be linked, the older ones are not afaict.

mjpost commented 5 years ago

Thanks for this info!

Do you have time to take a census? i.e., complete the table above so we have complete information on (a) what video sets exist and (b) which ones are not in the Anthology? I think techtalks.tv is complete but I may have missed something, and I didn't look at Vimeo.

akoehn commented 5 years ago

I might have time in a few weeks (or in the evenings) so if it is not urgent, I'm happy to do that.

akoehn commented 5 years ago

As I will have to figure that one out by matching talk titles (the conferences are not named) I might as well directly create the links in the XML source then.

mjpost commented 5 years ago

As Min mentioned, the video company is contractually obligated to provide these links, so don't spend your time doing it. I just want to have conference-level information (yes or no) so we know which ones they haven't provided yet.

akoehn commented 5 years ago

Ok, I didn't know that it was also for the vimeo videos. I assumed they were something different as they are not hosted on techtalks. In that case I will do it earlier.

knmnyn commented 5 years ago

Hi Arne, all:

Just FYI. The video company in question (Weyond) is responsible for compiling a mapping for the Anthology from the techtalks.tv URL (or wherever they decide to host the video) to the Anthology identifier. The last time they actually sent me the data (2015 or so), they did it through an Excel file. I'm not sure who did the Vimeo uploads and whether it was the same provider (different conferences do not actually have to use the same provider).

akoehn commented 5 years ago

Going through the videos was too tiring so I wrote a small matching script instead. Here is a CSV with the successfully matched videos with information about the videos and the corresponding conference & paper. And here is one with the videos that could not be matched.

Steps to reproduce: download my Python script to the bin directory, install pyvimeo, run the script, wait an hour or so, and find your results in /tmp/.

Edit²: The Python script included my API secret, so I removed it. If you want to run the script (there is no good reason to), you will have to get your own credentials from Vimeo.
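
The matching idea itself can be sketched without the Vimeo API. The snippet below is not the script mentioned above; it is a minimal illustration of title matching that uses `difflib` from the standard library, with a hypothetical `match_videos` helper and an arbitrary similarity cutoff of 0.9.

```python
import difflib
import re


def normalize(title):
    """Lowercase a title and collapse punctuation and whitespace."""
    return re.sub(r"[^a-z0-9]+", " ", title.lower()).strip()


def match_videos(video_titles, papers, cutoff=0.9):
    """Match video titles against paper titles.

    papers: dict mapping paper ID -> paper title.
    Returns (matched, unmatched), where matched maps each video title
    to a paper ID and unmatched lists the titles with no close match.
    """
    by_norm = {normalize(title): pid for pid, title in papers.items()}
    candidates = list(by_norm)
    matched, unmatched = {}, []
    for video_title in video_titles:
        hits = difflib.get_close_matches(
            normalize(video_title), candidates, n=1, cutoff=cutoff
        )
        if hits:
            matched[video_title] = by_norm[hits[0]]
        else:
            unmatched.append(video_title)
    return matched, unmatched
```

A high cutoff keeps false matches rare at the cost of leaving more videos in the unmatched list for manual review.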

Notes:

akoehn commented 5 years ago

Overall, there are 701 videos I could match which don't have a video attribute in the XML data yet. These are from:

There were 64 videos I could not match (automatically).

I could extend the script to add the video links to the relevant XML if you like (with no promises about when I actually do it).

There are two papers from W02 (sic!) which got a recent video due to a test-of-time award. Should they be linked as well?

mjpost commented 5 years ago

Yes, I think adding the test-of-time videos would be great. Can you post your match list somewhere, maybe as an attachment on this thread? It would be awesome to have the XML updated, and if the file were here, maybe someone else could get to it, too.

I have taken a survey of the current state of the videos here: ACL Anthology Video Survey.

akoehn commented 5 years ago

I updated the old list:

Here is a CSV with the successfully matched videos. And here is one with the videos that could not be matched.

The two W02 papers are not in there as I restricted matching to papers with year > 2013.

The files are also available here: vimeo_links.tar.gz

akoehn commented 5 years ago

As we still don't seem to get the video-paper mapping, I would suggest that they instead add the permanent Anthology URL of the paper to the video's description. That would allow us to easily obtain the mapping via the Vimeo API, and people looking at the video could go to the paper without searching for it.
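
If descriptions carried the paper URL, recovering the mapping would reduce to a pattern match on the description text. The sketch below assumes the description has already been fetched as a plain string, and that the URL ends in an ID of the form letter, two digits, hyphen, four digits (e.g. P19-1001) on one of two hosts; both the host list and the URL shape are assumptions.

```python
import re

# Assumed URL shapes: .../anthology/P19-1001 or aclanthology.org/P19-1001
ANTHOLOGY_URL = re.compile(
    r"https?://(?:www\.)?(?:aclweb\.org/anthology|aclanthology\.org)"
    r"/([A-Z]\d{2}-\d{4})\b"
)


def paper_ids_from_description(description):
    """Return all Anthology IDs linked from a video description."""
    return ANTHOLOGY_URL.findall(description)
```

An empty result flags a video whose description still lacks the link, which also gives a cheap way to check whether the provider followed the convention.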

mjpost commented 4 years ago

I created a short guide stipulating contractual items that I think should be included when we contract with companies in the future. Comments welcome (especially @akoehn):

akoehn commented 4 years ago

I made minor adjustments without changing the content.

Regarding the content: in my opinion, the video title should contain the conference, e.g. "ACL 2019: General Adversarial Restricted Boltzmann Machines for Unsupervised Deep Augmented Training". It is otherwise really hard to find out at which conference a presentation was given.
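
With a title convention like that, splitting a video title into venue, year, and talk title becomes mechanical. The following is a small sketch of such a parser; the exact pattern (venue, space, four-digit year, colon, space, title) is an assumption based on the example above, and `split_video_title` is a hypothetical helper name.

```python
import re

# Assumed convention: "<VENUE> <YEAR>: <talk title>"
TITLE_PATTERN = re.compile(r"^(?P<venue>[A-Za-z-]+) (?P<year>\d{4}): (?P<title>.+)$")


def split_video_title(video_title):
    """Return (venue, year, title), or None if the title lacks the prefix."""
    m = TITLE_PATTERN.match(video_title.strip())
    if m is None:
        return None
    return m.group("venue"), int(m.group("year")), m.group("title")
```

Titles that return None are the ones that would still need manual matching.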

mjpost commented 4 years ago

That's fine. It could also be put in the description.

manning commented 4 years ago

Even beyond the evil fact that techtalks.tv videos were only Flash videos, they now just seem to be disappearing. E.g., the ones from ACL 2015 just aren't available now:

[Error] Failed to load resource: A server with the specified hostname could not be found. (FB.Share, line 0)
> Selected Element
< <div id="collageview_videoplayer_div" class="collageview_videoplayer_div">
<a class="rtmp" href="ACL-IJCNLP-2015/mp4:27-310-04.mp4" style="display:block;width:640px;height:480px;" id="flowplayer_presenter">…</a>
</div>

Is it possible for ACL to maintain its own copy of the videos, in some reasonably future-looking format like mp4? Or just to put them on YouTube, which is probably as good as we can do for long-term access in various formats?

mjpost commented 4 years ago

We do fortunately have .mp4 backups of ACL 2015 and a few other conferences (see the video survey Google doc linked above). Updating the Anthology to link to them is a task in our queue, but it's a big, onerous one that is probably going to require a paid contract.

We have moved to hosting everything on Vimeo. Perhaps YouTube would be better, but people have noted that it is not available in China. I also want to have a manual backup to some kind of cold storage, but that is also queued. At the moment we are trusting Vimeo's data retention policies.