tejasrjain closed this issue 9 months ago
Hi, thank you for your interest in our work. Please refer to the paper for the exact details. I will release the entire pipeline by the end of this month.
Hello, thanks for your reply and the great work. I tried to follow the details specified in the paper, but I couldn't figure out how to split the videos into 10-15 second clips, since each clip also needs to be mapped to its OCR text. How do you make sure that the frame the OCR text comes from is also among the frames used to extract the video features?
Hi. Sorry for the late response. I have updated the README with instructions on how to train and evaluate the model; please check it out. You can split any lecture of your choice into 10s-15s clips using ffmpeg and follow the instructions in the README to extract the features. Then bind all the features into a single pickle file, similar to dataset_v1_helper.pkl, which I will upload shortly (I am facing some issues while uploading large files). I am attaching a screenshot to help visualize the data format. We obtain OCR for the last frame of each 10s-15s clip.
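For readers without the screenshot, here is a minimal sketch of the steps described above: split a lecture into clips, grab the last frame of each clip for OCR, and bundle per-clip features into one pickle. It is not the authors' pipeline. The function names, the `clip_%04d.mp4` naming pattern, and the pickle layout (a dict keyed by clip name with `video_feat` and `ocr_text` fields) are all assumptions for illustration; the real layout of dataset_v1_helper.pkl may differ. The ffmpeg options used (`-f segment`, `-segment_time`, `-reset_timestamps`, `-sseof`, `-update`) are standard ones.

```python
import pickle
from pathlib import Path


def split_command(video, out_dir, seconds=10):
    """Build an ffmpeg command that cuts `video` into consecutive clips."""
    pattern = str(Path(out_dir) / "clip_%04d.mp4")  # hypothetical naming scheme
    return [
        "ffmpeg", "-i", str(video),
        "-c", "copy",                  # no re-encode, so cuts snap to keyframes
        "-f", "segment",
        "-segment_time", str(seconds),  # target clip length in seconds
        "-reset_timestamps", "1",      # each clip starts at t=0
        pattern,
    ]


def last_frame_command(clip, image):
    """Build an ffmpeg command that saves (roughly) the last frame of `clip`."""
    return [
        "ffmpeg", "-sseof", "-1",      # start reading 1 s before the end
        "-i", str(clip),
        "-update", "1",                # keep overwriting a single image file
        "-q:v", "2",
        str(image),
    ]


def bundle_features(records, out_path):
    """Write per-clip features to one pickle file.

    `records` is a list of dicts with the hypothetical keys
    "clip", "video_feat" and "ocr_text".
    """
    data = {
        r["clip"]: {"video_feat": r["video_feat"], "ocr_text": r["ocr_text"]}
        for r in records
    }
    with open(out_path, "wb") as f:
        pickle.dump(data, f)
    return data
```

To actually run a command, pass it to `subprocess.run(cmd, check=True)` with ffmpeg installed. Because the split uses stream copy, each cut lands on the next keyframe, so clips come out somewhat longer than `seconds`; re-encoding instead of `-c copy` would give exact lengths at the cost of speed.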
Hi, thanks for the update. I'd like to know your specific process for dividing the video into 10-15 second segments. As I understand it, each video first has topic boundary labels, and then the video is divided into 10-15 second segments. In that process, each topic boundary would have to fall at the end of one of the segments to be classified. If so, doesn't that involve label leakage?
Hi, we divide a single lecture video into 10s-15s clips. This is independent of the boundary labels (we do segmentation in an unsupervised manner and do not rely on ground-truth labels). The 10s-15s clip is the atomic unit that we operate on: we just divide a video into N segments of 10s-15s each using ffmpeg. It is true that some boundaries can fall inside a 10s-15s clip rather than at its edge. This is captured by the BS@K metric. Hence, we have also experimented with shorter clips, i.e., 4s-8s, where the chance of a boundary falling inside a clip is lower. Please check the supplementary section of our paper for more details.
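To make the point about boundaries and clips concrete, here is a small sketch that finds which clip a ground-truth boundary timestamp falls into and how far it is from the nearest clip edge. This is only an illustration of the misalignment, not the BS@K metric from the paper; the function name and the example durations are made up.

```python
from bisect import bisect_right
from itertools import accumulate


def locate_boundary(clip_durations, boundary_s):
    """Return (clip_index, seconds_to_nearest_clip_edge) for a boundary time.

    `clip_durations` lists the length of each consecutive clip in seconds.
    """
    ends = list(accumulate(clip_durations))  # end time of each clip
    # First clip whose end lies after the boundary; clamp to the last clip.
    idx = min(bisect_right(ends, boundary_s), len(ends) - 1)
    start = ends[idx] - clip_durations[idx]
    offset = min(boundary_s - start, ends[idx] - boundary_s)
    return idx, offset
```

For example, with clips of 10, 12, 15 and 11 seconds, `locate_boundary([10, 12, 15, 11], 25)` returns `(2, 3)`: the boundary sits 3 seconds into the third clip. Since the clips are cut without looking at the labels, such offsets are expected, and shorter clips keep them smaller.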
Hi. I am closing this issue. All the files are uploaded. In case you have any further doubts or queries, please raise a new issue. Thanks!
I am trying to reproduce the results. Could you please let me know how the input should be provided?