ammesatyajit / VideoBERT

Using VideoBERT to tackle video prediction
116 stars · 14 forks

Questions to understand this repo well :D #2

Open FormerAutumn opened 3 years ago

FormerAutumn commented 3 years ago

First, thank you for your great work :D

My question is the one in the title. The README says:

> Using the centroids, videos are tokenized and text captions are punctuated. Using the timestamps for each caption, video ids are extracted and paired with the text captions in the training data file. Captions can be found here: https://www.rocq.inria.fr/cluster-willow/amiech/howto100m/.
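As a rough illustration of the quoted pipeline, a caption's timestamp span can be matched to the video tokens whose clip windows it covers. Everything below is a hypothetical sketch: the fixed 1.5 s clip length, the caption dict layout, and the function name are illustrative assumptions, not the repo's exact format.

```python
import math

CLIP_SECONDS = 1.5  # assumed fixed clip duration per video token

def tokens_for_span(video_tokens, start, end, clip_seconds=CLIP_SECONDS):
    """Return the clip tokens whose windows lie inside [start, end)."""
    first = int(start // clip_seconds)
    last = math.ceil(end / clip_seconds)  # exclusive upper index
    return video_tokens[first:last]

# toy example: 8 clip tokens covering 12 seconds of video
video_tokens = [101, 205, 205, 87, 87, 42, 42, 9]
caption = {"start": 3.0, "end": 6.0, "text": "mix the batter"}

pair = (caption["text"],
        tokens_for_span(video_tokens, caption["start"], caption["end"]))
```

Here the caption spanning seconds 3 to 6 gets paired with the two tokens for clips [3.0, 4.5) and [4.5, 6.0).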

In short, which file should I download to get the 'Captions' mentioned in the last sentence of the quote? I downloaded howto100mcaptions.zip (2.3G); is that correct? All the files in it are .csv files. Also, has anyone run the repo the author said they were inspired by? It is https://github.com/MDSKUL/MasterProject

ammesatyajit commented 3 years ago

Hi, so what I used was raw_caption_superclean.json for the captions file, which I believe you can download as part of the raw caption zip. I did run the other repo you mentioned, but I couldn't get any visual results; I only got the quantitative results the author reported. Also, a tip: increase the model size as much as possible and run it on a GPU for the best results.

Thanks for your interest!

FormerAutumn commented 3 years ago

@ammesatyajit Thank you for your reply! For personal reasons, I want to focus more on the other repo (MasterProject), and these days I keep comparing your repos (this one and your hkmeans repo) with it. You really did a good job! If I run into problems later, may I ask you again? (or maybe again and again :( )

ammesatyajit commented 3 years ago

@FormerAutumn Sure! I'm happy to answer any questions you have.

FormerAutumn commented 3 years ago

@ammesatyajit Thanks for your kindness !

Do you know where to get 'data/newest-data-max-len-20.npy' in https://github.com/MDSKUL/MasterProject/blob/master/stap5/globals.py ? (I scanned all the URLs the author mentioned and only found centers.npy with the .npy suffix.)

Can I get your email or another way to contact you? Or is it fine to just ask questions here? If you'd like to share a contact, you can email me at cnzsy98@163.com. Thank you so much :D

FormerAutumn commented 3 years ago

Hi, sorry to disturb you. In your inference.py you define a function named 'text_next_tok_pred'. Does it take some video clips as input and then choose center images according to the output, so as to complete the whole sequence? (I know you just choose the first 5 images for visualization.) Correct me if I'm wrong.

ammesatyajit commented 3 years ago

So for the text next token prediction, there is no video involved; I am just using the model for next-word prediction in a sentence (similar to GPT). This was a useful sanity check of whether the model had learned anything, which I later built on when I tested it on video. (Note that I haven't added more inference functionality yet, though it would be relatively simple to do so.) Hope that answers your question.
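That sanity check can be sketched roughly as follows. The names (`VOCAB`, `toy_model`, `predict_next_token`) are hypothetical stand-ins, not the repo's actual API: feed a token prefix through the model, take the logits at the last position, and greedily pick the argmax token, GPT-style.

```python
import numpy as np

# Toy stand-in for the trained model: maps a token-id sequence to logits
# over the vocabulary at every position.
VOCAB = ["[PAD]", "the", "cut", "onion", "finely"]

def toy_model(token_ids):
    rng = np.random.default_rng(sum(token_ids))  # deterministic fake logits
    return rng.normal(size=(len(token_ids), len(VOCAB)))

def predict_next_token(token_ids):
    logits = toy_model(token_ids)      # shape (seq_len, vocab_size)
    return int(np.argmax(logits[-1]))  # greedy pick at the last position

next_id = predict_next_token([1, 2])   # id of the predicted next word
```

In the real pipeline the toy model would be the trained VideoBERT, and repeated greedy picks extend the sentence one token at a time.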

ammesatyajit commented 3 years ago

> @ammesatyajit Thanks for your kindness !
>
> Do you know where to get the 'data/newest-data-max-len-20.npy' in https://github.com/MDSKUL/MasterProject/blob/master/stap5/globals.py ? (I scan all the urls the author mentioned and only find the centers.npy has the .npy suffix)
>
> Can I get your email or other contact ways ? Or just make problems here is ok ? If you will provide your contact way, you can email to cnzsy98@163.com. Thank you so much :D

@FormerAutumn Sorry for not replying earlier. I believe the author links the Google Drive file you are asking for. You can definitely contact me by email; my email is ammesatyajit@gmail.com.

FormerAutumn commented 3 years ago

Thank you for your consistent replies :D I'm now trying to understand your training pipeline, since your code seems clearer to me than the MasterProject (the other repo). I'm sorry I wrote the wrong name; the function I actually wanted to ask about is 'video_next_tok_pred'. In short: does it take in a batch of video clips (embedded via tok_embed using the cluster centers they belong to?) and output logits that you use to choose a corresponding cluster center? If so, why not just use the videos' cluster centers themselves as the input sequence to the model?

I am adapting this method to implement an idea of mine (it doesn't work yet; I might put it on GitHub in the future if possible).

ammesatyajit commented 3 years ago

So video_next_tok_pred takes in tokens from the validation set; it doesn't take in raw video clips. Hope that answers your question.
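Put differently, the model only ever sees token ids; to visualize a predicted video token you map the id back to its centroid in feature space (and from there to a representative frame). A minimal sketch, where the array shapes are stand-ins rather than the real centroids file:

```python
import numpy as np

# Assume the concatenated centroids array holds one row per video token.
rng = np.random.default_rng(0)
centroids = rng.normal(size=(20736, 1024))  # stand-in for centroids.npy

def token_to_centroid(token_id, centroids):
    """Look up the feature-space centroid a video token id denotes."""
    return centroids[token_id]

vec = token_to_centroid(5, centroids)  # centroid vector for token 5
```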

joaanna commented 3 years ago

hey, great work! I am also trying to understand your code better. In VideoBERT, and also in the parameters used here, you take 4 hierarchies and 12 clusters. The paper says that yields 12^4 = 20736 clusters, but in the README you mention concatenating the centroids, and then label_data labels features by the closest centroid. Wouldn't that yield 12 * 4 = 48 clusters, i.e. effectively 48 video tokens? How does it become 20736 clusters?

ammesatyajit commented 3 years ago

Hi, sorry if the README was slightly confusing. The 20736 centroids were stored in separate files due to the hierarchical k-means; the only purpose of concatenating them was so I could access all of the centroids from one file. label_data takes in the video feature vectors and finds the closest of these 20736 centroids, effectively tokenizing each video. Hope that clears up any confusion.
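The two facts above can be sketched in a few lines: 4 hierarchy levels of k = 12 give 12^4 leaf centroids (not 12 * 4), and label_data-style tokenization assigns each feature vector the index of its nearest centroid. The shapes below are small stand-ins for illustration.

```python
import numpy as np

K, HIERARCHIES = 12, 4
n_leaf_centroids = K ** HIERARCHIES  # 12**4 == 20736, not 12 * 4 == 48

rng = np.random.default_rng(0)
centroids = rng.normal(size=(100, 8))  # stand-in for the concatenated file
features = rng.normal(size=(5, 8))     # video feature vectors

def label_features(features, centroids):
    """Token id of the nearest centroid for each feature vector."""
    # (n, 1, d) - (1, m, d) -> pairwise squared distances, shape (n, m)
    d2 = ((features[:, None, :] - centroids[None, :, :]) ** 2).sum(-1)
    return d2.argmin(axis=1)

tokens = label_features(features, centroids)  # one token id per vector
```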

joaanna commented 3 years ago

That makes sense, thank you!

Another question. I am able to run the clustering with this command:

```
python3 -m hkmeans_minibatch -r features -p ft_hp -b 40 -s vecs_dir -c centroid_dir -hr 2 -k 12 -e 1
```

which yields 12 clusters, each of shape (12, feature dimension). But when tuning the `k` and `hr` parameters I run into different issues. For:

```
python3 -m hkmeans_minibatch -r features -p ft_hp -b 40 -s vecs_dir2r -c centroid_dir2 -hr 3 -k 15 -e 1
```

I get this error:

```
Traceback (most recent call last):
  File "/Users/joanna/miniconda3/envs/lola/lib/python3.6/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/Users/joanna/miniconda3/envs/lola/lib/python3.6/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/Users/joanna/miniconda3/envs/lola/lib/python3.6/site-packages/hkmeans_minibatch/__main__.py", line 39, in <module>
    main()
  File "/Users/joanna/miniconda3/envs/lola/lib/python3.6/site-packages/hkmeans_minibatch/__main__.py", line 35, in main
    hkmeans(root, prefix, h, k, batch_size, epochs, save_dir, 'vecs', centroid_dir)
  File "/Users/joanna/miniconda3/envs/lola/lib/python3.6/site-packages/hkmeans_minibatch/hkmeans.py", line 99, in hkmeans
    hkmeans_recursive(root, prefix, h, k, batch_size, epochs, save_dir, save_prefix, centroid_dir)
  File "/Users/joanna/miniconda3/envs/lola/lib/python3.6/site-packages/hkmeans_minibatch/hkmeans.py", line 91, in hkmeans_recursive
    save_prefix.format(i), centroid_dir, cur_h=cur_h + 1)
  File "/Users/joanna/miniconda3/envs/lola/lib/python3.6/site-packages/hkmeans_minibatch/hkmeans.py", line 91, in hkmeans_recursive
    save_prefix.format(i), centroid_dir, cur_h=cur_h + 1)
  File "/Users/joanna/miniconda3/envs/lola/lib/python3.6/site-packages/hkmeans_minibatch/hkmeans.py", line 94, in hkmeans_recursive
    centroids, labelled_data = minibatch_kmeans(root, prefix, k, batch_size, epochs)
  File "/Users/joanna/miniconda3/envs/lola/lib/python3.6/site-packages/hkmeans_minibatch/hkmeans.py", line 40, in minibatch_kmeans
    labelled_data[path] = list(kmeans.predict(np.load(path)))
  File "/Users/joanna/miniconda3/envs/lola/lib/python3.6/site-packages/sklearn/cluster/_kmeans.py", line 1913, in predict
    check_is_fitted(self)
  File "/Users/joanna/miniconda3/envs/lola/lib/python3.6/site-packages/sklearn/utils/validation.py", line 72, in inner_f
    return f(**kwargs)
  File "/Users/joanna/miniconda3/envs/lola/lib/python3.6/site-packages/sklearn/utils/validation.py", line 1019, in check_is_fitted
    raise NotFittedError(msg % {'name': type(estimator).__name__})
sklearn.exceptions.NotFittedError: This MiniBatchKMeans instance is not fitted yet. Call 'fit' with appropriate arguments before using this estimator.
```

For:

```
python3 -m hkmeans_minibatch -r features -p ft_hp -b 60 -s vecs_dir2r -c centroid_dir2 -hr 3 -k 15 -e 1
```

I get:

```
Traceback (most recent call last):
  File "/Users/joanna/miniconda3/envs/lola/lib/python3.6/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/Users/joanna/miniconda3/envs/lola/lib/python3.6/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/Users/joanna/miniconda3/envs/lola/lib/python3.6/site-packages/hkmeans_minibatch/__main__.py", line 39, in <module>
    main()
  File "/Users/joanna/miniconda3/envs/lola/lib/python3.6/site-packages/hkmeans_minibatch/__main__.py", line 35, in main
    hkmeans(root, prefix, h, k, batch_size, epochs, save_dir, 'vecs', centroid_dir)
  File "/Users/joanna/miniconda3/envs/lola/lib/python3.6/site-packages/hkmeans_minibatch/hkmeans.py", line 99, in hkmeans
    hkmeans_recursive(root, prefix, h, k, batch_size, epochs, save_dir, save_prefix, centroid_dir)
  File "/Users/joanna/miniconda3/envs/lola/lib/python3.6/site-packages/hkmeans_minibatch/hkmeans.py", line 91, in hkmeans_recursive
    save_prefix.format(i), centroid_dir, cur_h=cur_h + 1)
  File "/Users/joanna/miniconda3/envs/lola/lib/python3.6/site-packages/hkmeans_minibatch/hkmeans.py", line 87, in hkmeans_recursive
    save_sorted_vectors(centroids, labelled_data, batch_size, save_dir, save_prefix)
  File "/Users/joanna/miniconda3/envs/lola/lib/python3.6/site-packages/hkmeans_minibatch/hkmeans.py", line 56, in save_sorted_vectors
    sorted_vecs.append(np.expand_dims(vectors[j], axis=0))
```

Should `hr` and `k` be in some relation to the batch size?

FormerAutumn commented 3 years ago

@ammesatyajit Sorry for my late reply, and thank you for your kindness. I re-read VideoBERT and found that the ViT model seems closer to what I want to implement, so I'm switching to ViT. :D

ammesatyajit commented 3 years ago

@joaanna Sorry for not replying earlier. I can't give a detailed response at the moment because I'm a little busy due to personal reasons, but if you want, you can read the code and docs for my hkmeans package: https://github.com/ammesatyajit/hierarchical-minibatch-kmeans. I will try to reproduce your error as soon as possible and get back to you. Also, could you tell me the dimensions of your input data files? The batch size should ideally be larger than the number of vectors in each input file; for example, I used a batch size of 500 when running hkmeans on files with 20 vectors each.
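To illustrate that rule of thumb with a toy mini-batch k-means loop (a simplified re-implementation for illustration, not the hkmeans package's actual code): each saved file is consumed as one mini-batch, so the batch size should cover a whole file's worth of vectors.

```python
import numpy as np

def minibatch_step(centroids, counts, batch):
    """One mini-batch k-means update with per-center learning rates."""
    # assign each vector in the batch to its nearest current centroid
    labels = ((batch[:, None, :] - centroids[None, :, :]) ** 2).sum(-1).argmin(1)
    for x, c in zip(batch, labels):
        counts[c] += 1
        lr = 1.0 / counts[c]  # learning rate shrinks as a center sees more data
        centroids[c] = (1 - lr) * centroids[c] + lr * x
    return centroids, counts

rng = np.random.default_rng(0)
k, d = 3, 4
centroids = rng.normal(size=(k, d))
counts = np.zeros(k)

# ten "files" of 20 vectors each; a batch size >= 20 covers each file
for _ in range(10):
    file_vecs = rng.normal(size=(20, d))
    centroids, counts = minibatch_step(centroids, counts, file_vecs)
```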

ammesatyajit commented 3 years ago

@FormerAutumn no problem. Vision transformer is really interesting, hope you find what you are looking for :)

harshraj32 commented 3 years ago

@joaanna Can you share the data you downloaded? The site seems to be down, and I am unable to download the cooking videos data.