Open dreamgonfly opened 1 year ago
What model/tool do you use to extract speech transcripts?
@antoyang To extract speech transcripts, I used the Whisper model (base). The ASR results seem okay, but I still cannot reproduce the results from the paper.
For YouCook2, the best scores I could get with the released fine-tuned checkpoint were CIDER 10.9 (47.1 in the paper), METEOR 4.3 (9.3 in the paper), and SODA_c 2.6 (7.9 in the paper).
I0618 16:51:54.288763 139967128065856 trainer.py:915] Finished gathering eval metrics for 413 samples
I0618 16:51:54.290117 139862179014400 logging_writer.py:48] [0] validation/CIDER=0.109533, validation/F1_Score=0.159727, validation/METEOR=0.0432585, validation/Precision@0.3=0.368769, validation/Precision@0.5=0.182183, validation/Precision@0.7=0.063885, validation/Precision@0.9=0.00668596, validation/Precision_Mean=0.155381, validation/Recall@0.3=0.43129, validation/Recall@0.5=0.213194, validation/Recall@0.7=0.0730633, validation/Recall@0.9=0.0075032, validation/Recall_Mean=0.181263, validation/SODA_c=0.0268965, validation/n_preds=8.00726
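The logged values are fractions, while the paper numbers quoted above are percentages. Here is a minimal sketch for pulling the metrics out of a log line like this one and putting them next to the paper numbers. The helper names `parse_metrics` and `compare_to_paper` are hypothetical and not part of scenic, and the log line is shortened.

```python
import re

# Paper numbers quoted in this thread (percent scale).
PAPER = {"CIDER": 47.1, "METEOR": 9.3, "SODA_c": 7.9}


def parse_metrics(log_line: str) -> dict[str, float]:
    """Extract `validation/<name>=<value>` pairs from one log line."""
    pairs = re.findall(r"validation/([\w@.]+)=([0-9.eE+-]+)", log_line)
    return {name: float(value) for name, value in pairs}


def compare_to_paper(metrics: dict[str, float]) -> dict[str, tuple[float, float]]:
    """Return (my score in percent, paper score) for metrics the paper reports."""
    return {
        name: (100.0 * metrics[name], paper)
        for name, paper in PAPER.items()
        if name in metrics
    }


# Shortened copy of the log line above.
LOG = (
    "[0] validation/CIDER=0.109533, validation/METEOR=0.0432585, "
    "validation/Precision@0.3=0.368769, validation/SODA_c=0.0268965"
)
metrics = parse_metrics(LOG)
for name, (mine, paper) in compare_to_paper(metrics).items():
    print(f"{name}: {mine:.1f} vs {paper:.1f} in the paper")
```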
Below are a few sample rows from the input CSV.
"video_id","duration","caption","start","end","asr_string","asr_start","asr_end"
"fn9anlEL4FI","490300000","['add garram masala seeds and a bay leaf to the oil', 'add the lamb to the pot', 'add garlic ginger paste and chopped onions to the pot', 'add chili tumeric coriander cumin and salt', 'add water to the pot', 'add potatos to the pot', 'add the tomatos to the pot', 'add chili to the pot']","[30000000, 69000000, 136000000, 170000000, 230000000, 309000000, 383000000, 438000000]","[39000000, 86000000, 149000000, 183000000, 238000000, 333000000, 390000000, 443000000]","[""Welcome back once again to how to cook great food.com. If you haven't already, click that"", 'button and subscribe to our channel. Only make it today, you can be making a lamb and potato', ""curry or masala. As you can see I've got my pan here and in there I've got some oil"", ""that's heating up nicely. I'm using a sunflower oil, go ahead and use any oil you like."", ""We're going to drop in some whole seeds or garam masala. So here they go. We want them"", ""to roast on pop and crackle. There's a bay leaf here. I've got in there some fennel seeds,"", ""cumin seeds, green cardamom, and black mustard seeds. That's what I'm using today for this."", ""They're going to release a wonderful flavour into that oil. Now we're going to go in with our lamb."", ""We're going to fry this for about five or six minutes just with the whole garam masala."", 'Here we go. This lamb has got burning. You can use chilli if you want.', ""So let's just cook this. Let's say it's got about five or six minutes."", ""Stir it over. I'm going to kind of above medium heat. We'll just see it a little bit."", ""Then we're going to add lots of other lovely spices."", 'You can see that our meat is browning really nicely. I mean it is no any accrued.', ""That's what we've got to do now is to get this meat nice and tender."", ""What I do here is I'm going to add some garlic ginger paste. That's a 50-50 mix of garlic and ginger."", ""It's about three of these little teaspoons in there. 
I'm going to add some chopped onions."", ""I'm using a red onion but go ahead and use white."", ""Then we're going to add some powders. As always if you've watched the channel I call these the big four there equal parts of chilli, coriander, cumin and turmeric."", ""If you'd like of course you can use your favourite curry powder. We're going to add some salt at this stage."", ""Let's flip this over."", ""We're going to cook this for about now three or four minutes. Turn it constantly."", 'Again on a kind of above medium heat.', ""We've got some lovely flavours happening now."", ""Now we're going to add some water."", 'That was cold water by the way.', ""We're just covering it a little bit."", ""We're going to bring this water to the boil and then we're going to simmer this with a lid on."", 'For about 15 minutes this is the part that I hope generally works.', ""We'll tend to as I meet them make it nice and soft."", ""So let's take a look now."", 'Look at that steam out of there.', 'This is cooking down beautifully. 
As you can see look at that.', 'The needs come straight up of that bone.', ""It's certainly on its way now."", 'A pretty essential part of doing this dish is to get your meat nice and tender.', ""You're getting that with an awful tough meat."", 'Now made of what cut you use.', ""Maybe you put really expensive lamb but it would still end up being tough if you don't go for this process."", ""I'm now going to add some potatoes."", ""We've tough peeled and chopped."", 'These are fairly small.', 'You cut them however you like and the cooking process will obviously take a longer time if you put them in as much bigger.', ""So again let's give this a mix."", 'Stir them in.', ""We've still got a decent amount of moisture in there from that water."", ""If you haven't at this point maybe you've got to really dry."", 'Add a bit more water now.', ""It's going to go back on."", ""I'm going to cook this for about 78 minutes on a fairly low heat."", 'Not a simmer, above a simmer.', ""Okay let's jump in now and take a look."", ""Let's look in more like it."", 'The potatoes are cooking very nicely.', 'I kind of like my potatoes quite soft.', ""I'm now at this stage going to add some chopped tomatoes."", ""I'm just going to spread them on the top."", ""I'll put the lid back on."", ""On a fairly low heat we're going to cook them just for about five minutes."", 'What they should do is break down with the steam.', ""Don't stir them at the moment."", 'The steam will break them down.', ""We're going to mix it around once and come back."", 'We may add a little tad more water perhaps.', ""And then we're pretty much done."", 'We should be now at the final stage.', 'Yeah these are soft and really nice the as you can see.', ""And they've given off a little bit of moisture as well."", 'Just now turning it over.', ""At this stage I'm going to add some fresh chilli."", ""It's totally optional as to how much you're putting."", ""I'm putting about four or five there."", 'You now need to check this 
for salt.', ""It's all good for me."", 'You can if you want finish that off with some fresh coriander or cilantro.', ""Let's just cook that for about two more minutes and it's done."", ""It's wonderful."", ""I'm really happy with it."", ""I'll see you again soon."", 'Take care.', 'Thank you.']","[0, 9800000, 14840000, 22140000, 26800000, 33800000, 41800000, 57300000, 61800000, 68800000, 86800000, 101800000, 109800000, 120800000, 127800000, 132800000, 139800000, 146800000, 161800000, 173800000, 185800000, 196800000, 205800000, 219800000, 226800000, 235800000, 241800000, 248800000, 254800000, 259800000, 266800000, 269800000, 273800000, 278800000, 282800000, 287800000, 292800000, 296800000, 298800000, 305800000, 309800000, 314800000, 318800000, 333800000, 337800000, 346800000, 350800000, 353800000, 358800000, 361800000, 366800000, 368800000, 372800000, 375800000, 378800000, 382800000, 388800000, 391800000, 393800000, 396800000, 399800000, 401800000, 403800000, 405800000, 408800000, 411800000, 415800000, 418800000, 423800000, 435800000, 438800000, 441800000, 446800000, 453800000, 456800000, 461800000, 464800000, 465800000, 467800000, 468800000, 469800000]","[9800000, 14840000, 22140000, 26800000, 33800000, 41800000, 48800000, 61800000, 68800000, 76800000, 101800000, 108800000, 115800000, 127800000, 131800000, 139800000, 146800000, 152800000, 173800000, 185800000, 192800000, 204800000, 211800000, 223800000, 231800000, 241800000, 246800000, 254800000, 259800000, 266800000, 269800000, 273800000, 278800000, 282800000, 286800000, 292800000, 296800000, 298800000, 305800000, 309800000, 313800000, 318800000, 333800000, 337800000, 340800000, 350800000, 353800000, 358800000, 361800000, 366800000, 368800000, 372800000, 375800000, 378800000, 382800000, 387800000, 391800000, 393800000, 396800000, 399800000, 401800000, 403800000, 405800000, 408800000, 411800000, 414800000, 418800000, 423800000, 426800000, 438800000, 441800000, 446800000, 450800000, 456800000, 461800000, 464800000, 
465800000, 467800000, 468800000, 469800000, 471800000]"
"-dh_uGahzYo","561490000","['mix hanger chili powder ginger powder fennel powder and water', 'add cumin seeds green cardamom cinnamon sticks to a blender', 'heat some ghee in a pan', 'add the black cardamom to the pan', 'add the mutton to the pan', 'add the mixture', 'season with salt and cover the pot', 'add the blended spice to the pot', 'cover the pot']","[105000000, 125000000, 138000000, 146000000, 183000000, 224000000, 247000000, 334000000, 381000000]","[120000000, 132000000, 145000000, 148000000, 196000000, 230000000, 259000000, 345000000, 383000000]","['Hello, Namaste, Salamwalekum sastriya kal.', 'Welcome back to another session with your watch of at warawa.com.', 'Today I am going to show you another favorite of mine.', 'I am very surprised while I was checking the list of the dishes I did.', 'I did not make Mutton Rogen Josh.', 'Dear friends, this is one of the tastiest and super awesome dish from Kashmir.', 'You know, this is the dish what I learned from the master chefs only in five style hotels.', 'But I have seen in lot of restaurant they serve Mutton Rogen Josh.', 'They just serve the Mutton Curry and call it Rogen Josh and he does not have the punch', 'what Rogen Josh must have.', 'You know, a lot of people add onion, tomato and all this in making me Rogen Josh.', 'But what I am going to do today, I am not going to add onion or tomatoes nor even yogurt.', 'You know, if you want you can add a little bit of yogurt but I am not even going to add yogurt.', 'So for this the spice is what we are going to add is Javitri that is Mace, Cinnamon, Green Cardamom,', 'Cumin seeds, Black Cardamom and Saferan.', 'You know, I am going to make a powder of these four and add while as I black illaji or the black cardamom', 'and cook the meat with it.', 'Now here I have got one end of table spoon of chilli powder.', 'Not any chilli powder, Kashmiri chilli powder.', 'That is what will give nice red colour, ginger powder but one table spoon of final seed powder.', 'And 
we are going to add this in this quantity and that will give a very nice tasteful gravy.', 'Now to make it very simple I am going to mix all of these masalas together so you will understand.', 'So here I have got hing powder.', 'You know, hing is a must for Mutton Rogen Josh and in this add Kashmiri chilli powder.', 'Ginger powder and final seed powder.', 'And in this add water and mix this into a watery paste.', 'And now we are going to add in a blender.', 'I am going to add the cumin seeds, green cardamom, cinnamon sticks and Javitri that is Mace.', 'I am going to powder and add it.', 'So make it in a nice coarse powder.', 'You know, you are not going to cook it in the oil.', 'This Mutton Rogen Josh needs to be cooked in nice desi ghee.', 'When this desi ghee heats up we are going to add badi illachi that is black cardamom.', 'That will give a nice flavour to this dish.', 'And here I have got meat.', 'This is a nice lamb meat and all these meats have bone.', 'Nally that is the shanks of meat.', 'And take all the pieces which are like shanks.', 'And when these get cooked like this with the bone in, the gravy becomes nice and very', 'flavourful.', 'Now here the ghee is heated up and my black cardamom is nicely roasted in this add pieces', 'of this meat.', 'And we are going to cook this meat in this ghee.', 'And you have to cook in the meat becomes slightly brown.', 'That is when you get a very good flavour to the gravy.', 'You know, it is better always to fry the meat like this and then cook it on a slow', 'flame.', 'You know, now look at this meat.', 'This is nicely slightly brown and you know, this method of cooking is used not only in', 'India but throughout the world.', 'When you roast the meat like this, it is called Milad effect.', 'What it does is it caramelizes the outer coating of meat and gives a very nice flavour to', 'this dish.', 'Now this is all ready.', 'Now in this, you are going to add the mixture, the paste of the chili powder and soft', 'powder, 
fennel powder into this.', 'And you can also add little of saffron.', 'You know, this will also give a very nice flavour to this dish and pour in a lot of water', 'to cover the meat.', 'You know, because I wanted to show you, I am cooking in such a big pan.', 'Otherwise I would have taken a little smaller pan like this but you know, to make sure that', 'you see what is happening in the pan I took a vessel like this.', 'Now put the lid on and cook it on a slow flame for at least one hour to one and a half', 'hour.', 'Another easy method of, if you do not have patience to spend one and a half hour of slow cooking,', 'easy method is just pour this into a pressure cooker, cook it and again transfer it back', 'in this pan because you want this masala also to be cooked.', 'In a slow method of cooking like this, what it does is it evaporates the water because', 'we added little extra water in this, that water will be overrated and when the sauce is done,', 'it has to be liquidy but all the masala needs to be cooked.', 'And here is the masala of cumin, cardamom, cinnamon sticks.', 'And when you are cooking in an open method like this, when this is cooked for like half', 'of the time that is almost 45 minutes, then we are going to add this.', 'But if you do in the pressure cooker, you will have to add it after the meat is cooked.', 'After cooking for almost 45 minutes, now look at this gravy, this is nice, the oil is', 'also slightly floating on top and you can see this meat.', 'The lamb bones were not visible when we started but look at this.', 'Now after 45 minutes, they coming off the bone, that is when you know that the meat is', 'getting nicely cooked.', 'Now here is the masala powder of a maze that is Javitri, cinnamon, cardamom, cumin and', 'all this and then we are going to add to this.', 'This is what will give a nice flavor to your Rogen Josh.', 'Just add all of this, mix it and we are going to cook this for another 30 minutes at least.', 'Till the time the meat 
is become nice and tender, the meat should be so much cooked that', 'it should be coming off the bone and also when it is properly done, this meat will literally', 'melt in your mouth, that is when you got a perfect and a super awesome tasty Rogen Josh.', ""So dear friends, you don't need to add curd, no tomatoes, no onions."", 'Just with this masala, you will not believe how much awesome flavor this is already giving.', 'So let me put the lid on and if you need to add little water, you can keep adding little', 'water till you get the desired consistency.', 'After cooking it for almost another 30 minutes, the flavor of Rogen Josh has spread all', 'over and you can see how the Rogen means, this oil that is floating, red in color, look', 'at this.', 'That is what makes this awesome dish super to look at and tasty also and wow, you know', 'if you make it right, this will taste super fantastic.', 'It is so good.', 'Trust me, make it the way I have shown you and it will be super fantastic.', 'Dear friends, this is something magical, this is something super awesome.', 'But you use nice lamp shanks to make it and take it easy on the ginger powder and you', 'will get nice perfect Rogen Josh.', 'You know while we were in the college, they always used to add Ratanjog.', 'But when I was in the industry, they told no, no, no Ratanjog in this, the color should', 'come from the Kashmiri chillies, the flavor, the aroma should come from saffron and some', 'people also add coxcom is a kind of a flower which gives also a nice coloring.', 'Some people add that but for me, this is super perfect.', 'I am going to switch off the flame and I am going to enjoy it hot along with my non-vav.', 'Now look at this Rogen Josh.', 'Wow, what flavors along with this?', 'So much perfectly cooked and you know, especially when I cook meat like this, I want the meat', 'to be fully in my s and tanda but still retains some of the pink color.', 'As a reason why I did not add turmeric but some people add 
if you want, you can add', 'turmeric also but dear friends, wow.', 'You know, eat with basmati rice or nice mughalai, non like the one what I am eating.', 'This is super and the nice sauce is also nice sticky and super tasty.', ""Dear friends, I hope you enjoyed today's session of learning how to make this awesome"", 'Mutton Rogen Josh from Kashmir but do not forget, Vahrehvah is all about inspiring', 'others to cook.', 'So please post your recipes and cooking tips at vahrehvah.com.', 'So others can benefit from your great cooking.', 'Thank you.']","[0, 10840000, 14560000, 17440000, 20840000, 24000000, 29720000, 34519999, 38360000, 43120000, 45200000, 50200000, 56560000, 60400000, 67640000, 71040000, 78120000, 79720000, 84360000, 87360000, 93880000, 98800000, 104600000, 107000000, 114000000, 119160000, 124600000, 127560000, 134240000, 136200000, 139720000, 141920000, 145880000, 150760000, 153600000, 156800000, 161760000, 164640000, 167200000, 172920000, 173920000, 183200000, 184200000, 188320000, 192840000, 196760000, 201300000, 202300000, 204720000, 210840000, 212640000, 216520000, 221440000, 222440000, 224240000, 231680000, 234520000, 238280000, 246280000, 247280000, 252000000, 256839999, 259880000, 265399999, 266399999, 271240000, 276480000, 281080000, 285080000, 290200000, 293520000, 298280000, 302760000, 306320000, 312320000, 318640000, 322880000, 326640000, 331039999, 332560000, 340320000, 343520000, 347560000, 354120000, 359479999, 364799999, 371760000, 376640000, 383000000, 387600000, 390760000, 396800000, 405120000, 406280000, 416480000, 425760000, 426760000, 434760000, 444120000, 450560000, 454000000, 458600000, 464400000, 469760000, 475960000, 480320000, 487320000, 492760000, 503240000, 509240000, 514840000, 519480000, 522880000, 531160000, 538320000, 542840000, 547520000, 548520000, 551760000, 554000000]","[10840000, 14560000, 17440000, 20840000, 24000000, 29720000, 34519999, 38360000, 43120000, 45200000, 50200000, 56560000, 60400000, 67640000, 
71040000, 78120000, 79720000, 84240000, 87360000, 93880000, 98800000, 104600000, 107000000, 114000000, 119160000, 124600000, 127560000, 134240000, 136200000, 139000000, 141920000, 145880000, 150760000, 153600000, 156800000, 161760000, 164640000, 167200000, 172920000, 173920000, 183200000, 184200000, 188320000, 192839999, 196760000, 201300000, 202300000, 204720000, 210840000, 212640000, 216520000, 221440000, 222440000, 224240000, 231680000, 234520000, 238280000, 246280000, 247280000, 252000000, 256839999, 259880000, 265399999, 266399999, 271240000, 276480000, 281080000, 285080000, 290200000, 293520000, 298280000, 302760000, 306320000, 312320000, 318640000, 322880000, 326640000, 331039999, 332560000, 340320000, 343520000, 347560000, 354120000, 359400000, 364799999, 371760000, 376640000, 383000000, 387599999, 390760000, 396800000, 405120000, 406280000, 416480000, 425760000, 426760000, 434760000, 444120000, 450560000, 454000000, 458600000, 464400000, 469680000, 475960000, 480320000, 487320000, 492760000, 503240000, 509240000, 514840000, 519480000, 522880000, 531160000, 538320000, 542840000, 547520000, 548520000, 551760000, 554000000, 554360000]"
"BktdaTg6_E4","371900000","['mix vegetable oil salt and curry masala', 'marinate the lamb in a ziplock bag', 'season the lamb meat with salt', 'bake the lamb meat in an oven', 'blend garlic ginger cherry and onion and water', 'heat some clarified butter in a pan', 'add chopped onion and salt and saute', 'mix some cumin cinnamon black pepper and paprika', 'add the mixed spices the mixture and the lamb in']","[30000000, 62000000, 88000000, 91000000, 99000000, 123000000, 134000000, 156000000, 183000000]","[57000000, 75000000, 90000000, 98000000, 118000000, 133000000, 155000000, 172000000, 252000000]","['Hello, this is Chef John from Foodwishes.com with Lamb Shank Vindaloo.', ""That's right, I get a lot of complaints."", 'How come you never do Indian food?', ""It's because I'm scared."", ""I don't have a lot of experience with it."", 'I love to eat it.', 'But I thought I would give this one of my favorites to try.', 'This very spicy lamb type curry dish.', 'So I hope I got it close.', 'You Indian cuisine experts will be the judge.', 'So here we go.', ""So step one here, I'm going to put four lamb shanks in a plastic bag."", 'You need to get marinated overnight before we start the dish.', ""So I'm going to place those in."", ""And then into a bowl, I'm going to pour some cider vinegar, some vegetable oil, some salt,"", 'and then something called tamarind.', ""I'm using a tamarind concentrate."", ""And we'll talk a little bit about that on the blog."", ""But it's a very tart, sour kind of citrus-like ingredient."", 'All right, I started mixing that up and then I realized I never put the garmasala in,', 'which is a blend of Indian spices.', ""We've used that before."", 'We like it.', ""All right, so I'm going to mix that in and that's basically the marinade."", ""So we're going to pour that over the lamb shanks."", ""We're going to seal up that bag really well."", 'All right, just to confuse you, I put mine in a second bag as I thought I had a leak.', ""We're going to 
squeeze out as much air as possible so the meat is immersed in the marinade."", ""And then we're going to put that in the fridge overnight."", 'Not a bad idea to turn it over once in a while.', ""All right, the next day I'm going to pull it out of the bag."", ""I'm going to place it on an oiled foil lined sheet pan."", ""Don't throw away the marinade, by the way."", ""That's going in the stew later."", 'So just reserve the marinade.', ""I'm going to salt those generously on both sides."", ""And we're going to brown those in a very hot oven for 50, for 15 or 20 minutes until"", ""they're nice and brown."", ""We're going to pull those out and reserve them till needed."", ""Next up in a blender, we're going to add a lot of garlic, a lot of ginger, some cherry"", 'tomatoes, a nice big onion, and a little bit of water.', ""We're going to pulse that on and off until we have a nice smooth puree."", 'And it kind of looks like a delicious strawberry smoothie.', ""And yet it's so the opposite of that."", 'So just set that aside.', ""And it's back over to this stove where we're going to start the actual vindaloo."", ""So we're going to put a heavy Dutch oven on medium high heat."", ""And I'm going to put in some clarified butter."", 'Now this is supposed to be something called ghee, which is basically a clarified butter.', 'But clarified butter will work.', ""All right, so I'm going to put my butter in."", ""I'm going to throw a roughly chopped onion in there with a big pinch of salt."", ""And we're going to brown this."", ""And I'm not talking golden brown."", ""I'm talking almost golden black."", ""That's going to add sweetness and a depth of color to the sauce."", 'So just keep cooking them.', ""And right there you're thinking, that's probably good."", ""It's not."", 'Let them go further.', 'OK?', ""While those are browning, I'm going to get my spice blend together, which is cumin, cinnamon,"", 'black pepper, cayenne, and a lot of it, dry mustard, and paprika.', 'OK?', 
'And all that will be on the blog, of course.', ""All right, we're going to go back over the stove, check the onions, and now we're talking."", ""That's what we want."", 'Nicely browned, very dark edges.', 'Perfect.', ""And at that point, we're going to back the heat down to medium and dump in the spices."", ""And we're going to kind of toast the spices in that hot butter."", 'And that really wakes up the flavor, and it really, really adds an extra dimension, which', ""I guess would be the fifth dimension, but who's keeping track?"", 'Not only needs to cook for about two minutes, but it really does make a difference.', ""All right, after that, we're going to go ahead and dump in the marinade that was left"", 'over from the bag of lamb.', 'All right, remember that was the cider and the tamarind and the oil.', ""All right, so I'm going to dump that in."", ""And then we're going to dump in the mixture from the blender, the onion, the tomato, the"", 'ginger, the garlic.', ""We're going to give that a stir."", ""We're going to raise the heat up to high."", 'We want to bring this up to a simmer.', ""And before we put the lamb back in, we're going to go ahead and add a little bit of brown"", 'sugar.', 'Just to balance out that acidity and heat, all right, so stir that in.', 'And then we can place our lamb back in.', ""And if you're using a similar sized pot, you should have enough liquid to just almost"", 'come up to the top.', ""It doesn't have to be totally covered."", ""This is going to stew for three hours, and we're going to turn these several times."", ""So as long as you have that much liquid, you're okay."", ""If you need that, another splash of water, don't be afraid."", ""Don't forget you can always reduce sauces later."", 'So once the lamb goes in, I want you to turn the heat down to low.', 'I want you to cover it tightly, and I want you to simmer that very slowly on very low', 'heat for about three hours, all right, not a bad idea to turn it over once in a 
while,', ""and all you're trying to do, and why there's no way to screw up the cooking part of this."", ""You're just going to simmer it until the meat's tender."", 'See how that fork goes right into that meat?', ""That's done, all right, so like I said, it's going to take about three hours, but don't"", 'quote me on it.', 'Could take two and a half, could take four, plan accordingly.', ""All right, at that point, I'm going to go ahead and remove the lamb from the pot."", 'You can just cover it with foil while we finish the sauce, and finishing the sauce means', 'two things, the old skim and season.', ""So we're going to turn the heat up a little bit."", 'We want to bring this back to a simmer, and we need to skim off all that fat.', ""There's a ton of it."", 'Just take your ladle and skim all the fat off before you serve it, all right, and besides', 'deep fatifying the top, the other thing you should do is taste for seasoning.', ""Although I highly doubt you're going to have to do much adjusting."", 'But you know what, check just in case, maybe add a little salt.', ""And that's it, go ahead and throw your lamb shank on a plate."", ""I'm serving mine next to some lentils and rice."", ""I'm going to spoon over that incredible sauce."", ""I'm going to garnish with some whole cilantro leaves."", 'And there you go, authenticity, notwithstanding, this was a super delicious, incredibly tasty,', 'very spicy, exciting dinner.', 'Really forked tender, should just fall right off the bone, and just a very complex flavor.', 'Spicy, sweet, sour, aromatic, and that beautiful, subtly gamey lamb, just the absolute perfect', 'meat for this.', 'So I really, really hope you give this a try.', 'Head over to FoodWishes.com for all the ingredient amounts and more info, as usual.', 'And as always, enjoy.']","[0, 6000000, 7360000, 9280000, 10280000, 11400000, 12400000, 14400000, 17600000, 19120000, 21440000, 22440000, 25920000, 28840000, 30520000, 36760000, 38320000, 40240000, 43080000, 
45879999, 50640000, 53519999, 54519999, 55519999, 57800000, 60320000, 62519999, 66519999, 70399999, 72880000, 75039999, 78000000, 81600000, 83120000, 84360000, 86000000, 88720000, 94200000, 95600000, 98400000, 102960000, 107800000, 111840000, 115440000, 117440000, 118880000, 122360000, 126640000, 128960000, 133120000, 134400000, 136079999, 140000000, 141440000, 143440000, 146160000, 149400000, 150680000, 152560000, 153640000, 154640000, 155640000, 162680000, 168960000, 169960000, 171960000, 176080000, 177600000, 179920000, 180920000, 184800000, 188080000, 193240000, 196760000, 200440000, 204079999, 205600000, 209239999, 211679999, 214840000, 215840000, 217920000, 219480000, 221360000, 224399999, 225480000, 229280000, 232320000, 237000000, 238400000, 240000000, 243720000, 246480000, 249480000, 252200000, 255519999, 261320000, 266760000, 271760000, 274560000, 276440000, 280159999, 281160000, 284440000, 288520000, 292160000, 294720000, 296000000, 300120000, 301240000, 305960000, 310080000, 312359999, 315840000, 318560000, 321159999, 324159999, 326960000, 333560000, 336480000, 341760000, 349160000, 350160000, 352360000, 356840000]","[6000000, 7360000, 9280000, 10280000, 11400000, 12400000, 14400000, 17600000, 19120000, 21440000, 22440000, 25920000, 28840000, 30520000, 36760000, 38320000, 40240000, 43080000, 45879999, 50640000, 53519999, 54519999, 55519999, 57800000, 60320000, 62519999, 66519999, 70399999, 72880000, 75039999, 78000000, 81600000, 83120000, 84360000, 86000000, 88720000, 94200000, 95600000, 98400000, 102960000, 107800000, 111840000, 115440000, 117440000, 118880000, 122360000, 126640000, 128960000, 133120000, 134400000, 136079999, 140000000, 141440000, 143440000, 146160000, 149400000, 150680000, 152560000, 153640000, 154640000, 155640000, 162680000, 168960000, 169960000, 171960000, 176080000, 177600000, 179920000, 180920000, 184800000, 188080000, 193240000, 196760000, 200440000, 204079999, 205600000, 209239999, 211679999, 214840000, 215840000, 217920000, 
219480000, 221360000, 224399999, 225399999, 229280000, 232320000, 237000000, 238400000, 240000000, 243720000, 246480000, 249480000, 252200000, 255519999, 261320000, 266760000, 271760000, 274560000, 276440000, 280159999, 281159999, 284440000, 288520000, 292160000, 294720000, 296000000, 300120000, 301240000, 305960000, 310080000, 312359999, 315840000, 318560000, 321159999, 324159999, 326960000, 333560000, 336479999, 341760000, 349160000, 350160000, 352360000, 356840000, 358720000]"
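Rows like these can be sanity-checked before looking at the model. The sketch below assumes the timestamps are in microseconds (a `duration` of 490300000 would then be about 490 s) and that each list column is a Python-style literal. `check_row` and the demo row are made up for illustration and are not part of the vid2seq pipeline.

```python
import ast
import csv
import io

LIST_COLUMNS = ("caption", "start", "end", "asr_string", "asr_start", "asr_end")


def check_row(row: dict[str, str]) -> list[str]:
    """Return a list of human-readable problems found in one CSV row."""
    problems = []
    duration = int(row["duration"])
    cols = {name: ast.literal_eval(row[name]) for name in LIST_COLUMNS}

    # Each text list must line up with its start/end lists.
    if not (len(cols["caption"]) == len(cols["start"]) == len(cols["end"])):
        problems.append("caption/start/end lengths differ")
    if not (len(cols["asr_string"]) == len(cols["asr_start"]) == len(cols["asr_end"])):
        problems.append("asr_string/asr_start/asr_end lengths differ")

    # Every segment must be ordered and lie inside the video.
    for prefix in ("", "asr_"):
        for s, e in zip(cols[prefix + "start"], cols[prefix + "end"]):
            if not 0 <= s <= e <= duration:
                problems.append(f"{prefix}segment ({s}, {e}) is reversed or outside [0, {duration}]")
    return problems


# Made-up demo row: the last ASR segment ends after the video does.
SAMPLE = '''"video_id","duration","caption","start","end","asr_string","asr_start","asr_end"
"demo","10000000","['add water']","[1000000]","[2000000]","['hello', 'add water now']","[0, 1000000]","[900000, 12000000]"
'''
rows = list(csv.DictReader(io.StringIO(SAMPLE)))
print(check_row(rows[0]))
```

Running the same check over the real CSV would at least rule out unit mismatches or misaligned lists as the cause of the low scores.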
CLIP ViT-L/14 features (224px, 1 FPS) were extracted using the FrozenBiLM repo and provided as separate files when running scenic.projects.vid2seq.generate_from_file.
Below are the resulting predictions from the above sample inputs.
{"key": "fn9anlEL4FI", "pred": ["Add some garlic ginger paste some chopped onions and some salt to it.", "Turn it constantly and cook it for 3-4 minutes with a lid covered.", "Add some water to it and bring it to a boil and simmer.", "Add some water to it and simmer and let it cook for 15 minutes.", "Add some chopped potatoes and cook it for 15 minutes.", "Add a little bit more water and let it cook for 78 minutes on a low heat.", "Jump in and take a look at it.", "Put the lid on and cook for 78 minutes on a low heat.", "Add some garam masala to it and stir it over.", "Add some more salt if you like.", "Turn it constantly and cook it for about 5-6 minutes.", "Now, let's take a look at it again.", "Add some whole seeds to it.", "Turn it constantly and cook it for 5-6 minutes. In fact, let's take a look at it again. In fact, let's take a look at"], "gts": ["add garram masala seeds and a bay leaf to the oil", "add the lamb to the pot", "add garlic ginger paste and chopped onions to the pot", "add chili tumeric coriander cumin and salt", "add water to the pot", "add potatos to the pot", "add the tomatos to the pot", "add chili to the pot"], "pred_timestamps": [[138670, 148575], [148575, 178290], [183243, 203053], [208006, 262483], [287246, 341724], [341724, 361534], [371439, 386296], [391249, 401154], [401154, 411059], [411059, 416012], [420964, 425917], [430869, 445727], [445727, 450679], [450679, 460584]], "gts_timestamps": [[30000, 39000], [69000, 86000], [136000, 149000], [170000, 183000], [230000, 238000], [309000, 333000], [383000, 390000], [438000, 443000]]}
{"key": "-dh_uGahzYo", "pred": ["This is roasted in desi ghee.", "Take all the pieces of the meat and cook them in the ghee.", "When the ghee heats up.", "Add badi illachi.", "Add black cardamom powder fennel powder and saffron to the meat.", "When the ghee heats up.", "Add the paste of chili powder fennel powder and saffron to the meat.", "When the ghee heats up add badi illachi to the meat and cook it on a slow flame.", "Hello.", "Hello.", "Hello.", "Hello.", "Hello.", "Hello.", "Hello.", "Hello.", "Hello.", "Hello.", "Hello.", "Hello.", "Hello.", "Hello.", "Hello."], "gts": ["mix hanger chili powder ginger powder fennel powder and water", "add cumin seeds green cardamom cinnamon sticks to a blender", "heat some ghee in a pan", "add the black cardamom to the pan", "add the mutton to the pan", "add the mixture", "season with salt and cover the pot", "add the blended spice to the pot", "cover the pot"], "pred_timestamps": [[158805, 164476], [164476, 170148], [175820, 187163], [187163, 192834], [192834, 215521], [221193, 232536], [232536, 249551], [249551, 272237], [272237, 277909], [277909, 289252], [289252, 294924], [294924, 300595], [300595, 306267], [306267, 311938], [311938, 317610], [317610, 323282], [323282, 328953], [328953, 334625], [334625, 340296], [340296, 345968], [345968, 351640], [351640, 357311], [357311, 362983]], "gts_timestamps": [[105000, 120000], [125000, 132000], [138000, 145000], [146000, 148000], [183000, 196000], [224000, 230000], [247000, 259000], [334000, 345000], [381000, 383000]]}
{"key": "BktdaTg6_E4", "pred": ["Add some garlic ginger cherry tomatoes onion and water to a blender.", "Pulse it on and off until a smooth puree is formed.", "Put the lamb shanks in the oven.", "Put some clarified butter in a bowl and add roughly chopped garlic ginger cherry tomatoes onion and a little bit of water.", "Pulse it on and off until a smooth puree is formed.", "Now, let's get into the spice blend.", "Put four lamb shanks in a plastic bag and put some cider vinegar vegetable oil salt and tamarind concentrate in a bowl.", "Add some cumin cinnamon black pepper cayenne dry mustard and paprika. So let's get into the spice blend.", "First of all, let's add cumin cinnamon black pepper cayenne dry mustard and paprika.", "So let's get into the spice blend.", "First of all, let's add cumin cinnamon black pepper cayenne dry mustard and paprika. So let's get into the spice blend. So let's add cumin cinnamon black pepper cayenne"], "gts": ["mix vegetable oil salt and curry masala", "marinate the lamb in a ziplock bag", "season the lamb meat with salt", "bake the lamb meat in an oven", "blend garlic ginger cherry and onion and water", "heat some clarified butter in a pan", "add chopped onion and salt and saute", "mix some cumin cinnamon black pepper and paprika", "add the mixed spices the mixture and the lamb in"], "pred_timestamps": [[105183, 108940], [108940, 116453], [123966, 127723], [127723, 135236], [138992, 142749], [142749, 154019], [157775, 180315], [180315, 184071], [195341, 199097], [199097, 206611], [206611, 214124]], "gts_timestamps": [[30000, 57000], [62000, 75000], [88000, 90000], [91000, 98000], [99000, 118000], [123000, 133000], [134000, 155000], [156000, 172000], [183000, 252000]]}
In the second prediction, "Hello." is repeated 15 times in a row. This degenerate repetition may be part of what degrades the scores, but I'm not sure how to resolve it.
Here are the predicted token ids for the same samples, before decoding.
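To quantify how much of the output is affected by this kind of repetition, a small diagnostic like the one below might help. It is only a sketch: `repetition_rate` and `collapse_repeats` are hypothetical helper names, the inputs are assumed to be the `pred` and `pred_timestamps` lists from the JSON lines above, and collapsing repeats is meant for inspection, not as a fix for the underlying decoding problem.

```python
from itertools import groupby


def repetition_rate(captions):
    """Fraction of captions that duplicate another caption in the same video."""
    if not captions:
        return 0.0
    return 1.0 - len(set(captions)) / len(captions)


def collapse_repeats(captions, timestamps):
    """Merge runs of identical consecutive captions into a single segment.

    The merged segment spans from the start of the first repeat
    to the end of the last repeat.
    """
    merged_caps, merged_ts = [], []
    for cap, run in groupby(zip(captions, timestamps), key=lambda p: p[0]):
        run = list(run)
        merged_caps.append(cap)
        merged_ts.append([run[0][1][0], run[-1][1][1]])
    return merged_caps, merged_ts


if __name__ == "__main__":
    caps = ["Add badi illachi.", "Hello.", "Hello.", "Hello."]
    ts = [[0, 10], [10, 20], [20, 30], [30, 50]]
    print(repetition_rate(caps))
    print(collapse_repeats(caps, ts))
```

Running this over all 413 validation samples would show whether the repetition is limited to a few videos or is widespread.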
[32128 32133 101 31 60 352 12 2328 16 128 829 7299 42 5260 265 9358 521 5 32134 32140 101 31 60 352 12 22445 8 17871 5 32141 32147 13522 34 147 5 32150 32154 2334 128 9119 15698 11388 5 32155 32158 2334 18510 13211 5 32158 32161 2334 128 19245 4926 31073 11 3136 5 32163 32169 21599 34 147 11 3989 21 81 220 4278 676 5 32173 2334 128 387 5 32176 32180 10267 128 2107 387 11 19633 28 3 9 12533 30 5 32183 32186 2321 3 9 320 44 34 5 32188 32191 2334 128 18510 11076 5 32192 32195 13522 135 16 5 32199 32201 5306 8 12533 223 30 11 3989 21 3 3940 676 5 32202 32203 3521 46 1580 91 5 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
[32128 32138 101 31 60 352 12 617 3 354 2960 1788 11486 15 15698 4926 804 6677 4926 11 387 12 8 3 51 12499 11 3989 8 3604 28 34 5 32139 32143 2334 3 107 53 4926 3 5543 15 152 19245 4926 15698 4926 11 804 6677 4926 12 3 9 18942 11 2153 34 139 3 9 387 63 11388 5 32144 32150 2334 3 107 53 4926 1216 77 7299 1442 895 265 32 51 18684 4372 11 3 354 2960 1788 11486 15 12 8 18942 11 2153 34 139 3 9 27978 4926 5 32151 32158 2334 1001 895 265 32 51 12 8 3 122 88 15 11 3989 8 3604 5665 16 8 3 122 88 15 5 32167 2334 8 11388 13 19245 4926 3 89 5990 40 4926 3 7 4127 52 106 4926 11 3 7 4127 52 106 12 8 3 122 88 15 5 32169 2334 3 7 4127 52 106 12 8 3 122 88 15 5 32170 2334 3 7 4127 52 106 12 8 3 122 88 15 5 32171 2334 3 7 4127 52 106 12 8 3 122 88 15 5 32172 2334 3 7 4127 52 106 12 8 3 122 88 15 5 32173 2334 3 7 4127 52 106 12 8 3 122 88 15 5 32173 2334 3 7 4127 52 106 12 8 3 122 88 15 5 32173 2334 3 7 4127 52 106 12 8 3 122 88 15]
[32128 32134 5306 8 17871 6660 5979 7 16 3 9 2343 2182 5 32134 32143 1474 23119 15292 12065 1043 3136 11 3 22713 13119 11345 147 8 17871 11 7042 34 168 5 32144 32148 17039 8 17871 91 13 8 2182 11 4216 34 16 8 4836 5 32153 32159 2334 9119 15698 15665 11395 12909 11 387 12 3 9 18942 11 4764 5 32161 32169 5306 8 4194 12909 11 3136 16 11 3989 552 34 5050 7069 4216 5 32170 32172 2334 1216 77 18684 1001 5270 212 63 5990 2192 23756 11 3 16281 9629 9 5 32173 2334 3 16281 9629 9 12 8 3837 5 32173 2334 3 16281 9629 9 12 8 3837 5 32173 2334 3 16281 9629 9 12 8 3837 5 32173 2334 3 16281 9629 9 12 8 3837 5 32173 2334 3 16281 9629 9 12 8 3837 5 32173 32173 2334 3 16281 9629 9 12 8 3837 5 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
In the config, I set num_training_epochs=0 to run in evaluation-only mode. I renamed the fine-tuned checkpoint from 'youcook-2' to 'checkpoint_200000' so that the config would pick it up for evaluation. This was a little hacky, but the checkpoint loaded properly this way.
I used the tokenizer downloaded from gs://t5-data/vocabs/cc_all.32000.100extra/sentencepiece.model. One caveat is that the model sometimes outputs token ids in the range 32000 to 32127, which the tokenizer cannot handle properly. I manually excluded tokens in that range when decoding.
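For reference, the filtering I describe can be sketched roughly as follows. The constants are assumptions inferred from this thread, not taken from the repository: ids below 32000 are treated as decodable text pieces, ids 32000 to 32127 are skipped, and ids from 32128 upward are treated as time tokens because they appear at the start of each segment in the dumps above. `split_segments` is a hypothetical helper name.

```python
PAD_ID = 0
EOS_ID = 1
TEXT_VOCAB_SIZE = 32000    # assumption: ids the SentencePiece model can decode
TIME_TOKEN_OFFSET = 32128  # assumption: first time-token id, inferred from the dumps


def split_segments(ids):
    """Split one predicted sequence into segments of time bins and text ids.

    Returns (segments, skipped), where skipped holds the ids in
    [TEXT_VOCAB_SIZE, TIME_TOKEN_OFFSET) that were dropped.
    """
    segments, skipped = [], []
    current = None
    for t in ids:
        if t == EOS_ID:
            break  # ignore everything after end-of-sequence
        if t == PAD_ID:
            continue
        if t >= TIME_TOKEN_OFFSET:
            # A time token after text starts a new segment.
            if current is None or current["text_ids"]:
                current = {"time_bins": [], "text_ids": []}
                segments.append(current)
            current["time_bins"].append(t - TIME_TOKEN_OFFSET)
        elif t >= TEXT_VOCAB_SIZE:
            skipped.append(t)
        else:
            if current is None:
                current = {"time_bins": [], "text_ids": []}
                segments.append(current)
            current["text_ids"].append(t)
    return segments, skipped


if __name__ == "__main__":
    ids = [32128, 32133, 101, 31, 5, 32134, 32140, 2334, 32050, 5, 1, 0, 0]
    segments, skipped = split_segments(ids)
    print(segments)
    print(skipped)
```

Each `text_ids` list can then be passed to the SentencePiece decoder. Logging `skipped` instead of silently dropping those ids makes it easier to see how often the model emits them.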
Apart from the changes above, eval_batch_size, and the data path, I left the provided config untouched.
Could you suggest where I might have gone wrong? What should I do to reproduce the results properly?
Whisper is indeed a good ASR model. Did you also apply a sentence segmentation tool to the ASR output? One thing I am not sure about is how robust the released checkpoints (which were trained with Google ASR) are to a change in ASR data, but I would not expect a different ASR source to cause such a large discrepancy. The repetition issue can be reduced by increasing the length penalty parameter, but I don't think tuning it would account for a gap this large either.
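As a rough illustration of what sentence segmentation could look like on timestamped ASR chunks, here is a minimal sketch. It is not the tool used for the paper: `resegment` is a hypothetical helper, sentence boundaries are detected only from trailing punctuation, and word times are linearly interpolated within each chunk, which is an approximation. Word-level timestamps from the ASR system, if available, would be more accurate.

```python
def resegment(chunks, starts, ends, terminators=".!?"):
    """Regroup ASR chunks into sentence-level units with approximate times.

    chunks: list of text chunks; starts/ends: matching chunk times (any unit).
    Returns a list of (sentence, start, end) tuples.
    """
    # Spread each chunk's time span evenly over its words.
    words = []
    for text, s, e in zip(chunks, starts, ends):
        toks = text.split()
        if not toks:
            continue
        step = (e - s) / len(toks)
        for i, tok in enumerate(toks):
            words.append((tok, s + i * step, s + (i + 1) * step))

    # Close a sentence whenever a word ends with a terminator.
    sentences, current = [], []
    for w in words:
        current.append(w)
        if w[0][-1] in terminators:
            sentences.append(current)
            current = []
    if current:
        sentences.append(current)  # trailing text without final punctuation

    return [(" ".join(w[0] for w in sent), sent[0][1], sent[-1][2])
            for sent in sentences]


if __name__ == "__main__":
    chunks = ["Hello there. Click that", "button now."]
    for sentence in resegment(chunks, [0, 4], [4, 6]):
        print(sentence)
```

With input like the first CSV row in this thread, chunks that break mid-sentence (for example "click that" / "button and subscribe to our channel.") would be merged into complete sentences.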
@antoyang Could you elaborate on the sentence segmentation tool? I used the Whisper output as is, as in the sample input below. Each entry has start and end timestamps with its associated text.
"video_id","duration","caption","start","end","asr_string","asr_start","asr_end"
"fn9anlEL4FI","490300000","['add garram masala seeds and a bay leaf to the oil', 'add the lamb to the pot', 'add garlic ginger paste and chopped onions to the pot', 'add chili tumeric coriander cumin and salt', 'add water to the pot', 'add potatos to the pot', 'add the tomatos to the pot', 'add chili to the pot']","[30000000, 69000000, 136000000, 170000000, 230000000, 309000000, 383000000, 438000000]","[39000000, 86000000, 149000000, 183000000, 238000000, 333000000, 390000000, 443000000]","[""Welcome back once again to how to cook great food.com. If you haven't already, click that"", 'button and subscribe to our channel. Only make it today, you can be making a lamb and potato', ""curry or masala. As you can see I've got my pan here and in there I've got some oil"", ""that's heating up nicely. I'm using a sunflower oil, go ahead and use any oil you like."", ""We're going to drop in some whole seeds or garam masala. So here they go. We want them"", ""to roast on pop and crackle. There's a bay leaf here. I've got in there some fennel seeds,"", ""cumin seeds, green cardamom, and black mustard seeds. That's what I'm using today for this."", ""They're going to release a wonderful flavour into that oil. Now we're going to go in with our lamb."", ""We're going to fry this for about five or six minutes just with the whole garam masala."", 'Here we go. This lamb has got burning. You can use chilli if you want.', ""So let's just cook this. Let's say it's got about five or six minutes."", ""Stir it over. I'm going to kind of above medium heat. We'll just see it a little bit."", ""Then we're going to add lots of other lovely spices."", 'You can see that our meat is browning really nicely. I mean it is no any accrued.', ""That's what we've got to do now is to get this meat nice and tender."", ""What I do here is I'm going to add some garlic ginger paste. That's a 50-50 mix of garlic and ginger."", ""It's about three of these little teaspoons in there. 
I'm going to add some chopped onions."", ""I'm using a red onion but go ahead and use white."", ""Then we're going to add some powders. As always if you've watched the channel I call these the big four there equal parts of chilli, coriander, cumin and turmeric."", ""If you'd like of course you can use your favourite curry powder. We're going to add some salt at this stage."", ""Let's flip this over."", ""We're going to cook this for about now three or four minutes. Turn it constantly."", 'Again on a kind of above medium heat.', ""We've got some lovely flavours happening now."", ""Now we're going to add some water."", 'That was cold water by the way.', ""We're just covering it a little bit."", ""We're going to bring this water to the boil and then we're going to simmer this with a lid on."", 'For about 15 minutes this is the part that I hope generally works.', ""We'll tend to as I meet them make it nice and soft."", ""So let's take a look now."", 'Look at that steam out of there.', 'This is cooking down beautifully. 
As you can see look at that.', 'The needs come straight up of that bone.', ""It's certainly on its way now."", 'A pretty essential part of doing this dish is to get your meat nice and tender.', ""You're getting that with an awful tough meat."", 'Now made of what cut you use.', ""Maybe you put really expensive lamb but it would still end up being tough if you don't go for this process."", ""I'm now going to add some potatoes."", ""We've tough peeled and chopped."", 'These are fairly small.', 'You cut them however you like and the cooking process will obviously take a longer time if you put them in as much bigger.', ""So again let's give this a mix."", 'Stir them in.', ""We've still got a decent amount of moisture in there from that water."", ""If you haven't at this point maybe you've got to really dry."", 'Add a bit more water now.', ""It's going to go back on."", ""I'm going to cook this for about 78 minutes on a fairly low heat."", 'Not a simmer, above a simmer.', ""Okay let's jump in now and take a look."", ""Let's look in more like it."", 'The potatoes are cooking very nicely.', 'I kind of like my potatoes quite soft.', ""I'm now at this stage going to add some chopped tomatoes."", ""I'm just going to spread them on the top."", ""I'll put the lid back on."", ""On a fairly low heat we're going to cook them just for about five minutes."", 'What they should do is break down with the steam.', ""Don't stir them at the moment."", 'The steam will break them down.', ""We're going to mix it around once and come back."", 'We may add a little tad more water perhaps.', ""And then we're pretty much done."", 'We should be now at the final stage.', 'Yeah these are soft and really nice the as you can see.', ""And they've given off a little bit of moisture as well."", 'Just now turning it over.', ""At this stage I'm going to add some fresh chilli."", ""It's totally optional as to how much you're putting."", ""I'm putting about four or five there."", 'You now need to check this 
for salt.', ""It's all good for me."", 'You can if you want finish that off with some fresh coriander or cilantro.', ""Let's just cook that for about two more minutes and it's done."", ""It's wonderful."", ""I'm really happy with it."", ""I'll see you again soon."", 'Take care.', 'Thank you.']","[0, 9800000, 14840000, 22140000, 26800000, 33800000, 41800000, 57300000, 61800000, 68800000, 86800000, 101800000, 109800000, 120800000, 127800000, 132800000, 139800000, 146800000, 161800000, 173800000, 185800000, 196800000, 205800000, 219800000, 226800000, 235800000, 241800000, 248800000, 254800000, 259800000, 266800000, 269800000, 273800000, 278800000, 282800000, 287800000, 292800000, 296800000, 298800000, 305800000, 309800000, 314800000, 318800000, 333800000, 337800000, 346800000, 350800000, 353800000, 358800000, 361800000, 366800000, 368800000, 372800000, 375800000, 378800000, 382800000, 388800000, 391800000, 393800000, 396800000, 399800000, 401800000, 403800000, 405800000, 408800000, 411800000, 415800000, 418800000, 423800000, 435800000, 438800000, 441800000, 446800000, 453800000, 456800000, 461800000, 464800000, 465800000, 467800000, 468800000, 469800000]","[9800000, 14840000, 22140000, 26800000, 33800000, 41800000, 48800000, 61800000, 68800000, 76800000, 101800000, 108800000, 115800000, 127800000, 131800000, 139800000, 146800000, 152800000, 173800000, 185800000, 192800000, 204800000, 211800000, 223800000, 231800000, 241800000, 246800000, 254800000, 259800000, 266800000, 269800000, 273800000, 278800000, 282800000, 286800000, 292800000, 296800000, 298800000, 305800000, 309800000, 313800000, 318800000, 333800000, 337800000, 340800000, 350800000, 353800000, 358800000, 361800000, 366800000, 368800000, 372800000, 375800000, 378800000, 382800000, 387800000, 391800000, 393800000, 396800000, 399800000, 401800000, 403800000, 405800000, 408800000, 411800000, 414800000, 418800000, 423800000, 426800000, 438800000, 441800000, 446800000, 450800000, 456800000, 461800000, 464800000, 
465800000, 467800000, 468800000, 469800000, 471800000]"
"-dh_uGahzYo","561490000","['mix hanger chili powder ginger powder fennel powder and water', 'add cumin seeds green cardamom cinnamon sticks to a blender', 'heat some ghee in a pan', 'add the black cardamom to the pan', 'add the mutton to the pan', 'add the mixture', 'season with salt and cover the pot', 'add the blended spice to the pot', 'cover the pot']","[105000000, 125000000, 138000000, 146000000, 183000000, 224000000, 247000000, 334000000, 381000000]","[120000000, 132000000, 145000000, 148000000, 196000000, 230000000, 259000000, 345000000, 383000000]","['Hello, Namaste, Salamwalekum sastriya kal.', 'Welcome back to another session with your watch of at warawa.com.', 'Today I am going to show you another favorite of mine.', 'I am very surprised while I was checking the list of the dishes I did.', 'I did not make Mutton Rogen Josh.', 'Dear friends, this is one of the tastiest and super awesome dish from Kashmir.', 'You know, this is the dish what I learned from the master chefs only in five style hotels.', 'But I have seen in lot of restaurant they serve Mutton Rogen Josh.', 'They just serve the Mutton Curry and call it Rogen Josh and he does not have the punch', 'what Rogen Josh must have.', 'You know, a lot of people add onion, tomato and all this in making me Rogen Josh.', 'But what I am going to do today, I am not going to add onion or tomatoes nor even yogurt.', 'You know, if you want you can add a little bit of yogurt but I am not even going to add yogurt.', 'So for this the spice is what we are going to add is Javitri that is Mace, Cinnamon, Green Cardamom,', 'Cumin seeds, Black Cardamom and Saferan.', 'You know, I am going to make a powder of these four and add while as I black illaji or the black cardamom', 'and cook the meat with it.', 'Now here I have got one end of table spoon of chilli powder.', 'Not any chilli powder, Kashmiri chilli powder.', 'That is what will give nice red colour, ginger powder but one table spoon of final seed powder.', 'And 
we are going to add this in this quantity and that will give a very nice tasteful gravy.', 'Now to make it very simple I am going to mix all of these masalas together so you will understand.', 'So here I have got hing powder.', 'You know, hing is a must for Mutton Rogen Josh and in this add Kashmiri chilli powder.', 'Ginger powder and final seed powder.', 'And in this add water and mix this into a watery paste.', 'And now we are going to add in a blender.', 'I am going to add the cumin seeds, green cardamom, cinnamon sticks and Javitri that is Mace.', 'I am going to powder and add it.', 'So make it in a nice coarse powder.', 'You know, you are not going to cook it in the oil.', 'This Mutton Rogen Josh needs to be cooked in nice desi ghee.', 'When this desi ghee heats up we are going to add badi illachi that is black cardamom.', 'That will give a nice flavour to this dish.', 'And here I have got meat.', 'This is a nice lamb meat and all these meats have bone.', 'Nally that is the shanks of meat.', 'And take all the pieces which are like shanks.', 'And when these get cooked like this with the bone in, the gravy becomes nice and very', 'flavourful.', 'Now here the ghee is heated up and my black cardamom is nicely roasted in this add pieces', 'of this meat.', 'And we are going to cook this meat in this ghee.', 'And you have to cook in the meat becomes slightly brown.', 'That is when you get a very good flavour to the gravy.', 'You know, it is better always to fry the meat like this and then cook it on a slow', 'flame.', 'You know, now look at this meat.', 'This is nicely slightly brown and you know, this method of cooking is used not only in', 'India but throughout the world.', 'When you roast the meat like this, it is called Milad effect.', 'What it does is it caramelizes the outer coating of meat and gives a very nice flavour to', 'this dish.', 'Now this is all ready.', 'Now in this, you are going to add the mixture, the paste of the chili powder and soft', 'powder, 
fennel powder into this.', 'And you can also add little of saffron.', 'You know, this will also give a very nice flavour to this dish and pour in a lot of water', 'to cover the meat.', 'You know, because I wanted to show you, I am cooking in such a big pan.', 'Otherwise I would have taken a little smaller pan like this but you know, to make sure that', 'you see what is happening in the pan I took a vessel like this.', 'Now put the lid on and cook it on a slow flame for at least one hour to one and a half', 'hour.', 'Another easy method of, if you do not have patience to spend one and a half hour of slow cooking,', 'easy method is just pour this into a pressure cooker, cook it and again transfer it back', 'in this pan because you want this masala also to be cooked.', 'In a slow method of cooking like this, what it does is it evaporates the water because', 'we added little extra water in this, that water will be overrated and when the sauce is done,', 'it has to be liquidy but all the masala needs to be cooked.', 'And here is the masala of cumin, cardamom, cinnamon sticks.', 'And when you are cooking in an open method like this, when this is cooked for like half', 'of the time that is almost 45 minutes, then we are going to add this.', 'But if you do in the pressure cooker, you will have to add it after the meat is cooked.', 'After cooking for almost 45 minutes, now look at this gravy, this is nice, the oil is', 'also slightly floating on top and you can see this meat.', 'The lamb bones were not visible when we started but look at this.', 'Now after 45 minutes, they coming off the bone, that is when you know that the meat is', 'getting nicely cooked.', 'Now here is the masala powder of a maze that is Javitri, cinnamon, cardamom, cumin and', 'all this and then we are going to add to this.', 'This is what will give a nice flavor to your Rogen Josh.', 'Just add all of this, mix it and we are going to cook this for another 30 minutes at least.', 'Till the time the meat 
is become nice and tender, the meat should be so much cooked that', 'it should be coming off the bone and also when it is properly done, this meat will literally', 'melt in your mouth, that is when you got a perfect and a super awesome tasty Rogen Josh.', ""So dear friends, you don't need to add curd, no tomatoes, no onions."", 'Just with this masala, you will not believe how much awesome flavor this is already giving.', 'So let me put the lid on and if you need to add little water, you can keep adding little', 'water till you get the desired consistency.', 'After cooking it for almost another 30 minutes, the flavor of Rogen Josh has spread all', 'over and you can see how the Rogen means, this oil that is floating, red in color, look', 'at this.', 'That is what makes this awesome dish super to look at and tasty also and wow, you know', 'if you make it right, this will taste super fantastic.', 'It is so good.', 'Trust me, make it the way I have shown you and it will be super fantastic.', 'Dear friends, this is something magical, this is something super awesome.', 'But you use nice lamp shanks to make it and take it easy on the ginger powder and you', 'will get nice perfect Rogen Josh.', 'You know while we were in the college, they always used to add Ratanjog.', 'But when I was in the industry, they told no, no, no Ratanjog in this, the color should', 'come from the Kashmiri chillies, the flavor, the aroma should come from saffron and some', 'people also add coxcom is a kind of a flower which gives also a nice coloring.', 'Some people add that but for me, this is super perfect.', 'I am going to switch off the flame and I am going to enjoy it hot along with my non-vav.', 'Now look at this Rogen Josh.', 'Wow, what flavors along with this?', 'So much perfectly cooked and you know, especially when I cook meat like this, I want the meat', 'to be fully in my s and tanda but still retains some of the pink color.', 'As a reason why I did not add turmeric but some people add 
if you want, you can add', 'turmeric also but dear friends, wow.', 'You know, eat with basmati rice or nice mughalai, non like the one what I am eating.', 'This is super and the nice sauce is also nice sticky and super tasty.', ""Dear friends, I hope you enjoyed today's session of learning how to make this awesome"", 'Mutton Rogen Josh from Kashmir but do not forget, Vahrehvah is all about inspiring', 'others to cook.', 'So please post your recipes and cooking tips at vahrehvah.com.', 'So others can benefit from your great cooking.', 'Thank you.']","[0, 10840000, 14560000, 17440000, 20840000, 24000000, 29720000, 34519999, 38360000, 43120000, 45200000, 50200000, 56560000, 60400000, 67640000, 71040000, 78120000, 79720000, 84360000, 87360000, 93880000, 98800000, 104600000, 107000000, 114000000, 119160000, 124600000, 127560000, 134240000, 136200000, 139720000, 141920000, 145880000, 150760000, 153600000, 156800000, 161760000, 164640000, 167200000, 172920000, 173920000, 183200000, 184200000, 188320000, 192840000, 196760000, 201300000, 202300000, 204720000, 210840000, 212640000, 216520000, 221440000, 222440000, 224240000, 231680000, 234520000, 238280000, 246280000, 247280000, 252000000, 256839999, 259880000, 265399999, 266399999, 271240000, 276480000, 281080000, 285080000, 290200000, 293520000, 298280000, 302760000, 306320000, 312320000, 318640000, 322880000, 326640000, 331039999, 332560000, 340320000, 343520000, 347560000, 354120000, 359479999, 364799999, 371760000, 376640000, 383000000, 387600000, 390760000, 396800000, 405120000, 406280000, 416480000, 425760000, 426760000, 434760000, 444120000, 450560000, 454000000, 458600000, 464400000, 469760000, 475960000, 480320000, 487320000, 492760000, 503240000, 509240000, 514840000, 519480000, 522880000, 531160000, 538320000, 542840000, 547520000, 548520000, 551760000, 554000000]","[10840000, 14560000, 17440000, 20840000, 24000000, 29720000, 34519999, 38360000, 43120000, 45200000, 50200000, 56560000, 60400000, 67640000, 
71040000, 78120000, 79720000, 84240000, 87360000, 93880000, 98800000, 104600000, 107000000, 114000000, 119160000, 124600000, 127560000, 134240000, 136200000, 139000000, 141920000, 145880000, 150760000, 153600000, 156800000, 161760000, 164640000, 167200000, 172920000, 173920000, 183200000, 184200000, 188320000, 192839999, 196760000, 201300000, 202300000, 204720000, 210840000, 212640000, 216520000, 221440000, 222440000, 224240000, 231680000, 234520000, 238280000, 246280000, 247280000, 252000000, 256839999, 259880000, 265399999, 266399999, 271240000, 276480000, 281080000, 285080000, 290200000, 293520000, 298280000, 302760000, 306320000, 312320000, 318640000, 322880000, 326640000, 331039999, 332560000, 340320000, 343520000, 347560000, 354120000, 359400000, 364799999, 371760000, 376640000, 383000000, 387599999, 390760000, 396800000, 405120000, 406280000, 416480000, 425760000, 426760000, 434760000, 444120000, 450560000, 454000000, 458600000, 464400000, 469680000, 475960000, 480320000, 487320000, 492760000, 503240000, 509240000, 514840000, 519480000, 522880000, 531160000, 538320000, 542840000, 547520000, 548520000, 551760000, 554000000, 554360000]"
"BktdaTg6_E4","371900000","['mix vegetable oil salt and curry masala', 'marinate the lamb in a ziplock bag', 'season the lamb meat with salt', 'bake the lamb meat in an oven', 'blend garlic ginger cherry and onion and water', 'heat some clarified butter in a pan', 'add chopped onion and salt and saute', 'mix some cumin cinnamon black pepper and paprika', 'add the mixed spices the mixture and the lamb in']","[30000000, 62000000, 88000000, 91000000, 99000000, 123000000, 134000000, 156000000, 183000000]","[57000000, 75000000, 90000000, 98000000, 118000000, 133000000, 155000000, 172000000, 252000000]","['Hello, this is Chef John from Foodwishes.com with Lamb Shank Vindaloo.', ""That's right, I get a lot of complaints."", 'How come you never do Indian food?', ""It's because I'm scared."", ""I don't have a lot of experience with it."", 'I love to eat it.', 'But I thought I would give this one of my favorites to try.', 'This very spicy lamb type curry dish.', 'So I hope I got it close.', 'You Indian cuisine experts will be the judge.', 'So here we go.', ""So step one here, I'm going to put four lamb shanks in a plastic bag."", 'You need to get marinated overnight before we start the dish.', ""So I'm going to place those in."", ""And then into a bowl, I'm going to pour some cider vinegar, some vegetable oil, some salt,"", 'and then something called tamarind.', ""I'm using a tamarind concentrate."", ""And we'll talk a little bit about that on the blog."", ""But it's a very tart, sour kind of citrus-like ingredient."", 'All right, I started mixing that up and then I realized I never put the garmasala in,', 'which is a blend of Indian spices.', ""We've used that before."", 'We like it.', ""All right, so I'm going to mix that in and that's basically the marinade."", ""So we're going to pour that over the lamb shanks."", ""We're going to seal up that bag really well."", 'All right, just to confuse you, I put mine in a second bag as I thought I had a leak.', ""We're going to 
squeeze out as much air as possible so the meat is immersed in the marinade."", ""And then we're going to put that in the fridge overnight."", 'Not a bad idea to turn it over once in a while.', ""All right, the next day I'm going to pull it out of the bag."", ""I'm going to place it on an oiled foil lined sheet pan."", ""Don't throw away the marinade, by the way."", ""That's going in the stew later."", 'So just reserve the marinade.', ""I'm going to salt those generously on both sides."", ""And we're going to brown those in a very hot oven for 50, for 15 or 20 minutes until"", ""they're nice and brown."", ""We're going to pull those out and reserve them till needed."", ""Next up in a blender, we're going to add a lot of garlic, a lot of ginger, some cherry"", 'tomatoes, a nice big onion, and a little bit of water.', ""We're going to pulse that on and off until we have a nice smooth puree."", 'And it kind of looks like a delicious strawberry smoothie.', ""And yet it's so the opposite of that."", 'So just set that aside.', ""And it's back over to this stove where we're going to start the actual vindaloo."", ""So we're going to put a heavy Dutch oven on medium high heat."", ""And I'm going to put in some clarified butter."", 'Now this is supposed to be something called ghee, which is basically a clarified butter.', 'But clarified butter will work.', ""All right, so I'm going to put my butter in."", ""I'm going to throw a roughly chopped onion in there with a big pinch of salt."", ""And we're going to brown this."", ""And I'm not talking golden brown."", ""I'm talking almost golden black."", ""That's going to add sweetness and a depth of color to the sauce."", 'So just keep cooking them.', ""And right there you're thinking, that's probably good."", ""It's not."", 'Let them go further.', 'OK?', ""While those are browning, I'm going to get my spice blend together, which is cumin, cinnamon,"", 'black pepper, cayenne, and a lot of it, dry mustard, and paprika.', 'OK?', 
'And all that will be on the blog, of course.', ""All right, we're going to go back over the stove, check the onions, and now we're talking."", ""That's what we want."", 'Nicely browned, very dark edges.', 'Perfect.', ""And at that point, we're going to back the heat down to medium and dump in the spices."", ""And we're going to kind of toast the spices in that hot butter."", 'And that really wakes up the flavor, and it really, really adds an extra dimension, which', ""I guess would be the fifth dimension, but who's keeping track?"", 'Not only needs to cook for about two minutes, but it really does make a difference.', ""All right, after that, we're going to go ahead and dump in the marinade that was left"", 'over from the bag of lamb.', 'All right, remember that was the cider and the tamarind and the oil.', ""All right, so I'm going to dump that in."", ""And then we're going to dump in the mixture from the blender, the onion, the tomato, the"", 'ginger, the garlic.', ""We're going to give that a stir."", ""We're going to raise the heat up to high."", 'We want to bring this up to a simmer.', ""And before we put the lamb back in, we're going to go ahead and add a little bit of brown"", 'sugar.', 'Just to balance out that acidity and heat, all right, so stir that in.', 'And then we can place our lamb back in.', ""And if you're using a similar sized pot, you should have enough liquid to just almost"", 'come up to the top.', ""It doesn't have to be totally covered."", ""This is going to stew for three hours, and we're going to turn these several times."", ""So as long as you have that much liquid, you're okay."", ""If you need that, another splash of water, don't be afraid."", ""Don't forget you can always reduce sauces later."", 'So once the lamb goes in, I want you to turn the heat down to low.', 'I want you to cover it tightly, and I want you to simmer that very slowly on very low', 'heat for about three hours, all right, not a bad idea to turn it over once in a 
while,', ""and all you're trying to do, and why there's no way to screw up the cooking part of this."", ""You're just going to simmer it until the meat's tender."", 'See how that fork goes right into that meat?', ""That's done, all right, so like I said, it's going to take about three hours, but don't"", 'quote me on it.', 'Could take two and a half, could take four, plan accordingly.', ""All right, at that point, I'm going to go ahead and remove the lamb from the pot."", 'You can just cover it with foil while we finish the sauce, and finishing the sauce means', 'two things, the old skim and season.', ""So we're going to turn the heat up a little bit."", 'We want to bring this back to a simmer, and we need to skim off all that fat.', ""There's a ton of it."", 'Just take your ladle and skim all the fat off before you serve it, all right, and besides', 'deep fatifying the top, the other thing you should do is taste for seasoning.', ""Although I highly doubt you're going to have to do much adjusting."", 'But you know what, check just in case, maybe add a little salt.', ""And that's it, go ahead and throw your lamb shank on a plate."", ""I'm serving mine next to some lentils and rice."", ""I'm going to spoon over that incredible sauce."", ""I'm going to garnish with some whole cilantro leaves."", 'And there you go, authenticity, notwithstanding, this was a super delicious, incredibly tasty,', 'very spicy, exciting dinner.', 'Really forked tender, should just fall right off the bone, and just a very complex flavor.', 'Spicy, sweet, sour, aromatic, and that beautiful, subtly gamey lamb, just the absolute perfect', 'meat for this.', 'So I really, really hope you give this a try.', 'Head over to FoodWishes.com for all the ingredient amounts and more info, as usual.', 'And as always, enjoy.']","[0, 6000000, 7360000, 9280000, 10280000, 11400000, 12400000, 14400000, 17600000, 19120000, 21440000, 22440000, 25920000, 28840000, 30520000, 36760000, 38320000, 40240000, 43080000, 
45879999, 50640000, 53519999, 54519999, 55519999, 57800000, 60320000, 62519999, 66519999, 70399999, 72880000, 75039999, 78000000, 81600000, 83120000, 84360000, 86000000, 88720000, 94200000, 95600000, 98400000, 102960000, 107800000, 111840000, 115440000, 117440000, 118880000, 122360000, 126640000, 128960000, 133120000, 134400000, 136079999, 140000000, 141440000, 143440000, 146160000, 149400000, 150680000, 152560000, 153640000, 154640000, 155640000, 162680000, 168960000, 169960000, 171960000, 176080000, 177600000, 179920000, 180920000, 184800000, 188080000, 193240000, 196760000, 200440000, 204079999, 205600000, 209239999, 211679999, 214840000, 215840000, 217920000, 219480000, 221360000, 224399999, 225480000, 229280000, 232320000, 237000000, 238400000, 240000000, 243720000, 246480000, 249480000, 252200000, 255519999, 261320000, 266760000, 271760000, 274560000, 276440000, 280159999, 281160000, 284440000, 288520000, 292160000, 294720000, 296000000, 300120000, 301240000, 305960000, 310080000, 312359999, 315840000, 318560000, 321159999, 324159999, 326960000, 333560000, 336480000, 341760000, 349160000, 350160000, 352360000, 356840000]","[6000000, 7360000, 9280000, 10280000, 11400000, 12400000, 14400000, 17600000, 19120000, 21440000, 22440000, 25920000, 28840000, 30520000, 36760000, 38320000, 40240000, 43080000, 45879999, 50640000, 53519999, 54519999, 55519999, 57800000, 60320000, 62519999, 66519999, 70399999, 72880000, 75039999, 78000000, 81600000, 83120000, 84360000, 86000000, 88720000, 94200000, 95600000, 98400000, 102960000, 107800000, 111840000, 115440000, 117440000, 118880000, 122360000, 126640000, 128960000, 133120000, 134400000, 136079999, 140000000, 141440000, 143440000, 146160000, 149400000, 150680000, 152560000, 153640000, 154640000, 155640000, 162680000, 168960000, 169960000, 171960000, 176080000, 177600000, 179920000, 180920000, 184800000, 188080000, 193240000, 196760000, 200440000, 204079999, 205600000, 209239999, 211679999, 214840000, 215840000, 217920000, 
219480000, 221360000, 224399999, 225399999, 229280000, 232320000, 237000000, 238400000, 240000000, 243720000, 246480000, 249480000, 252200000, 255519999, 261320000, 266760000, 271760000, 274560000, 276440000, 280159999, 281159999, 284440000, 288520000, 292160000, 294720000, 296000000, 300120000, 301240000, 305960000, 310080000, 312359999, 315840000, 318560000, 321159999, 324159999, 326960000, 333560000, 336479999, 341760000, 349160000, 350160000, 352360000, 356840000, 358720000]"
One thing I noticed is that the ground-truth captions are all lowercase, while the ASR results are not. Also, the ground-truth captions do not include punctuation, while the ASR results do.
Not sure if it's because of these differences, but the prediction sentences have capitalized first letters.
{"key": "fn9anlEL4FI", "pred": ["We're going to drop in some whole seeds or garam masala.", "We're going to fry the lamb.", "Stir it over.", "Add some garlic ginger paste.", "Add chopped onions. Add some coriander cumin and turmeric.", "Add some chili powder chili coriander cumin turmeric and salt and cook.", "Add some water to the pan.", "Bring the water to a boil and simmer.", "Take a look at the meat.", "Add some chopped potatoes.", "Stir them in add some water and let it cook.", "Turn the heat down and simmer for 78 minutes.", "Turn off the heat and let the meat cool down completely and then turn on the heat again."], "gts": ["add garram masala seeds and a bay leaf to the oil", "add the lamb to the pot", "add garlic ginger paste and chopped onions to the pot", "add chili tumeric coriander cumin and salt", "add water to the pot", "add potatos to the pot", "add the tomatos to the pot", "add chili to the pot"], "pred_timestamps": [[0, 24762626], [29715151, 59430303], [69335353, 99050505], [108955555, 128765656], [133718181, 143623232], [168385858, 208006060], [212958585, 222863636], [227816161, 257531313], [277341414, 287246464], [297151515, 312009090], [321914141, 356581818], [361534343, 371439393], [376391919, 416012121]], "gts_timestamps": [[30000000, 39000000], [69000000, 86000000], [136000000, 149000000], [170000000, 183000000], [230000000, 238000000], [309000000, 333000000], [383000000, 390000000], [438000000, 443000000]]}
{"key": "-dh_uGahzYo", "pred": ["We are going to add javitri mace cinnamon green cardamom and black cardamom to the meat and cook the meat with it.", "We are going to add hing powder chili powder ginger powder and final seed powder to a blender.", "Add water and mix it into a watery paste.", "Add cumin seeds green cardamom cinnamon stick and black cardamom to the meat and cook the meat in the ghee. When the ghee heats up add pieces of lamb to it and cook the meat on a slow flame. Now add the paste to the ghee. Add the paste to the meat. Add the meat to the ghee. Add the paste to the meat. Add the meat to the ghee. Add the paste to the meat. Add the meat to the ghee. Add the paste to the meat.", "Add the meat to the ghee and cook.", "Add the paste to the meat.", "Add salt to the meat.", "Add garlic powder salt black pepper and mix it all together.", "Add"], "gts": ["mix hanger chili powder ginger powder fennel powder and water", "add cumin seeds green cardamom cinnamon sticks to a blender", "heat some ghee in a pan", "add the black cardamom to the pan", "add the mutton to the pan", "add the mixture", "season with salt and cover the pot", "add the blended spice to the pot", "cover the pot"], "pred_timestamps": [[0, 28358080], [51044545, 73731010], [79402626, 113432323], [119103939, 153133636], [272237575, 289252424], [294924040, 300595656], [300595656, 306267272], [306267272, 334625353], [340296969, 345968585]], "gts_timestamps": [[105000000, 120000000], [125000000, 132000000], [138000000, 145000000], [146000000, 148000000], [183000000, 196000000], [224000000, 230000000], [247000000, 259000000], [334000000, 345000000], [381000000, 383000000]]}
{"key": "BktdaTg6_E4", "pred": ["Put the lamb shanks in a plastic bag and marinate with cider vinegar vegetable oil salt and tamarind concentrate.", "Pour the marinade over the lamb and seal it well and keep it in the fridge.", "Take the lamb shanks out of the marinade and brown them in the oven.", "Add garlic ginger tomatoes onion and water to a blender and pulse.", "Put some clarified butter a roughly chopped onion a pinch of salt and brown it.", "Keep cooking until they turn golden brown.", "Add cumin cinnamon black pepper cayenne pepper and cayenne pepper to the spice blend. Add that to the sauce. Add the spice blend to the sauce.", "Put the lamb shanks in a bowl and pour cider vinegar vegetable oil salt and tamarind concentrate over it.", "Put the sauce on and serve."], "gts": ["mix vegetable oil salt and curry masala", "marinate the lamb in a ziplock bag", "season the lamb meat with salt", "bake the lamb meat in an oven", "blend garlic ginger cherry and onion and water", "heat some clarified butter in a pan", "add chopped onion and salt and saute", "mix some cumin cinnamon black pepper and paprika", "add the mixed spices the mixture and the lamb in"], "pred_timestamps": [[0, 22539393], [22539393, 56348484], [82644444, 97670707], [101427272, 116453535], [123966666, 154019191], [154019191, 157775757], [169045454, 195341414], [217880808, 236663636], [240420202, 244176767]], "gts_timestamps": [[30000000, 57000000], [62000000, 75000000], [88000000, 90000000], [91000000, 98000000], [99000000, 118000000], [123000000, 133000000], [134000000, 155000000], [156000000, 172000000], [183000000, 252000000]]}
To make reproduction easier, could you consider releasing the Google ASR results for ActivityNet Captions and YouCook2 (at least for the validation split)? My guess is that the trained checkpoint is not very robust to changes in the ASR data.
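For anyone comparing the dumps above by hand, here is a minimal sketch of temporal IoU between `pred_timestamps` and `gts_timestamps`, plus a simple precision/recall at a threshold. The function names are my own, and this is only an assumption about what metrics like Precision@0.3 and Recall@0.3 measure; the evaluation code in the repo may match segments differently.

```python
from typing import Sequence, Tuple

# A segment is [start, end] in microseconds, as in the dumps above.
Segment = Sequence[int]


def temporal_iou(a: Segment, b: Segment) -> float:
    """Intersection over union of two [start, end] intervals."""
    inter = max(0, min(a[1], b[1]) - max(a[0], b[0]))
    union = (a[1] - a[0]) + (b[1] - b[0]) - inter
    return inter / union if union > 0 else 0.0


def precision_recall(
    preds: Sequence[Segment], gts: Sequence[Segment], threshold: float
) -> Tuple[float, float]:
    """Hypothetical helper, not taken from the repo.

    Precision: share of predictions that overlap some ground-truth
    segment with IoU >= threshold.
    Recall: share of ground-truth segments matched by some prediction.
    """
    hit_pred = sum(
        any(temporal_iou(p, g) >= threshold for g in gts) for p in preds
    )
    hit_gt = sum(
        any(temporal_iou(p, g) >= threshold for p in preds) for g in gts
    )
    precision = hit_pred / len(preds) if preds else 0.0
    recall = hit_gt / len(gts) if gts else 0.0
    return precision, recall


if __name__ == "__main__":
    # First two predictions and ground truths of video fn9anlEL4FI.
    preds = [[0, 24762626], [29715151, 59430303]]
    gts = [[30000000, 39000000], [69000000, 86000000]]
    print(precision_recall(preds, gts, threshold=0.3))
```

Running this over a whole prediction file gives a quick check of whether the localization or the caption text is the weaker part.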
The ASR input for Vid2Seq was formatted into sentences. The sentence segmentation tool I used is the one from the Google API. Unfortunately, it is not possible to release the Google ASR results. With Whisper, maybe the sentence segmentation tool from https://github.com/m-bain/whisperX would help? I think I also formatted the ground-truth captions with capitalization and added a period at the end (either during data processing or data loading).
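The caption formatting described above could look roughly like this. It is a sketch only: `format_caption` is a hypothetical name, and the exact rules used in the original data pipeline are not known.

```python
def format_caption(caption: str) -> str:
    """Capitalize the first letter and make sure the caption ends with a period.

    Assumed formatting, not the original implementation.
    """
    text = " ".join(caption.split())  # collapse stray whitespace
    if not text:
        return text
    text = text[0].upper() + text[1:]
    if text[-1] not in ".!?":
        text += "."
    return text


if __name__ == "__main__":
    print(format_caption("add the lamb to the pot"))
```

Applying the same function to the `caption` column of the input CSV would make the ground truth look like the predictions in the dumps above.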
@dreamgonfly Were you able to reproduce the results using some other ASR text and/or by improving your current implementation? I am also interested in using a fine-tuned dense captioning model for my application. Thanks!
Hi @dreamgonfly, would you mind sharing the code showing how you implemented it? I am trying to implement vid2seq, but there are issues I am unable to solve, for example:
AttributeError: module 'flax.config' has no attribute 'update'
There are also some bugs where I have no idea which line to change.
@antoyang Have you solved the question? I've met the same problem.
Hi @dreamgonfly, I wonder if you tried training the model without transcripts and got results similar to Table 2, Row #1, as this does not need any pre-training or ASR.
@antoyang, I followed the same steps mentioned by @dreamgonfly; however, I am unable to train the model with visual input only. The training loss becomes NaN after a few iterations and the caption metrics are always zero. Can you suggest something to resolve the issue?
Thanks!
I am unable to reproduce the results of Row #1 in Table 2, i.e., using only visual input without any pre-training.
I am using a single A100 (80 GB) GPU to run the code with a batch size of 32. With the default config, I got a NaN loss during training. To avoid it, I modified the following params:
YouCook2 Dataset Details:
Here are the results:
Predictions: pred_txt.txt
If anyone can help or suggest something to reproduce the results, that would be great!
Hi @thechargedneutron. I still cannot reproduce the results. With ASR results from Whisper I could improve the results a bit, but they were still far below the numbers reported in the paper.
@ee2110 I can provide you with the detailed instructions on how I run the vid2seq code. Please email me if you're interested!
@anilbatra2185 I did not try to train the model yet. I was going to evaluate the released checkpoint first but I got stuck.
thanks @dreamgonfly for replying back!
Since I am unable to reproduce the results even with only visual input, I think the ASR (from Whisper) might not be the problem. There might be a missing config parameter, or a change in code behaviour with the latest versions of the libraries it uses.
@antoyang I tried WhisperX and the results got slightly better, but they are still far below the reported performance (e.g., METEOR 4.3 on YouCook2 with Whisper -> 4.5 with WhisperX vs. 9.3 from the paper). This result is from the released checkpoint without training any parameters.
Based on these results, I agree with @anilbatra2185 that the main concern might not be with ASR, but with visual features or other parts of the code.
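For others swapping in a different ASR system, here is a rough sketch of turning ASR segments into the `asr_string`, `asr_start`, and `asr_end` columns of the input CSV shown earlier. It assumes each segment is a dict with `start` and `end` in seconds and a `text` field (the shape of the `segments` list in Whisper's transcription output), and that the CSV timestamps are in microseconds, which is inferred from the sample rows rather than confirmed. `segments_to_asr_columns` is a hypothetical helper.

```python
from typing import Dict, List, Sequence, Tuple

MICROSECONDS_PER_SECOND = 1_000_000  # assumed unit of the CSV timestamps


def segments_to_asr_columns(
    segments: Sequence[Dict],
) -> Tuple[List[str], List[int], List[int]]:
    """Convert ASR segments into parallel text/start/end lists.

    Each segment is expected to look like
    {"start": 0.0, "end": 6.0, "text": " Welcome back..."}.
    Empty segments are skipped.
    """
    asr_string: List[str] = []
    asr_start: List[int] = []
    asr_end: List[int] = []
    for seg in segments:
        text = seg["text"].strip()
        if not text:
            continue
        asr_string.append(text)
        asr_start.append(int(round(seg["start"] * MICROSECONDS_PER_SECOND)))
        asr_end.append(int(round(seg["end"] * MICROSECONDS_PER_SECOND)))
    return asr_string, asr_start, asr_end


if __name__ == "__main__":
    demo = [
        {"start": 0.0, "end": 6.0, "text": " And all that will be on the blog."},
        {"start": 6.0, "end": 7.36, "text": " That's what we want."},
    ]
    print(segments_to_asr_columns(demo))
```

This only covers the column layout; whether sentence-level or fixed-length segments work better with the released checkpoint is still an open question in this thread.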
Interesting. I don't have the bandwidth to look at it further now but will release a PyTorch implementation by the end of September.
@antoyang Thanks for the great work. Could you please release the prediction results for our reference?
It's great work, and I'm looking forward to seeing your PyTorch-based implementation. Can you provide an inference script for an arbitrary input video?
@dreamgonfly Can you help us by sharing your implementation instructions and code? I am trying to run inference on multi-event videos.
@PKUCSS I do not have access to this, given that it was internship work. @BaoliangChen-stu A PyTorch implementation (with a few differences explained in the readme) is included here: https://github.com/antoyang/VidChapters. It also includes an example inference script.
I followed the instructions in the README to evaluate the released checkpoints, but I could not reproduce the results in the paper.
The paper says a fully fine-tuned Vid2Seq achieves 7.9 SODA_c, 47.1 CIDEr, and 9.3 METEOR on YouCook2, and 5.8 SODA_c, 30.1 CIDEr, and 8.5 METEOR on ActivityNet (Table 5). However, the numbers I got by re-running the code were much lower than the results in the paper (a CIDEr score of around 20 on YouCook2).
Could you share how I can reproduce the results?
Below are the important steps I followed to run the evaluation code.
First, I preprocessed data as follows:
Second, I evaluated released checkpoints as follows: