Open irux opened 3 months ago
Hey! Is there anything new you guys are working on? More data? I love this because I think multion is actually doing decent work on this kind of task. I really think this is the future of agents.
We are currently working on improving the quality of the data representation, which could be much more optimized! After that, collecting more data is on our radar. Combining datasets is also interesting (for example, mind2web and aitw would be good datasets to add).
Do you maybe have other papers or more current information on what is happening on this topic?
Right now we have WebLINX (https://arxiv.org/abs/2402.05930) but more papers will take a while! However feel free to keep an eye on the release notes and discussions on the weblinx repo as well as here.
Do you know of anything else that works with a computer vision approach, or maybe with a multimodal model?
We have a few experiments with multimodal and image-to-text models. Pix2Act is interesting since it's very small but performs somewhat well on weblinx evals.
Any new research on the DRM models?
I'm not sure what DRM models are. Can you expand?
Hey @xhluca! Thanks for your reply!
Sorry, by the way, it was a typo. I was referring to the Dense Markup Ranking (DMR) models, the ones you mention in the paper here: https://arxiv.org/pdf/2402.05930
Please, if you have any kind of Discord or Telegram group, or some other way to be more involved, I would love to be part of it. I love the topic and I think it has huge potential :)
Yes, we are interested in building better DMR variants! We are still looking into different ways we can approach the candidate selection problem.
Regarding Discord, I think it's a great idea to create one! I will look into it and discuss with collaborators!
Hey @xhluca! Any news on this? Are you looking into the multimodal Llama 3.2 for this? If I can help somehow, just let me know!
Hey! We are all actively working on improving weblinx. Llama 3.2 is definitely on our radar, but we are waiting to streamline our new eval pipeline and augment the training data before proceeding.
That said, if you are working on Llama 3.2 and would like to contribute a PR that adds the vision capability, I'd be happy to review the results & merge!
Sorry for so many questions but I find this fascinating!