awesome-colab-notebooks

Awesome colab notebooks collection for ML experiments

repositories

facebookresearch/co-tracker
iterative/datachain
callummcdougall/ARENA_3.0
ToTheBeginning/PuLID
ZhengPeng7/BiRefNet
ultralytics/ultralytics
unslothai/unsloth
facebookresearch/segment-anything-2
lllyasviel/IC-Light
gemelo-ai/vocos
comfyanonymous/ComfyUI
TransformerLensOrg/TransformerLens
HongwenZhang/PyMAF-X
roboflow/supervision
KwaiVGI/LivePortrait
piddnad/DDColor
TencentARC/InstantMesh
LAION-AI/aesthetic-predictor
Doubiiu/DynamiCrafter
facebookresearch/home-robot
KillianLucas/open-interpreter
jxnl/instructor

papers

LIDA
Gaussian Splatting
Tune-A-Video
FollowYourPose
Text2Video-Zero
GLIP
UniFormerV2
SadTalker
OWL-ViT
VideoReTalking
LDM
Dream Fields
Detic
GraphCast
DragGAN
VRT
Thin-Plate Spline Motion Model
PyMAF-X
FateZero
py-irt
VQ-Diffusion
ECON

Research

name	description	authors	links	update
CoTracker	Architecture that jointly tracks multiple points throughout an entire video	Nikita Karaev Ignacio Rocco Benjamin Graham Natalia Neverova others Andrea Vedaldi Christian Rupprecht	, project	16.10.2024
PIFu	Pixel-Aligned Implicit Function for High-Resolution Clothed Human Digitization	Ryota Natsume Shunsuke Saito Zeng Huang Angjoo Kanazawa Hao Li		08.10.2024
DifFace	Blind face restoration method that copes with unseen and complex degradations more gracefully, without complicated loss designs	Zongsheng Yue Chen Change Loy	, , ,	05.10.2024
Segment Anything 2	Foundation model towards solving promptable visual segmentation in images and videos	Nikhila Ravi Valentin Gabeur Yuan-Ting Hu Ronghang Hu others Chaitanya Ryali Tengyu Ma Haitham Khedr Roman Rädle Chloe Rolland Laura Gustafson Eric Mintun Junting Pan Kalyan Vasudev Alwala Nicolas Carion Chao-Yuan Wu Ross Girshick Piotr Dollár Christoph Feichtenhofer	demo , , project , , ,	01.10.2024
Open-Unmix	A deep neural network reference implementation for music source separation, applicable for researchers, audio engineers and artists	Fabian-Robert Stöter Antoine Liutkus	data project	25.09.2024
Deep Painterly Harmonization	Algorithm that produces significantly better results than photo compositing or global stylization techniques and enables creative painterly edits that would otherwise be difficult to achieve	Fujun Luan Sylvain Paris Eli Shechtman Kavita Bala	, , ,	23.09.2024
audio2photoreal	Framework for generating full-bodied photorealistic avatars that gesture according to the conversational dynamics of a dyadic interaction	Evonne Ng Javier Romero Timur Bagautdinov Shaojie Bai others Trevor Darrell Angjoo Kanazawa Alexander Richard	project	13.09.2024
Fast Segment Anything	CNN Segment Anything Model trained using only 2% of the SA-1B dataset published by SAM authors	Xu Zhao Wenchao Ding Yongqi An Yinglong Du others Tao Yu Min Li Ming Tang Jinqiao Wang	, , ,	10.09.2024
Neuralangelo	Framework for high-fidelity 3D surface reconstruction from RGB video captures	Zhaoshuo Li Thomas Müller Alex Evans Russell Taylor others Mathias Unberath Ming-Yu Liu Chen-Hsuan Lin	blog post project , ,	02.09.2024
BiRefNet	Bilateral reference framework for high-resolution dichotomous image segmentation	Peng Zheng Dehong Gao Deng-Ping Fan Li Liu others Jorma Laaksonen Wanli Ouyang Nicu Sebe	, , , , project , ,	23.08.2024
SPIN	Learning to Reconstruct 3D Human Pose and Shape via Model-fitting in the Loop	Nikos Kolotouros Georgios Pavlakos Michael Black Kostas Daniilidis	, project	21.08.2024
YOLOv10	Aims to further advance the performance-efficiency boundary of YOLOs from both the post-processing and the model architecture	Ao Wang Hui Chen Kai Chen Zijia Lin others Jungong Han Guiguang Ding	blog post demo , , , , , , , , , , , , ,	20.08.2024
SpecVQGAN	Taming the visually guided sound generation by shrinking a training dataset to a set of representative vectors	Vladimir Iashin Esa Rahtu	, , , , , , , , project , ,	12.07.2024
LivePortrait	Video-driven portrait animation framework with a focus on better generalization, controllability, and efficiency for practical usage	Jianzhu Guo Dingyun Zhang Xiaoqiang Liu Zhizhou Zhong others Yuan Zhang Pengfei Wan Di Zhang	, , , , project , , , , ,	10.07.2024
TAPIR	Tracking Any Point with per-frame Initialization and temporal Refinement	Carl Doersch Yi Yang Mel Vecerik Dilara Gokay others Ankush Gupta Yusuf Aytar Joao Carreira Andrew Zisserman	, blog post, blog post ,	05.07.2024
Wav2Lip	A Lip Sync Expert Is All You Need for Speech to Lip Generation In the Wild	Prajwal Renukanand Rudrabha Mukhopadhyay Vinay Namboodiri C. V. Jawahar	data demo project	27.06.2024
DeepLabCut	Efficient method for markerless pose estimation based on transfer learning with deep neural networks that achieves excellent results with minimal training data	Alexander Mathis Pranav Mamidanna Kevin Cury Taiga Abe others Venkatesh Murthy Mackenzie Mathis Matthias Bethge	, , , , , forum , website , ,	05.06.2024
PoolFormer	MetaFormer Is Actually What You Need for Vision	Weihao Yu Mi Luo Pan Zhou Chenyang Si others Yichen Zhou Xinchao Wang Jiashi Feng Shuicheng Yan	, ,	01.06.2024
StoryDiffusion	Self-attention calculation method, termed Consistent Self-Attention, that significantly boosts the consistency between the generated images and augments prevalent pretrained diffusion-based text-to-image models in a zero-shot manner	Yupeng Zhou Daquan Zhou Ming-Ming Cheng Jiashi Feng Qibin Hou	project ,	04.05.2024
PuLID	Pure and Lightning ID customization, a tuning-free ID customization method for text-to-image generation	Zinan Guo Yanze Wu Zhuowei Chen Lang Chen Qian He	, ,	03.05.2024
FILM	A frame interpolation algorithm that synthesizes multiple intermediate frames from two input images with large in-between motion	Fitsum Reda Janne Kontkanen Eric Tabellion Deqing Sun others Caroline Pantofaru Brian Curless	data, data, data project , ,	03.05.2024
VoiceCraft	Token-infilling neural codec language model that achieves state-of-the-art performance on both speech editing and zero-shot text-to-speech on audiobooks, internet videos, and podcasts	Puyuan Peng Po-Yao Huang Shang-Wen Li Abdelrahman Mohamed David Harwath	project , ,	21.04.2024
ZeST	Method for zero-shot material transfer to an object in the input image given a material exemplar image	Ta-Ying Cheng Prafull Sharma Andrew Markham Niki Trigoni Varun Jampani	, project	16.04.2024
InstantMesh	Feed-forward framework for instant 3D mesh generation from a single image, featuring state-of-the-art generation quality and significant training scalability	Jiale Xu Weihao Cheng Yiming Gao Xintao Wang others Shenghua Gao Ying Shan	, ,	16.04.2024
AlphaFold	Highly accurate protein structure prediction	John Jumper Richard Evans Alexander Pritzel Tim Green others Michael Figurnov Olaf Ronneberger Kathryn Tunyasuvunakool Russ Bates Augustin Žídek Anna Potapenko Alex Bridgland Clemens Meyer Simon Kohl Andrew Ballard Bernardino Romera-Paredes Stanislav Nikolov Rishub Jain	blog post, blog post , paper ,	15.04.2024
Würstchen	Architecture for text-to-image synthesis that combines competitive performance with unprecedented cost-effectiveness for large-scale text-to-image diffusion models	Pablo Pernias Dominic Rampas Mats Richter Christopher Pal Marc Aubreville		06.04.2024
AQLM	Extreme Compression of Large Language Models via Additive Quantization	Vage Egiazarian Andrei Panferov Denis Kuznedelev Elias Frantar others Artem Babenko Dan Alistarh	, , ,	08.03.2024
YOLOv9	Learning What You Want to Learn Using Programmable Gradient Information	Chien-Yao Wang I-Hau Yeh Hong-Yuan Mark Liao	, blog post , , , , ,	05.03.2024
Multi-LoRA Composition	LoRA Switch and LoRA Composite, approaches that aim to surpass traditional techniques in terms of accuracy and image quality, especially in complex compositions	Ming Zhong Yelong Shen Shuohang Wang Yadong Lu others Yizhu Jiao Siru Ouyang Donghan Yu Jiawei Han Weizhu Chen	website	03.03.2024
AMARETTO	Multiscale and multimodal inference of regulatory networks to identify cell circuits and their drivers shared and distinct within and across biological systems of human disease	Nathalie Pochet Olivier Gevaert Mohsen Nabian Jayendra Shinde others Celine Everaert Thorin Tabor	bioconductor project	28.02.2024
LIDA	Tool for generating grammar-agnostic visualizations and infographics	Victor Dibia	, project , ,	06.02.2024
ViT	Vision Transformer and MLP-Mixer Architectures	Alexey Dosovitskiy Lucas Beyer Alexander Kolesnikov Dirk Weissenborn others Xiaohua Zhai Thomas Unterthiner Mostafa Dehghani Matthias Minderer Georg Heigold Sylvain Gelly Jakob Uszkoreit Neil Houlsby	, , , , , , blog post , , , , ,	06.02.2024
3D Ken Burns	A reference implementation of 3D Ken Burns Effect from a Single Image using PyTorch - given a single input image, it animates this still image with a virtual camera scan and zoom subject to motion parallax	Manuel Romero		24.01.2024
VALL-E X	Cross-lingual neural codec language model for cross-lingual speech synthesis	Ziqiang Zhang Long Zhou Chengyi Wang Sanyuan Chen others Yu Wu Shujie Liu Zhuo Chen Yanqing Liu Huaming Wang Jinyu Li Lei He Sheng Zhao Furu Wei	, , demo project	19.01.2024
PhotoMaker	Efficient personalized text-to-image generation method, which mainly encodes an arbitrary number of input ID images into a stack ID embedding for preserving ID information	Zhen Li Mingdeng Cao Xintao Wang Zhongang Qi others Ming-Ming Cheng Ying Shan	, , , , , project ,	18.01.2024
DDColor	End-to-end method with dual decoders for image colorization	Xiaoyang Kang Tao Yang Wenqi Ouyang Peiran Ren others Lingzhi Li Xuansong Xie	,	15.01.2024
PASD	Pixel-aware stable diffusion network to achieve robust realistic image super-resolution (Real-ISR) as well as personalized stylization	Tao Yang Peiran Ren Xuansong Xie Lei Zhang	,	12.01.2024
HandRefiner	Refining Malformed Hands in Generated Images by Diffusion-based Conditional Inpainting	Wenquan Lu Yufei Xu Jing Zhang Chaoyue Wang Dacheng Tao	, ,	08.01.2024
GraphCast	Learning skillful medium-range global weather forecasting	Rémi Lam Alvaro Sanchez-Gonzalez Matthew Willson Peter Wirnsberger others Meire Fortunato Ferran Alet Suman Ravuri Timo Ewalds Zach Eaton-Rosen Weihua Hu Alexander Merose Stephan Hoyer George Holland Oriol Vinyals Jacklynn Stott Alexander Pritzel Shakir Mohamed Peter Battaglia	data , , , , , , , ,	04.01.2024
ESM	Evolutionary Scale Modeling: Pretrained language models for proteins	Zeming Lin Roshan Rao Brian Hie Zhongkai Zhu others Allan dos Santos Costa Maryam Fazel-Zarandi Tom Sercu Salvatore Candido Alexander Rives Joshua Meier Robert Verkuil Jason Liu Chloe Hsu Adam Lerer	ESM Atlas FSDP ICML data paper, paper, paper, paper pubmed ,	28.12.2023
LLaVA	Large Language and Vision Assistant, an end-to-end trained large multimodal model that connects a vision encoder and LLM for general-purpose visual and language understanding	Haotian Liu Chunyuan Li Qingyang Wu Yong Jae Lee Yuheng Li	, , , , demo , , , , , , project , , , , ,	22.12.2023
Background Matting V2	Real-time, high-resolution background replacement technique which operates at 30fps in 4K resolution, and 60fps for HD on a modern GPU	Shanchuan Lin Andrey Ryabtsev Soumyadip Sengupta Brian Curless others Steve Seitz Ira Kemelmacher-Shlizerman	, project ,	22.12.2023
Gaussian Splatting	Achieves state-of-the-art visual quality while maintaining competitive training times and, importantly, allows high-quality real-time (≥ 100 fps) novel-view synthesis at 1080p resolution	Bernhard Kerbl Georgios Kopanas Thomas Leimkühler George Drettakis	project , , , , , ,	19.12.2023
SMPLer-X	Scaling up expressive human pose and shape estimation (EHPS) towards the first generalist foundation model, with up to ViT-Huge as the backbone and training with up to 4.5M instances from diverse data sources	Zhongang Cai Wanqi Yin Ailing Zeng Chen Wei others Qingping Sun Yanjun Wang Hui En Pang Haiyi Mei Mingyuan Zhang Lei Zhang Chen Change Loy Lei Yang Ziwei Liu	, , project ,	18.12.2023
DeepCache	Training-free paradigm that accelerates diffusion models from the perspective of model architecture	Xinyin Ma Gongfan Fang Xinchao Wang	project	18.12.2023
MagicAnimate	Diffusion-based framework that aims at enhancing temporal consistency, preserving reference image faithfully, and improving animation fidelity	Zhongcong Xu Jianfeng Zhang Jun Hao Liew Hanshu Yan others Jiawei Liu Chenxu Zhang Jiashi Feng Mike Shou	, , project website , ,	18.12.2023
DiffBIR	Towards Blind Image Restoration with Generative Diffusion Prior	Xinqi Lin Jingwen He Ziyan Chen Zhaoyang Lyu others Ben Fei Bo Dai Wanli Ouyang Yu Qiao Chao Dong	project ,	18.12.2023
AudioLDM	Text-to-audio system that is built on a latent space to learn the continuous audio representations from contrastive language-audio pretraining latents	Haohe Liu Zehua Chen Yi Yuan Xinhao Mei others Xubo Liu Danilo Mandic Wenwu Wang Mark Plumbley	, , project	02.12.2023
TabPFN	Neural network that learned to do tabular data prediction	Noah Hollmann Samuel Müller Katharina Eggensperger Frank Hutter	, , , , , blog post	29.11.2023
Concept Sliders	Plug-and-play low rank adaptors applied on top of pretrained models	Rohit Gandikota Joanna Materzyńska Tingrui Zhou Antonio Torralba David Bau	, project	26.11.2023
Qwen-VL	Set of large-scale vision-language models designed to perceive and understand both text and images	Jinze Bai Shuai Bai Shusheng Yang Shijie Wang others Sinan Tan Peng Wang Junyang Lin Chang Zhou Jingren Zhou	, , demo , , , , ,	24.11.2023
AnimeGANv3	Double-tail generative adversarial network for fast photo animation	Gang Liu Xin Chen	project , , , , ,	23.11.2023
Ithaca	First Deep Neural Network for the textual restoration, geographical and chronological attribution of ancient Greek inscriptions	Yannis Assael Thea Sommerschield Brendan Shillingford Mahyar Bordbar others John Pavlopoulos Marita Chatzipanagiotou Ion Androutsopoulos Jonathan Prag Nando de Freitas	, project	21.11.2023
PixArt-Σ	Weak-to-Strong Training of Diffusion Transformer for 4K Text-to-Image Generation	Junsong Chen Chongjian Ge Enze Xie Yue Wu others Lewei Yao Xiaozhe Ren Zhongdao Wang Ping Luo Huchuan Lu Zhenguo Li	, , , project	07.11.2023
Zero123++	Image-conditioned diffusion model for generating 3D-consistent multi-view images from a single input view	Ruoxi Shi Hansheng Chen Zhuoyang Zhang Minghua Liu others Chao Xu Xinyue Wei Linghao Chen Chong Zeng Hao Su	, ,	26.10.2023
UniFormerV2	Unified Transformer for Efficient Spatiotemporal Representation Learning	Kunchang Li Yali Wang Yinan He Yizhuo Li others Yi Wang Limin Wang Yu Qiao	, , , , , , ,	20.10.2023
Show-1	Hybrid model, dubbed Show-1, which marries pixel-based and latent-based video diffusion models (VDMs) for text-to-video generation	David Junhao Zhang Jay Zhangjie Wu Jiawei Liu Rui Zhao others Lingmin Ran Yuchao Gu Difei Gao Mike Zheng Shou	, , , , , project	15.10.2023
AudioSep	Foundation model for open-domain audio source separation with natural language queries	Xubo Liu Qiuqiang Kong Yan Zhao Haohe Liu others Yi Yuan Yuzhuo Liu Rui Xia Yuxuan Wang Mark Plumbley Wenwu Wang	project	12.10.2023
DA-CLIP	Degradation-aware vision-language model to better transfer pretrained vision-language models to low-level vision tasks as a universal framework for image restoration	Ziwei Luo Fredrik Gustafsson Zheng Zhao Jens Sjölund Thomas Schön	project	11.10.2023
SadTalker	Generates 3D motion coefficients of the 3DMM from audio and implicitly modulates a novel 3D-aware face render for talking head generation	Wenxuan Zhang Xiaodong Cun Xuan Wang Yong Zhang others Xi Shen Yu Guo Ying Shan Fei Wang	, , , , , , , project , , ,	10.10.2023
Musika	Music generation system that can be trained on hundreds of hours of music using a single consumer GPU, and that allows for much faster than real-time generation of music of arbitrary length on a consumer CPU	Marco Pasini Jan Schlüter	, data , project ,	09.10.2023
YOLOv6	Single-stage object detection framework dedicated to industrial applications	Kaiheng Weng Meng Cheng Yiduo Li Xiangxiang Chu Xiaolin Wei	, blog post data , , , , , ,	08.10.2023
DreamGaussian	Algorithm to convert 3D Gaussians into textured meshes and apply a fine-tuning stage to refine the details	Jiaxiang Tang Jiawei Ren Hang Zhou Ziwei Liu Gang Zeng	, , project	04.10.2023
ICON	Given a set of images, method estimates a detailed 3D surface from each image and then combines these into an animatable avatar	Yuliang Xiu Jinlong Yang Dimitrios Tzionas Michael Black	, , , , , , , project	31.08.2023
DINOv2	Produces high-performance visual features that can be directly employed with classifiers as simple as linear layers on a variety of computer vision tasks; these visual features are robust and perform well across domains without any requirement for fine-tuning	Maxime Oquab Timothée Darcet Théo Moutakanni Huy Vo others Marc Szafraniec Vasil Khalidov Pierre Fernandez Daniel Haziza Francisco Massa Alaaeldin El-Nouby Mahmoud Assran Nicolas Ballas Wojciech Galuba Russell Howes Po-Yao Huang Shang-Wen Li Ishan Misra Michael Rabbat Vasu Sharma Gabriel Synnaeve Hu Xu Hervé Jegou Julien Mairal Patrick Labatut Armand Joulin Piotr Bojanowski	blog post demo , , ,	31.08.2023
OWL-ViT	Simple Open-Vocabulary Object Detection with Vision Transformers	Matthias Minderer Alexey Gritsenko Austin Stone Maxim Neumann others Dirk Weissenborn Alexey Dosovitskiy Aravindh Mahendran Anurag Arnab Mostafa Dehghani Zhuoran Shen Xiao Wang Xiaohua Zhai Thomas Kipf Neil Houlsby		21.08.2023
StyleGAN3	Alias-Free Generative Adversarial Networks	Tero Karras Miika Aittala Samuli Laine Erik Härkönen others Janne Hellsten Jaakko Lehtinen Timo Aila	, , , , , , , , , project	13.08.2023
FateZero	Zero-shot text-based editing method on real-world videos without per-prompt training or use-specific mask	Chenyang Qi Xiaodong Cun Yong Zhang Chenyang Lei others Xintao Wang Ying Shan Qifeng Chen	, , project video	13.08.2023
Big GAN	Large Scale GAN Training for High Fidelity Natural Image Synthesis	Andrew Brock Jeff Donahue Karen Simonyan		03.08.2023
LaMa	Resolution-robust Large Mask Inpainting with Fourier Convolutions	Roman Suvorov Elizaveta Logacheva Anton Mashikhin Anastasia Remizova others Arsenii Ashukha Aleksei Silvestrov Naejin Kong Harshith Goka Kiwoong Park Victor Lempitsky	, , , project	02.08.2023
MakeItTalk	A method that generates expressive talking-head videos from a single facial image with audio as the only input	Yang Zhou Xintong Han Eli Shechtman Jose Echevarria others Evangelos Kalogerakis Dingzeyu Li	data project	27.07.2023
HiDT	A generative image-to-image model and a new upsampling scheme that allows image translation to be applied at high resolution	Denis Korzhenkov Gleb Sterkin Sergey Nikolenko Victor Lempitsky	project ,	24.07.2023
CutLER	Simple approach for training unsupervised object detection and segmentation models	Xudong Wang Rohit Girdhar Stella Yu Ishan Misra	, project	24.07.2023
Recognize Anything & Tag2Text	Vision language pre-training framework, which introduces image tagging into vision-language models to guide the learning of visual-linguistic features	Xinyu Huang Youcai Zhang Jinyu Ma Zhaoyang Li others Yanchun Xie Yuzhuo Qin Tong Luo Yaqian Li Yandong Guo Lei Zhang	, , project, project	09.07.2023
Thin-Plate Spline Motion Model	End-to-end unsupervised motion transfer framework	Jian Zhao Hui Zhang	, , , supp	07.07.2023
DragGAN	Drag Your GAN: Interactive Point-based Manipulation on the Generative Image Manifold	Xingang Pan Ayush Tewari Thomas Leimkühler Lingjie Liu others Abhimitra Meka Christian Theobalt	project	03.07.2023
MobileSAM	Towards Lightweight SAM for Mobile Applications	Chaoning Zhang Dongshen Han Yu Qiao Jung Uk Kim others Sung-Ho Bae Seungkyu Lee Choong Seon Hong	, , , , , , ,	30.06.2023
Grounding DINO	Marrying DINO with Grounded Pre-Training for Open-Set Object Detection	Shilong Liu Zhaoyang Zeng Tianhe Ren Feng Li others Hao Zhang Jie Yang Chunyuan Li Jianwei Yang Hang Su Jun Zhu Lei Zhang	, , , , , , , , , , , ,	28.06.2023
T5X	Modular, composable, research-friendly framework for high-performance, configurable, self-service training, evaluation, and inference of sequence models at many scales	Adam Roberts Hyung Won Chung Anselm Levskaya Gaurav Mishra others James Bradbury Daniel Andor Sharan Narang Brian Lester Colin Gaffney Afroz Mohiuddin Curtis Hawthorne Aitor Lewkowycz Alex Salcianu Marc van Zee Jacob Austin Sebastian Goodman Livio Baldini Soares Haitang Hu Sasha Tsvyashchenko Aakanksha Chowdhery Jasmijn Bastings Jannis Bulian Xavier Garcia Jianmo Ni Kathleen Kenealy Jonathan Clark Dan Garrette James Lee-Thorp Colin Raffel Noam Shazeer Marvin Ritter Maarten Bosma Alexandre Passos Jeremy Maitin-Shepard Noah Fiedel Brennan Saeta Ryan Sepassi Alexander Spiridonov Joshua Newlan Andrea Gesmundo	, , , ,	27.06.2023
CodeTalker	Casts speech-driven facial animation as a code query task in a finite proxy space of the learned codebook, which effectively promotes the vividness of the generated motions by reducing the cross-modal mapping uncertainty	Jinbo Xing Menghan Xia Yuechen Zhang Xiaodong Cun others Jue Wang Tien-Tsin Wong	, , , , , , project	16.06.2023
First Order Motion Model for Image Animation	Transferring facial movements from video to image	Aliaksandr Siarohin	project	04.06.2023
Parallel WaveGAN	State-of-the-art non-autoregressive models to build your own great vocoder	Tomoki Hayashi	, , demo ,	01.06.2023
ECON	Method designed for human digitization from a color image, which combines the best properties of implicit and explicit representations to infer high-fidelity 3D clothed humans from in-the-wild images, even with loose clothing or in challenging poses	Yuliang Xiu Jinlong Yang Xu Cao Dimitrios Tzionas Michael Black	, , , , , , , , ,	31.05.2023
MMS	The Massively Multilingual Speech project expands speech technology from about 100 languages to over 1000 by building a single multilingual speech recognition model supporting over 1100 languages, language identification models able to identify over 4000 languages, pretrained models supporting over 1400 languages, and text-to-speech models for over 1100 languages	Vineel Pratap Andros Tjandra Bowen Shi Paden Tomasello others Arun Babu Sayani Kundu Ali Elkahky Zhaoheng Ni Apoorv Vyas Maryam Fazel-Zarandi Alexei Baevski Yossi Adi Xiaohui Zhang Wei-Ning Hsu Alexis Conneau Michael Auli	, , ,	26.05.2023
FAB	Flow AIS Bootstrap uses annealed importance sampling (AIS) to generate samples in regions where the flow is a poor approximation of the target, facilitating the discovery of new modes	Laurence Midgley Vincent Stimper Gregor N. C. Simm Bernhard Schölkopf José Miguel Hernández-Lobato	,	29.04.2023
CodeFormer	Transformer-based prediction network to model global composition and context of the low-quality faces for code prediction, enabling the discovery of natural faces that closely approximate the target faces even when the inputs are severely degraded	Shangchen Zhou Kelvin Chan Chongyi Li Chen Change Loy	, , project , , ,	21.04.2023
Text2Video-Zero	Text-to-Image Diffusion Models are Zero-Shot Video Generators	Levon Khachatryan Andranik Movsisyan Vahram Tadevosyan Roberto Henschel others Zhangyang Wang Shant Navasardyan Humphrey Shi	, , , , , project video ,	11.04.2023
Segment Anything	The Segment Anything Model produces high quality object masks from input prompts such as points or boxes, and it can be used to generate masks for all objects in an image	Alexander Kirillov Eric Mintun Nikhila Ravi Hanzi Mao others Chloé Rolland Laura Gustafson Tete Xiao Spencer Whitehead Alex Berg Wan-Yen Lo Piotr Dollár Ross Girshick	data , website , ,	10.04.2023
FollowYourPose	Two-stage training scheme that can utilize image pose pair and pose-free video datasets and the pre-trained text-to-image model to obtain the pose-controllable character videos	Yue Ma Yingqing He Xiaodong Cun Xintao Wang others Siran Chen Ying Shan Xiu Li Qifeng Chen	, , , project video	07.04.2023
EVA3D	High-quality unconditional 3D human generative model that only requires 2D image collections for training	Fangzhou Hong Zhaoxi Chen Yushi Lan Liang Pan Ziwei Liu	project ,	06.04.2023
Stable Dreamfusion	Using a pretrained 2D text-to-image diffusion model to perform text-to-3D synthesis	Jiaxiang Tang Ben Poole Ajay Jain Jon Barron Ben Mildenhall	, project , , ,	04.04.2023
PIFuHD	Multi-Level Pixel-Aligned Implicit Function for High-Resolution 3D Human Digitization	Shunsuke Saito Tomas Simon Jason Saragih Hanbyul Joo	,	26.03.2023
VideoReTalking	System to edit the faces of a real-world talking head video according to input audio, producing a high-quality and lip-syncing output video even with a different emotion	Kun Cheng Xiaodong Cun Yong Zhang Menghan Xia others Fei Yin Mingrui Zhu Xuan Wang Jue Wang Nannan Wang	, , , , project , ,	19.03.2023
Visual ChatGPT	Connects ChatGPT and a series of Visual Foundation Models to enable sending and receiving images during chatting	Chenfei Wu Shengming Yin Weizhen Qi Xiaodong Wang others Zecheng Tang Nan Duan	, , , ,	15.03.2023
Tune-A-Video	One-Shot Tuning of Image Diffusion Models for Text-to-Video Generation	Jay Zhangjie Wu Yixiao Ge Xintao Wang Stan Weixian Lei others Yuchao Gu Yufei Shi Wynne Hsu Ying Shan Xiaohu Qie Mike Zheng Shou	, , , project ,	23.02.2023
GPEN	GAN Prior Embedded Network for Blind Face Restoration in the Wild	Tao Yang Peiran Ren Xuansong Xie Lei Zhang	demo ,	15.02.2023
PyMAF-X	Regression-based approach to recovering parametric full-body models from monocular images	Hongwen Zhang Yating Tian Yuxiang Zhang Mengcheng Li others Liang An Zhenan Sun Yebin Liu	, , , , project	14.02.2023
Disco Diffusion	A frankensteinian amalgamation of notebooks, models and techniques for the generation of AI Art and Animations	Max Ingham Adam Letts Daniel Russell Chigozie Nri	, ,	11.02.2023
GrooVAE	Some applications of machine learning for generating and manipulating beats and drum performances	Jon Gillick Adam Roberts Jesse Engel	blog post data web app	02.02.2023
Multitrack MusicVAE	The models in this notebook are capable of encoding and decoding single measures of up to 8 tracks, optionally conditioned on an underlying chord	Ian Simon Adam Roberts Colin Raffel Jesse Engel others Curtis Hawthorne Douglas Eck	blog post	02.02.2023
MusicVAE	A Hierarchical Latent Vector Model for Learning Long-Term Structure in Music	Adam Roberts Jesse Engel Colin Raffel Curtis Hawthorne Douglas Eck	blog post project	02.02.2023
Learning to Paint	Learning to Paint With Model-based Deep Reinforcement Learning	Manuel Romero		01.02.2023
Instant-NGP	Instant Neural Graphics Primitives with a Multiresolution Hash Encoding	Thomas Müller Alex Evans Christoph Schied Alexander Keller	blog post , , , , project tutorial , , ,	18.01.2023
Fourier Feature Networks	Fourier Features Let Networks Learn High Frequency Functions in Low Dimensional Domains	Matthew Tancik Pratul Srinivasan Ben Mildenhall Sara Fridovich-Keil others Nithin Raghavan Utkarsh Singhal Ravi Ramamoorthi Jon Barron Ren Ng	, project	17.01.2023
AlphaPose	Whole-Body Regional Multi-Person Pose Estimation and Tracking in Real-Time	Hao-Shu Fang Jiefeng Li Hongyang Tang Chao Xu others Haoyi Zhu Yuliang Xiu Yong-Lu Li Cewu Lu	, project , , ,	07.01.2023
HybrIK	Hybrid Analytical-Neural Inverse Kinematics Solution for 3D Human Pose and Shape Estimation	Jiefeng Li Chao Xu Zhicun Chen Siyuan Bian others Lixin Yang Cewu Lu	project supp	01.01.2023
Score Jacobian Chaining	Applies the chain rule on the learned gradients and back-propagates the score of a diffusion model through the Jacobian of a differentiable renderer, instantiated as a voxel radiance field	Haochen Wang Xiaodan Du Jiahao Li Raymond Yeh Greg Shakhnarovich	, project ,	05.12.2022
Demucs	Hybrid Spectrogram and Waveform Source Separation	Alexandre Défossez	, , , , , ,	21.11.2022
StyleCLIP	Text-Driven Manipulation of StyleGAN Imagery	Or Patashnik Zongze Wu Eli Shechtman Daniel Cohen-Or Dani Lischinski	, , , ,	30.10.2022
MotionDiffuse	The first diffusion model-based text-driven motion generation framework, which demonstrates several desired properties over existing methods	Mingyuan Zhang Zhongang Cai Liang Pan Fangzhou Hong others Xinying Guo Lei Yang Ziwei Liu	project	13.10.2022
VToonify	Leverages the mid- and high-resolution layers of StyleGAN to render high-quality artistic portraits based on the multi-scale content features extracted by an encoder to better preserve the frame details	Shuai Yang Liming Jiang Ziwei Liu Chen Change Loy	, , , , project	07.10.2022
PyMAF	Pyramidal Mesh Alignment Feedback loop in regression network for well-aligned body mesh recovery and extend it for the recovery of expressive full-body models	Hongwen Zhang Yating Tian Yuxiang Zhang Mengcheng Li others Liang An Zhenan Sun Yebin Liu	, , , , project ,	06.10.2022
AlphaTensor	Discovering faster matrix multiplication algorithms with reinforcement learning	Alhussein Fawzi Matej Balog Aja Huang Thomas Hubert others Bernardino Romera-Paredes Mohammadamin Barekatain Alexander Novikov Francisco Ruiz Julian Schrittwieser Grzegorz Swirszcz David Silver Demis Hassabis Pushmeet Kohli	, , ,	04.10.2022
Swin2SR	Novel Swin Transformer V2, to improve SwinIR for image super-resolution, and in particular, the compressed input scenario	Marcos Conde Ui-Jin Choi Maxime Burchi Radu Timofte	, , , , , , ,	03.10.2022
Functa	From data to functa: Your data point is a function and you can treat it like one	Emilien Dupont Hyunjik Kim Ali Eslami Danilo Rezende Dan Rosenbaum	,	24.09.2022
Whisper	Automatic speech recognition system trained on 680,000 hours of multilingual and multitask supervised data collected from the web	Alec Radford Jong Wook Kim Tao Xu Greg Brockman others Christine McLeavey Ilya Sutskever	blog post , ,	21.09.2022
DeOldify (video)	Colorize your own videos!	Jason Antic	, model , website ,	19.09.2022
DeOldify (photo)	Colorize your own photos!	Jason Antic Matt Robinson María Benavente	, model website	19.09.2022
Real-ESRGAN	Extends the powerful ESRGAN to a practical restoration application, which is trained with pure synthetic data	Xintao Wang Liangbin Xie Chao Dong Ying Shan	, , , ,	18.09.2022
IDE-3D	Interactive Disentangled Editing for High-Resolution 3D-aware Portrait Synthesis	Jingxiang Sun Xuan Wang Yichun Shi Lizhen Wang others Jue Wang Yebin Liu	, , ,	08.09.2022
Decision Transformers	An architecture that casts the problem of RL as conditional sequence modeling	Lili Chen Kevin Lu Aravind Rajeswaran Kimin Lee others Aditya Grover Michael Laskin Pieter Abbeel Aravind Srinivas Igor Mordatch	, , project , , ,	06.09.2022
textual-inversion	An Image is Worth One Word: Personalizing Text-to-Image Generation using Textual Inversion	Rinon Gal Yuval Alaluf Yuval Atzmon Or Patashnik others Amit Bermano Gal Chechik Daniel Cohen-Or	project ,	21.08.2022
StyleGAN-Human	A Data-Centric Odyssey of Human Generation	Jianglin Fu Shikai Li Yuming Jiang Kwan-Yee Lin others Chen Qian Chen Change Loy Wayne Wu Ziwei Liu	, , project , , ,	19.08.2022
Make-A-Scene	Scene-Based Text-to-Image Generation with Human Priors	Oran Gafni Adam Polyak Oron Ashual Shelly Sheynin others Devi Parikh Yaniv Taigman		12.08.2022
StyleGAN-NADA	Zero-Shot non-adversarial domain adaptation of pre-trained generators	Rinon Gal Or Patashnik Haggai Maron Gal Chechik Daniel Cohen-Or	, , , project	09.08.2022
YOLOv7	Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors	Chien-Yao Wang Alexey Bochkovskiy Mark Liao	data, data, data, data , , , , , , ,	09.08.2022
GLIP	Grounded language-image pre-training model for learning object-level, language-aware, and semantic-rich visual representations	Liunian Harold Li Pengchuan Zhang Haotian Zhang Jianwei Yang others Chunyuan Li Yiwu Zhong Lijuan Wang Lu Yuan Lei Zhang Jenq-Neng Hwang Kai-Wei Chang Jianfeng Gao	, , , blog post ,	30.07.2022
Anycost GAN	Interactive natural image editing	Ji Lin Richard Zhang Frieder Ganz Song Han Jun-Yan Zhu	, , , , project	20.07.2022
GFPGAN	Towards Real-World Blind Face Restoration with Generative Facial Prior	Xintao Wang Yu Li Honglun Zhang Ying Shan	, , project	13.07.2022
EPro-PnP	Generalized End-to-End Probabilistic Perspective-n-Points for Monocular Object Pose Estimation	Hansheng Chen Pichao Wang Fan Wang Wei Tian others Lu Xiong Hao Li	, , , , nuScenes	12.07.2022
Text2Human	Text-driven controllable framework for a high-quality and diverse human generation	Yuming Jiang Shuai Yang Haonan Qiu Wayne Wu others Chen Change Loy Ziwei Liu	, , project ,	04.07.2022
VQ-Diffusion	Based on a VQ-VAE whose latent space is modeled by a conditional variant of the recently developed Denoising Diffusion Probabilistic Model	Shuyang Gu Dong Chen Jianmin Bao Fang Wen others Bo Zhang Dongdong Chen Lu Yuan Baining Guo Shuyang Gu Zhicong Tang	, ,	30.06.2022
OPT	Open Pre-trained Transformers is a family of NLP models trained on billions of tokens of text obtained from the internet	Susan Zhang Stephen Roller Naman Goyal Mikel Artetxe others Moya Chen Christopher Dewan Mona Diab Xi Victoria Lin Todor Mihaylov Myle Ott Sam Shleifer Kurt Shuster Daniel Simig Punit Singh Koura Anjali Sridhar Tianlu Wang Luke Zettlemoyer	, , , blog post	29.06.2022
Customizing a Transformer Encoder	We will learn how to customize the encoder to employ new network architectures	Chen Chen		22.06.2022
MTTR	End-to-End Referring Video Object Segmentation with Multimodal Transformers	Adam Botach Evgenii Zheltonozhskii Chaim Baskin	, ,	20.06.2022
SwinIR	Image Restoration Using Swin Transformer	Jingyun Liang Jiezhang Cao Guolei Sun Kai Zhang others Luc Van Gool Radu Timofte	, , ,	17.06.2022
VRT	A Video Restoration Transformer	Jingyun Liang Jiezhang Cao Yuchen Fan Kai Zhang others Yawei Li Radu Timofte Luc Van Gool	, ,	15.06.2022
Omnivore	A single model which excels at classifying images, videos, and single-view 3D data using exactly the same model parameters	Rohit Girdhar Mannat Singh Nikhila Ravi Laurens Maaten others Armand Joulin Ishan Misra	, project	14.06.2022
Dream Fields	Zero-Shot Text-Guided Object Generation	Ajay Jain Ben Mildenhall Jon Barron Pieter Abbeel Ben Poole	, , , project	10.06.2022
Detic	Detecting Twenty-thousand Classes using Image-level Supervision	Xingyi Zhou Rohit Girdhar Armand Joulin Philipp Krähenbühl Ishan Misra		07.06.2022
T0	Multitask Prompted Training Enables Zero-Shot Task Generalization	Victor Sanh Albert Webson Colin Raffel Stephen Bach others Lintang Sutawika Zaid Alyafeai Antoine Chaffin Arnaud Stiegler Teven Scao Arun Raja Manan Dey M Saiful Bari Canwen Xu Urmish Thakker Shanya Sharma Eliza Szczechla Taewoon Kim Gunjan Chhablani Nihal Nayak Debajyoti Datta Jonathan Chang Mike Tian-Jian Jiang Matteo Manica Sheng Shen Zheng Xin Yong Harshit Pandey Rachel Bawden Trishala Neeraj Jos Rozen Abheesht Sharma Andrea Santilli Thibault Fevry Jason Alan Fries Ryan Teehan Stella Biderman Leo Gao Tali Bers Thomas Wolf Alexander M. Rush	,	29.05.2022
AvatarCLIP	A zero-shot text-driven framework for 3D avatar generation and animation	Fangzhou Hong Mingyuan Zhang Liang Pan Zhongang Cai others Lei Yang Ziwei Liu	, , , , data , , , , project	15.05.2022
Text2Mesh	Text-Driven Neural Stylization for Meshes	Oscar Michel Roi Bar-On Richard Liu Sagie Benaim Rana Hanocka	CLIP project	14.05.2022
T5	Text-To-Text Transfer Transformer	Colin Raffel Noam Shazeer Adam Roberts Katherine Lee others Sharan Narang Michael Matena Yanqi Zhou Wei Li Peter J. Liu		11.05.2022
XLS-R	Self-supervised Cross-lingual Speech Representation Learning at Scale	Arun Babu Changhan Wang Andros Tjandra Kushal Lakhotia others Qiantong Xu Naman Goyal Kritika Singh Patrick von Platen Yatharth Saraf Juan Pino Alexei Baevski Alexis Conneau Michael Auli	blog post	10.05.2022
DiffCSE	Unsupervised contrastive learning framework for learning sentence embeddings	Yung-Sung Chuang Rumen Dangovski Hongyin Luo Yang Zhang others Shiyu Chang Marin Soljačić Shang-Wen Li Scott Wen-tau Yih Yoon Kim James Glass	, ,	24.04.2022
ViDT+	An Extendable, Efficient and Effective Transformer-based Object Detector	Hwanjun Song Deqing Sun Sanghyuk Chun Varun Jampani others Dongyoon Han Byeongho Heo Wonjae Kim Ming-Hsuan Yang	, ,	20.04.2022
BasicVSR++	Redesign BasicVSR by proposing second-order grid propagation and flow-guided deformable alignment	Kelvin Chan Shangchen Zhou Xiangyu Xu Chen Change Loy	, project	18.04.2022
NAFNet	Nonlinear Activation Free Network for Image Restoration	Liangyu Chen Xiaojie Chu Xiangyu Zhang Jian Sun	, ,	15.04.2022
Panini-Net	GAN Prior based Degradation-Aware Feature Interpolation for Face Restoration	Yinhuai Wang Yujie Hu Jian Zhang	,	13.04.2022
E2FGVI	An End-to-End framework for Flow-Guided Video Inpainting through elaborately designed three trainable modules, namely, flow completion, feature propagation, and content hallucination modules	Zhen Li Cheng-Ze Lu Jianhua Qin Chun-Le Guo Ming-Ming Cheng	data, data , , , ,	06.04.2022
LDM	High-Resolution Image Synthesis with Latent Diffusion Models	Robin Rombach Andreas Blattmann Dominik Lorenz Patrick Esser Björn Ommer	, , , , ,	04.04.2022
GP-UNIT	Novel framework, Generative Prior-guided UNsupervised Image-to-image Translation, to improve the overall quality and applicability of the translation algorithm	Shuai Yang Liming Jiang Ziwei Liu Chen Change Loy	ImageNet , , , , , , project	02.04.2022
DualStyleGAN	More challenging exemplar-based high-resolution portrait style transfer by introducing a novel DualStyleGAN with flexible control of dual styles of the original face domain and the extended artistic portrait domain	Shuai Yang Liming Jiang Ziwei Liu Chen Change Loy	data, data , , , project	24.03.2022
CLIPasso	Semantically-Aware Object Sketching	Yael Vinker Ehsan Pajouheshgar Jessica Y. Bo Roman Bachmann others Amit Bermano Daniel Cohen-Or Amir Zamir Ariel Shamir	, demo project	21.03.2022
StyleSDF	A high resolution, 3D-consistent image and shape generation technique	Roy Or-El Xuan Luo Mengyi Shan Eli Shechtman others Jeong Joon Park Ira Kemelmacher-Shlizerman	, project	05.03.2022
Disentangled Lifespan Face Synthesis	LFS model is proposed to disentangle the key face characteristics including shape, texture and identity so that the unique shape and texture age transformations can be modeled effectively	Sen He Wentong Liao Michael Yang Yi-Zhe Song others Bodo Rosenhahn Tao Xiang	project	22.02.2022
ClipCap	CLIP Prefix for Image Captioning	Ron Mokady Amir Hertz Amit Bermano	data	15.02.2022
ROMP	Monocular, One-stage, Regression of Multiple 3D People	Yu Sun Qian Bao Wu Liu Yili Fu others Michael Black Tao Mei	, , , , , ,	11.02.2022
Mask2Former	Masked-attention Mask Transformer for Universal Image Segmentation	Bowen Cheng Ishan Misra Alexander Schwing Alexander Kirillov Rohit Girdhar	, demo project	09.02.2022
JoJoGAN	One Shot Face Stylization	Min Jin Chong David Forsyth	,	02.02.2022
Pose with Style	Detail-Preserving Pose-Guided Image Synthesis with Conditional StyleGAN	Badour AlBahar Jingwan Lu Jimei Yang Zhixin Shu others Eli Shechtman Jia-Bin Huang	project	19.01.2022
ConvNeXt	A pure ConvNet model constructed entirely from standard ConvNet modules	Zhuang Liu Hanzi Mao Chao-Yuan Wu Christoph Feichtenhofer others Trevor Darrell Saining Xie	, , , ,	19.01.2022
diffsort	Differentiable Sorting Networks	Felix Petersen Christian Borgelt Hilde Kuehne Oliver Deussen	,	17.01.2022
Taming Transformers for High-Resolution Image Synthesis	We combine the efficiency of convolutional approaches with the expressivity of transformers by introducing a convolutional VQGAN, which learns a codebook of context-rich visual parts, whose composition is modeled with an autoregressive transformer	Patrick Esser Robin Rombach Björn Ommer	project	13.01.2022
RealBasicVSR	Investigating Tradeoffs in Real-World Video Super-Resolution	Kelvin Chan Shangchen Zhou Xiangyu Xu Chen Change Loy		25.12.2021
GLIDE	Towards Photorealistic Image Generation and Editing with Text-Guided Diffusion Models	Alex Nichol Prafulla Dhariwal Aditya Ramesh Pranav Shyam others Pamela Mishkin Bob McGrew Ilya Sutskever Mark Chen		22.12.2021
Nerfies	First method capable of photorealistically reconstructing deformable scenes using photos/videos captured casually from mobile phones	Keunhong Park Utkarsh Sinha Jon Barron Sofien Bouaziz others Dan Goldman Steve Seitz Ricardo Martin-Brualla	project ,	06.12.2021
HyperStyle	A hypernetwork that learns to modulate StyleGAN's weights to faithfully express a given image in editable regions of the latent space	Yuval Alaluf Omer Tov Ron Mokady Rinon Gal Amit Bermano	, , , data , , , , , , , project	03.12.2021
encoder4editing	Designing an Encoder for StyleGAN Image Manipulation	Omer Tov Yuval Alaluf Yotam Nitzan Or Patashnik Daniel Cohen-Or		02.12.2021
StyleCariGAN	Caricature Generation via StyleGAN Feature Map Modulation	Wonjong Jang Gwangjin Ju Yucheol Jung Jiaolong Yang others Xin Tong Seungyong Lee	, project	30.11.2021
CartoonGAN	The implementation of the cartoon GAN model with PyTorch	Tobias Sunderdiek	project	24.11.2021
SimSwap	An efficient framework, called Simple Swap, aiming for generalized and high fidelity face swapping	Xuanhong Chen Bingbing Ni Yanhao Ge		24.11.2021
RVM	Robust High-Resolution Video Matting with Temporal Guidance	Shanchuan Lin Linjie Yang Imran Saleemi Soumyadip Sengupta	, project ,	24.11.2021
RVM	Robust, real-time, high-resolution human video matting method that achieves new state-of-the-art performance	Shanchuan Lin Linjie Yang Imran Saleemi Soumyadip Sengupta	project , , , ,	24.11.2021
AnimeGANv2	An improved version of AnimeGAN - it prevents the generation of high-frequency artifacts by simply changing the normalization of features in the network	Xin Chen Gang Liu bryandlee	, project	17.11.2021
SOAT	StyleGAN of All Trades: Image Manipulation with Only Pretrained StyleGAN	Min Jin Chong Hsin-Ying Lee David Forsyth	,	13.11.2021
Arnheim	Generative Art Using Neural Visual Grammars and Dual Encoders	Chrisantha Fernando Ali Eslami Jean-Baptiste Alayrac Piotr Mirowski others Dylan Banarse Simon Osindero	, , , , , , ,	11.11.2021
StyleGAN 2	Generation of faces, cars, etc.	Mikael Christensen		05.11.2021
ByteTrack	Multi-Object Tracking by Associating Every Detection Box	Yifu Zhang Peize Sun Yi Jiang Dongdong Yu others Ping Luo Xinggang Wang	data, data , , ,	30.10.2021
GPT-2	Retrain an advanced text generating neural network on any text dataset using gpt-2-simple!	Max Woolf	blog post, blog post	18.10.2021
ConvMixer	An extremely simple model that is similar in spirit to the ViT and the even-more-basic MLP-Mixer in that it operates directly on patches as input, separates the mixing of spatial and channel dimensions, and maintains equal size and resolution throughout the network	Asher Trockman Zico Kolter	,	06.10.2021
IC-GAN	Instance-Conditioned GAN	Arantxa Casanova Marlène Careil Jakob Verbeek Michał Drożdżal Adriana Romero-Soriano	blog post , , , ,	01.10.2021
Skillful Precipitation Nowcasting Using Deep Generative Models of Radar	Open-sourced dataset and model snapshot for precipitation nowcasting	Suman Ravuri Karel Lenc Matthew Willson Dmitry Kangin others Rémi Lam Piotr Mirowski Maria Athanassiadou Sheleem Kashem Rachel Prudden Amol Mandhane Aidan Clark Andrew Brock Karen Simonyan Raia Hadsell Niall Robinson Ellen Clancy Shakir Mohamed	blog post local kernel	29.09.2021
Live Speech Portraits	Real-Time Photorealistic Talking-Head Animation	Yuanxun Lu Jinxiang Chai Xun Cao	, , , project	26.09.2021
StylEx	Training a GAN to explain a classifier in StyleSpace	Oran Lang Yossi Gandelsman Michal Yarom Yoav Wald others Gal Elidan Avinatan Hassidim William Freeman Phillip Isola Amir Globerson Michal Irani Inbar Mosseri	, , , , blog post project supplementary	25.08.2021
VITS	Parallel end-to-end TTS method that generates more natural sounding audio than current two-stage models	Jaehyeon Kim Jungil Kong Juhee Son	demo	23.08.2021
Bringing Old Photo Back to Life	Restoring old photos that suffer from severe degradation through a deep learning approach	Ziyu Wan Bo Zhang Dongdong Chen Pan Zhang others Dong Chen Jing Liao Fang Wen	demo project	13.07.2021
PTI	Pivotal Tuning Inversion enables employing off-the-shelf latent based semantic editing techniques on real images using StyleGAN	Daniel Roich Ron Mokady Amit Bermano Daniel Cohen-Or	,	01.07.2021
TediGAN	Framework for multi-modal image generation and manipulation with textual descriptions	Weihao Xia Yujiu Yang Jing-Hao Xue Baoyuan Wu	, , , ,	30.06.2021
SCALE	Modeling Clothed Humans with a Surface Codec of Articulated Local Elements	Qianli Ma Shunsuke Saito Jinlong Yang Siyu Tang Michael Black	data , poster project ,	26.06.2021
CogView	Mastering Text-to-Image Generation via Transformers	Ming Ding Zhuoyi Yang Wenyi Hong Wendi Zheng others Chang Zhou Junyang Lin Xu Zou Zhou Shao Hongxia Yang Jie Tang	demo ,	21.06.2021
GANs N' Roses	Stable, Controllable, Diverse Image to Image Translation	Min Jin Chong David Forsyth	, ,	19.06.2021
Rethinking Style Transfer: From Pixels to Parameterized Brushstrokes	A method to stylize images by optimizing parameterized brushstrokes instead of pixels	Dmytro Kotovenko Matthias Wright Arthur Heimbrecht Björn Ommer	project	02.06.2021
Pixel2Style2Pixel	Encoding in Style: A StyleGAN Encoder for Image-to-Image Translation	Elad Richardson Yuval Alaluf Yotam Nitzan Daniel Cohen-Or	, project	01.06.2021
Fine-tuning a BERT	We will work through fine-tuning a BERT model using the tensorflow-models PIP package	Chen Chen Claire Yao		25.05.2021
ReStyle	A Residual-Based StyleGAN Encoder via Iterative Refinement	Yuval Alaluf Or Patashnik Daniel Cohen-Or	, , , project	21.05.2021
Motion Representations for Articulated Animation	Novel motion representations for animating articulated objects consisting of distinct parts	Aliaksandr Siarohin Oliver Woodford Jian Ren Menglei Chai Sergey Tulyakov	project	29.04.2021
SAM	Age Transformation Using a Style-Based Regression Model	Yuval Alaluf Or Patashnik Daniel Cohen-Or	, project	26.04.2021
Geometry-Free View Synthesis	Is a geometric model required to synthesize novel views from a single image?	Robin Rombach Patrick Esser Björn Ommer	data	22.04.2021
NeRViS	An algorithm for full-frame video stabilization by first estimating dense warp fields	Yu-Lun Liu Wei-Sheng Lai Ming-Hsuan Yang Yung-Yu Chuang Jia-Bin Huang	data , project	11.04.2021
NeX	View synthesis based on enhancements of multiplane image that can reproduce NeXt-level view-dependent effects in real time	Suttisak Wizadwongsa Pakkapon Phongthawee Jiraphon Yenphraphai Supasorn Suwajanakorn	data, data project vistec	25.03.2021
Score SDE	Score-Based Generative Modeling through Stochastic Differential Equations	Yang Song Jascha Sohl-Dickstein Diederik Kingma Abhishek Kumar others Stefano Ermon Ben Poole	, , , ,	18.03.2021
Talking Head Anime from a Single Image	The network takes as input an image of an anime character's face and a desired pose, and it outputs another image of the same character in the given pose	Pramook Khungurn	project , , ,	23.02.2021
NFNet	An adaptive gradient clipping technique, a significantly improved class of Normalizer-Free ResNets	Andrew Brock Soham De Samuel L. Smith Karen Simonyan	, ,	17.02.2021
RITM	Simple feedforward model for click-based interactive segmentation that employs the segmentation masks from previous steps	Konstantin Sofiiuk Ilia Petrov Anton Konushin	,	13.02.2021
CLIP	A neural network which efficiently learns visual concepts from natural language supervision	Jong Wook Kim Alec Radford Ilya Sutskever	data paper project slides	29.01.2021
Adversarial Patch	A method to create universal, robust, targeted adversarial image patches in the real world	Tom Brown		27.01.2021
MSG-Net	Multi-style Generative Network with a novel Inspiration Layer, which retains the functionality of optimization-based approaches and has the fast speed of feed-forward networks	Hang Zhang Kristin Dana	project	25.01.2021
f-BRS	Feature backpropagating refinement scheme that solves an optimization problem with respect to auxiliary variables instead of the network inputs, and requires running forward and backward pass just for a small part of a network	Konstantin Sofiiuk Ilia Petrov Olga Barinova Anton Konushin	,	25.01.2021
Neural Style Transfer	Implementation of Neural Style Transfer in Keras 2.0+	Somshubra Majumdar	, ,	22.01.2021
SkyAR	A vision-based method for video sky replacement and harmonization, which can automatically generate realistic and dramatic sky backgrounds in videos with controllable styles	Zhengxia Zou	project	18.01.2021
MusicXML Documentation	The goal of this notebook is to explore one of the magenta libraries for music	Prakruti Joshi Falak Shah Twisha Naik	magenta music theory musicXML	08.01.2021
SVG VAE	A colab demo for the SVG VAE model	Raphael Gontijo Lopes	blog post	08.01.2021
Neural Magic Eye	Learning to See and Understand the Scene Behind an Autostereogram	Zhengxia Zou Tianyang Shi Yi Yuan Zhenwei Shi	project	01.01.2021
FGVC	Method first extracts and completes motion edges, and then uses them to guide piecewise-smooth flow completion with sharp edges	Chen Gao Ayush Saraf Johannes Kopf Jia-Bin Huang	project	30.12.2020
VIBE	Video Inference for Body Pose and Shape Estimation, which makes use of an existing large-scale motion capture dataset together with unpaired, in-the-wild, 2D keypoint annotations	Muhammed Kocabas Nikos Athanasiou Michael Black	, , , , , , , , , , , ,	23.12.2020
SeFa	A closed-form approach for unsupervised latent semantic factorization in GANs	Yujun Shen Bolei Zhou	project	06.12.2020
Stylized Neural Painting	An image-to-painting translation method that generates vivid and realistic painting artworks with controllable styles	Zhengxia Zou Tianyang Shi Yi Yuan Zhenwei Shi	project	01.12.2020
BiT	Big Transfer: General Visual Representation Learning	Alexander Kolesnikov Lucas Beyer Xiaohua Zhai Joan Puigcerver others Jessica Yung Sylvain Gelly Neil Houlsby	, , ,	12.11.2020
LaSAFT	Latent Source Attentive Frequency Transformation for Conditioned Source Separation	Woosung Choi	data project	01.11.2020
Lifespan Age Transformation Synthesis	Multi-domain image-to-image generative adversarial network architecture, whose learned latent space models a continuous bi-directional aging process	Roy Or-El Soumyadip Sengupta Ohad Fried Eli Shechtman Ira Kemelmacher-Shlizerman	, , project ,	31.10.2020
HiGAN	Semantic Hierarchy Emerges in Deep Generative Representations for Scene Synthesis	Ceyuan Yang Yujun Shen Bolei Zhou	, , project	14.10.2020
InterFaceGAN	Interpreting the Latent Space of GANs for Semantic Face Editing	Yujun Shen Jinjin Gu Xiaoou Tang Bolei Zhou	, , , project	13.10.2020
Instance-aware Image Colorization	Novel deep learning framework to achieve instance-aware colorization	Jheng-Wei Su	project	30.08.2020
MoCo	Momentum Contrast for unsupervised visual representation learning	Kaiming He Haoqi Fan Yuxin Wu Saining Xie Ross Girshick	, , , ,	20.08.2020
CAPE	Learning to Dress 3D People in Generative Clothing	Qianli Ma Jinlong Yang Anurag Ranjan Sergi Pujades others Gerard Pons-Moll Siyu Tang Michael Black	, , data , , project ,	05.08.2020
Rewriting a Deep Generative Model	We ask if a deep network can be reprogrammed to follow different rules, by enabling a user to directly change the weights, instead of training with a data set	David Bau Steven Liu Tongzhou Wang Jun-Yan Zhu Antonio Torralba	, , project ,	01.08.2020
SIREN	Implicit Neural Representations with Periodic Activation Functions	Vincent Sitzmann Julien Martel	data project	25.06.2020
3D Photo Inpainting	Method for converting a single RGB-D input image into a 3D photo, i.e., a multi-layer representation for novel view synthesis that contains hallucinated color and depth structures in regions occluded in the original view	Meng-Li Shih Shih-Yang Su Johannes Kopf Jia-Bin Huang	project	04.05.2020
Motion Supervised co-part Segmentation	A self-supervised deep learning method for co-part segmentation	Aliaksandr Siarohin Subhankar Roy		07.04.2020
Onsets and Frames	Onsets and Frames is an automatic music transcription framework with piano and drums models	Curtis Hawthorne Erich Elsen	, , blog post data, data	02.04.2020
FBA Matting	Low-cost modification to alpha matting networks to also predict the foreground and background colours	Marco Forte François Pitié		19.03.2020
BERT score	An automatic evaluation metric for text generation	Tianyi Zhang		05.03.2020
Generating Piano Music with Transformer	This Colab notebook lets you play with pretrained Transformer models for piano music generation, based on the Music Transformer	Ian Simon Anna Huang Jesse Engel Curtis Hawthorne	, blog post	16.09.2019
HMR	End-to-end framework for reconstructing a full 3D mesh of a human body from a single RGB image	Angjoo Kanazawa Michael Black David Jacobs Jitendra Malik	, , , , project	15.03.2019
GANSynth	This notebook is a demo of GANSynth, which generates audio with Generative Adversarial Networks	Jesse Engel	, project	25.02.2019
Latent Constraints	Conditional Generation from Unconditional Generative Models	Jesse Engel Matthew Hoffman Adam Roberts	data	27.11.2017
Performance RNN	This notebook shows you how to generate new performed compositions from a trained model	Ian Simon Sageev Oore Curtis Hawthorne	blog post data	11.07.2017
NSynth	This colab notebook has everything you need to upload your own sounds and use NSynth models to reconstruct and interpolate between them	Jesse Engel Cinjon Resnick Adam Roberts Sander Dieleman others Karen Simonyan Mohammad Norouzi Douglas Eck	blog post data tutorial ,	06.04.2017

Tutorials

name	description	authors	links	update
Building Your Own Federated Learning Algorithm	We discuss how to implement federated learning algorithms without deferring to the tff.learning API	Zachary Charles	blog post	01.11.2024
Federated Learning for Image Classification	We use the classic MNIST training example to introduce the Federated Learning API layer of TFF, tff.learning - a set of higher-level interfaces that can be used to perform common types of federated learning tasks, such as federated training, against user-supplied models implemented in TensorFlow	Krzysztof Ostrowski	data ,	01.11.2024
Federated Learning for Text Generation	We start with an RNN that generates ASCII characters, and refine it via federated learning	Krzysztof Ostrowski	, data, data	01.11.2024
Custom Federated Algorithms, Part 1: Introduction to the Federated Core	This tutorial is the first part of a two-part series that demonstrates how to implement custom types of federated algorithms in TensorFlow Federated using the Federated Core - a set of lower-level interfaces that serve as a foundation upon which we have implemented the Federated Learning layer	Krzysztof Ostrowski	,	01.11.2024
Custom Federated Algorithms, Part 2: Implementing Federated Averaging	This tutorial is the second part of a two-part series that demonstrates how to implement custom types of federated algorithms in TFF using the Federated Core, which serves as a foundation for the Federated Learning layer	Krzysztof Ostrowski	,	01.11.2024
High-performance simulations with TFF	This tutorial will describe how to set up high-performance simulations with TFF in a variety of common scenarios	Krzysztof Ostrowski		01.11.2024
Autodistill	Uses big, slower foundation models to train small, faster supervised models	autodistill	blog post , , , , , , , , , , , , , , , , , ,	01.11.2024
Kornia	Library composed of a subset of packages containing operators that can be inserted within neural networks to train models to perform image transformations, epipolar geometry, depth estimation, and low-level image processing such as filtering and edge detection that operate directly on tensors	Edgar Riba Dmytro Mishkin Daniel Ponsa Ethan Rublee Gary Bradski	blog post website , ,	31.10.2024
LightAutoML	Allows you to create machine learning models using just a few lines of code, or build your own custom pipeline using ready blocks	Alexander Ryzhkov Anton Vakhrushev Dmitry Simakov	, , , , , , , website , , , ,	31.10.2024
Llama 3.1	First openly available model that rivals the top AI models when it comes to state-of-the-art capabilities in general knowledge, steerability, math, tool use, and multilingual translation	unsloth	blog post , , , , ,	31.10.2024
Phi-3.5	3.8 billion parameter language model trained on 3.3 trillion tokens, whose overall performance, as measured by both academic benchmarks and internal testing, rivals that of models such as Mixtral 8x7B and GPT-3.5, despite being small enough to be deployed on a phone	unsloth	blog post , website	31.10.2024
Mistral Small	Enterprise-grade small model	unsloth	website	31.10.2024
Gemma 2	New addition to the Gemma family of lightweight, state-of-the-art open models, ranging in scale from 2 billion to 27 billion parameters	unsloth	blog post , , , ,	31.10.2024
NotebookLlama	Open Source version of NotebookLM	Meta	medium	29.10.2024
MuJoCo	A general purpose physics engine that aims to facilitate research and development in robotics, biomechanics, graphics and animation, machine learning, and other areas which demand fast and accurate simulation of articulated structures interacting with their environment	Emo Todorov Tom Erez Yuval Tassa	, website , , , ,	28.10.2024
YOLOv8	State-of-the-art model that builds upon the success of previous YOLO versions and introduces new features and improvements to further boost performance and flexibility	Glenn Jocher	COCO ImageNet blog post , , , , ,	25.10.2024
AutoGen	Framework that enables development of LLM applications using multiple agents that can converse with each other to solve tasks	microsoft	blog post project , ,	22.10.2024
XGBoost	Optimized distributed gradient boosting library designed to be highly efficient, flexible and portable	Tianqi Chen Carlos Guestrin	, , , , , , ,	22.10.2024
ARENA	Provide talented individuals with the skills, tools, and environment necessary for upskilling in ML engineering, for the purpose of contributing directly to AI alignment in technical roles	Callum McDougall	website	21.10.2024
YOLOv5	You Only Look Once	Glenn Jocher	data ,	19.10.2024
YOLOv3	You Only Look Once	Glenn Jocher	data ,	19.10.2024
dm_control	DeepMind Infrastructure for Physics-Based Simulation	Saran Tunyasuvunakool Alistair Muldal Yotam Doron Siqi Liu others Steven Bohez Josh Merel Tom Erez Timothy Lillicrap Nicolas Heess Yuval Tassa	, , , , , blog post , ,	17.10.2024
LangGraph	Library for building stateful, multi-actor applications with LLMs, used to create agent and multi-agent workflows	LangChain	blog post , website , , , ,	10.10.2024
SAE Lens	Training Sparse Autoencoders on Language Models	Joseph Bloom Curt Tigges David Chanin		07.10.2024
LM Evaluation Harness	Framework for few-shot evaluation of language models	EleutherAI	, , , project	04.10.2024
Multimodal Maestro	Gives you more control over large multimodal models to get the outputs you want	Roboflow	, blog post website	26.09.2024
TRL	Set of tools to train transformer language models with Reinforcement Learning, from the Supervised Fine-tuning step, Reward Modeling step to the Proximal Policy Optimization step	Leandro von Werra Younes Belkada Lewis Tunstall Edward Beeching others Tristan Thrush Nathan Lambert	,	24.09.2024
The Autodiff Cookbook	You'll go through a whole bunch of neat autodiff ideas that you can cherry pick for your own work, starting with the basics	Alex Wiltschko Matthew Johnson	, , , book, book , tutorial , , ,	20.09.2024
Supervision	Reusable computer vision tools	Roboflow	, website , ,	19.09.2024
PEFT	Parameter-Efficient Fine-Tuning methods enable efficient adaptation of pre-trained language models to various downstream applications without fine-tuning all the model's parameters	Sourab Mangrulkar Sylvain Gugger Lysandre Debut Younes Belkada Sayak Paul	blog post , , , , , , ,	13.09.2024
SAA+	Framework, Segment Any Anomaly +, for zero-shot anomaly segmentation with hybrid prompt regularization to improve the adaptability of modern foundation models	Yunkang Cao Xiaohao Xu Chen Sun Yuqi Cheng others Zongwei Du Liang Gao Weiming Shen	,	13.09.2024
TensorRT	SDK for high-performance deep learning inference, includes a deep learning inference optimizer and runtime that delivers low latency and high throughput for inference applications	nvidia	blog post forum website , , ,	12.09.2024
DataChain	AI-dataframe to enrich, transform and analyze data from cloud storages for ML training and LLM apps	Iterative	,	09.09.2024
TFF for Federated Learning Research: Model and Update Compression	We use the EMNIST dataset to demonstrate how to enable lossy compression algorithms to reduce communication cost in the Federated Averaging algorithm	Weikang Song	tensor encoding ,	05.09.2024
LlamaIndex	Data framework for your LLM application	Jerry Liu	, website , , , , , , , ,	05.09.2024
VC	Client software for performing real-time voice conversion using various Voice Conversion AI	w-okada	, , , , , , , ,	02.09.2024
Deforum Stable Diffusion	Open source project designed to be free to use and easy to modify for custom needs and pipelines	EnzymeZoo Артем Храпов Forest Star Walz pharmapsychotic	project , ,	30.08.2024
ComfyUI	Powerful and modular stable diffusion GUI and backend	comfyanonymous	examples pytorch , ,	30.08.2024
Machine Learning Simplified	A Gentle Introduction to Supervised Learning	Andrew Wolf	, website	29.08.2024
Anomalib	Deep learning library that aims to collect state-of-the-art anomaly detection algorithms for benchmarking on both public and private datasets	Samet Akcay Dick Ameln Ashwin Vaidya Barath Lakshmanan others Nilesh Ahuja Utku Genc	data ,	29.08.2024
Nerfstudio	API that allows for a simplified end-to-end process of creating, training, and testing NeRFs	Matthew Tancik Ethan Weber Evonne Ng Ruilong Li others Brent Yi Justin Kerr Terrance Wang Alexander Kristoffersen Jake Austin Kamyar Salahi Abhik Ahuja David McAllister Angjoo Kanazawa	Viewer , , ,	19.08.2024
mlcourse.ai	Open Machine Learning Course	Yury Kashnitsky	blog post project	19.08.2024
PyTerrier	A Python framework for performing information retrieval experiments	Craig Macdonald Nicola Tonellotto	, , , , , ,	16.08.2024
highway-env	A collection of environments for autonomous driving and tactical decision-making tasks	Edouard Leurent	, , , ,	09.08.2024
GNN	Production-tested library for building GNNs at large scale	Oleksandr Ferludin Arno Eigenwillig Martin Blais Dustin Zelle others Jan Pfeifer Alvaro Sanchez-Gonzalez Wai Lok Sibon Li Sami Abu-El-Haija Peter Battaglia Neslihan Bulut Jonathan Halcrow Filipe Miguel Gonçalves de Almeida Pedro Gonnet Liangze Jiang Parth Kothari Silvio Lattanzi André Linhares Brandon Mayer Vahab Mirrokni John Palowitch Mihir Paradkar Jennifer She Anton Tsitsulin Kevin Villela Lisa Wang Bryan Perozzi	, , , , , ,	09.08.2024
Pix2Pix	This notebook demonstrates image to image translation using conditional GANs	Billy Lamberta	data	24.07.2024
Image classification	This tutorial shows how to classify images of flowers	Billy Lamberta		24.07.2024
TransformerLens	Library for doing mechanistic interpretability of GPT-2-style language models	Neel Nanda Joseph Bloom	, ,	23.07.2024
Kor	Half-baked prototype that "helps" you extract structured data from text using LLMs	Eugene Yurtsev		20.07.2024
PyTorch3D	Library for deep learning with 3D data	Nikhila Ravi Jeremy Reizenstein David Novotny Taylor Gordon others Wan-Yen Lo Justin Johnson Georgia Gkioxari	, blog post, blog post , , website , , , , , , ,	11.07.2024
Stable Diffusion Videos	Create videos with Stable Diffusion by exploring the latent space and morphing between text prompts	Nathan Raw	,	11.07.2024
Transfer learning and fine-tuning	You will learn how to classify images of cats and dogs by using transfer learning from a pre-trained network	François Chollet		26.06.2024
MARS5	Speech model for insane prosody	CAMB.AI	demo , ,	25.06.2024
Deep RL Course	The Hugging Face Deep Reinforcement Learning Course	Thomas Simonini Omar Sanseviero Sayak Paul	, syllabus , ,	24.06.2024
ToonCrafter	Interpolates between two cartoon images by leveraging pre-trained image-to-video diffusion priors	Jinbo Xing Hanyuan Liu Menghan Xia Yong Zhang others Xintao Wang Ying Shan Tien-Tsin Wong	project , , , , ,	20.06.2024
Brax	A differentiable physics engine that simulates environments made up of rigid bodies, joints, and actuators	Daniel Freeman Erik Frey Anton Raichuk Sertan Girgin others Igor Mordatch Olivier Bachem		07.06.2024
DiffSynth	Restructured architectures including Text Encoder, UNet, VAE, among others, maintaining compatibility with models from the open-source community while enhancing computational performance	Artiprocher	,	06.06.2024
Transformer	This tutorial trains a Transformer model to translate Portuguese to English	Billy Lamberta	, link	31.05.2024
NeMo	A conversational AI toolkit built for researchers working on automatic speech recognition, natural language processing, and text-to-speech synthesis	Oleksii Kuchaiev Jason Li Chip Huyen Oleksii Hrinchuk others Ryan Leary Boris Ginsburg Samuel Kriman Stanislav Beliaev Vitaly Lavrukhin Jack Cook	project	25.05.2024
SentencePiece	An unsupervised text tokenizer and detokenizer mainly for Neural Network-based text generation systems where the vocabulary size is predetermined prior to the neural model training	Taku Kudo John Richardson	, , , , , , ,	21.05.2024
Llama3 from scratch	Llama3 from scratch, one tensor and matrix multiplication at a time	Nishant Aklecha	,	19.05.2024
Hello, many worlds	This tutorial shows how a classical neural network can learn to correct qubit calibration errors	Michael Broughton	, ,	17.05.2024
IC-Light	Manipulate the illumination of images	Lvmin Zhang Anyi Rao Maneesh Agrawala	, , ,	09.05.2024
Neural style transfer	This tutorial uses deep learning to compose one image in the style of another image	Billy Lamberta		06.05.2024
TorchGeo	PyTorch domain library that provides datasets, transforms, samplers, and pre-trained models specific to geospatial data	Adam Stewart Caleb Robinson Isaac Corley Anthony Ortiz others Juan Lavista Ferres Arindam Banerjee	NDBI NDVI NDWI data, data	03.05.2024
Autoencoders	This tutorial introduces autoencoders with three examples: the basics, image denoising, and anomaly detection	Google	blog post book data examples	15.04.2024
MagicTime	Metamorphic time-lapse video generation model, which learns real-world physics knowledge from time-lapse videos and implements metamorphic generation	Shenghai Yuan Jinfa Huang Yujun Shi Yongqi Xu others Ruijie Zhu Bin Lin Xinhua Cheng Li Yuan Jiebo Luo	, , , , , , , project ,	14.04.2024
SAGE	Methodology for generative spelling correction, which was tested on English and Russian languages and potentially can be extended to any language with minor changes	Nikita Martynov Mark Baushenko Anastasia Kozlova Katerina Kolomeytseva others Aleksandr Abramov Alena Fenogenova	, , , ,	11.04.2024
Image segmentation	This tutorial focuses on the task of image segmentation, using a modified U-Net	Billy Lamberta	data u-net	09.04.2024
Open-Sora Plan	Simple and efficient design along with remarkable performance in text-to-video generation	YUAN Lab at PKU	, , , ,	07.04.2024
Gorilla	Finetuned LLaMA-based model that surpasses the performance of GPT-4 on writing API calls	Shishir Patil Tianjun Zhang Xin Wang Joseph Gonzalez	project , , , , , , ,	06.04.2024
Cleanlab	Helps you clean data and labels by automatically detecting issues in a ML dataset	Curtis Northcutt Lu Jiang Isaac Chuang	blog post , ,	30.03.2024
AniPortrait	Framework for generating high-quality animation driven by audio and a reference portrait image	Huawei Wei Zejun Yang Zhisheng Wang	, , , , , , , , ,	27.03.2024
OpenVINO	Open-source toolkit for optimizing and deploying AI inference	intel	blog post forum , , , , , , , , , , , , ,	25.03.2024
Gazelle	Joint Speech Language Model	Tincans	blog post demo , , , ,	20.03.2024
Intel® Extension for Transformers	Transformer-based Toolkit to Accelerate GenAI/LLM Everywhere	intel	, , , , , , , , , , , , , , , , , , , ,	19.03.2024
Datasets	A Community Library for Natural Language Processing	Quentin Lhoest Albert Villanova Yacine Jernite Abhishek Thakur others Patrick von Platen Suraj Patil Julien Chaumond Mariama Dramé Julien Plu Lewis Tunstall Joe Davison Mario Šaško Gunjan Chhablani Bhavitvya Malik Simon Brandeis Teven Le Scao Victor Sanh Canwen Xu Nicolas Patry Angelina McMillan-Major Philipp Schmid Sylvain Gugger Clément Delangue Théo Matussière Lysandre Debut Stas Bekman Pierric Cistac Thibault Goehringer Victor Mustar François Lagunas Alexander Rush Thomas Wolf		18.03.2024
Evidently	An open-source framework to evaluate, test and monitor ML models in production	Elena Samuylova Emeli Dral Olga Filippova	website ,	15.03.2024
Instructor	Library that makes it a breeze to work with structured outputs from large language models	Jason Liu	, ,	13.03.2024
Feast	An open source feature store for machine learning	Willem Pienaar Danny Chiao Achal Shah Terence Lim others Ches Martin Judah Rand Matt Delacour Miguel Trejo Marrufo Francisco Javier Arceo	, , , website ,	28.02.2024
FiftyOne	Open-source tool for building high-quality datasets and computer vision models	Brian Moore Jason Corso	blog post , website	27.02.2024
MetaVoice	1.2B parameter base model trained on 100K hours of speech for TTS	MetaVoice	demo ,	26.02.2024
Generative AI for Beginners - A Course	A 12-lesson course teaching everything you need to know to start building Generative AI applications	microsoft	project	22.02.2024
OmegaConf	Hierarchical configuration system, with support for merging configurations from multiple sources providing a consistent API regardless of how the configuration was created	Omry Yadan	slides	15.02.2024
Optuna	An automatic hyperparameter optimization software framework, particularly designed for machine learning	Takuya Akiba Shotaro Sano Toshihiko Yanase Takeru Ohta Masanori Koyama	website , , ,	15.02.2024
Data augmentation	This tutorial demonstrates data augmentation: a technique to increase the diversity of your training set by applying random transformations such as image rotation	Billy Lamberta		14.02.2024
Stable Cascade	Text-to-image model that introduces a three-stage approach, setting new benchmarks for quality, flexibility, fine-tuning, and efficiency with a focus on further eliminating hardware barriers	Stability AI	blog post , , , , , , ,	14.02.2024
CleanVision	Automatically detects potential issues in image datasets, such as images that are blurry, under/over-exposed, or (near) duplicates	cleanlab	blog post	13.02.2024
DynamiCrafter	Animating Open-domain Images with Video Diffusion Priors	Jinbo Xing Menghan Xia Yong Zhang Haoxin Chen others Wangbo Yu Hanyuan Liu Xintao Wang Tien-Tsin Wong Ying Shan	, , , , project ,	12.02.2024
XLA	Accelerated Linear Algebra is an open-source machine learning compiler for GPUs, CPUs, and ML accelerators	OpenXLA	, , , ,	02.02.2024
Composer	PyTorch library that enables you to train neural networks faster, at lower cost, and to higher accuracy	The Mosaic ML Team	app , blog post website , ,	01.02.2024
CycleGAN	This notebook demonstrates unpaired image-to-image translation using conditional GANs	Billy Lamberta		17.01.2024
Integrated gradients	This tutorial demonstrates how to implement Integrated Gradients, an Explainable AI technique	Google	visualizing , ,	17.01.2024
MAGNeT	Masked generative sequence modeling method that operates directly over several streams of audio tokens	Alon Ziv Itai Gat Gaël Le Lan Tal Remez others Felix Kreuk Alexandre Défossez Jade Copet Gabriel Synnaeve Yossi Adi	, , , , project	16.01.2024
AutoFaiss	Automatically create Faiss knn indices with optimal similarity search parameters	Criteo		12.01.2024
Retrieval based Voice Conversion WebUI	An easy-to-use Voice Conversion framework based on VITS	RVC-Project	, , , , , , , , ,	11.01.2024
Flax	Neural network library and ecosystem for JAX designed for flexibility	Jonathan Heek Anselm Levskaya Avital Oliver Marvin Ritter others Bertrand Rondepierre Andreas Steiner Marc van Zee	, ,	10.01.2024
Big Vision	This codebase is designed for training large-scale vision models using Cloud TPU VMs or GPU machines	Lucas Beyer Xiaohua Zhai Alexander Kolesnikov	, , , , , , , , , , ,	03.01.2024
Open Interpreter	An open-source, locally running implementation of OpenAI's Code Interpreter	Killian Lucas	website , , , , , , ,	03.01.2024
Seamless Communication	Family of AI models that enable more natural and authentic communication across languages	Loïc Barrault Yu-An Chung Mariano Coria David Dale others Ning Dong Mark Duppenthaler Paul-Ambroise Duquenne Hady Elsahar Min-Jae Hwang Hirofumi Inaguma Ilia Kulikov Pengwei Li Daniel Licht Jean Maillard Ruslan Mavlyutov Kaushik Ram Sadagopan Abinesh Ramakrishnan Tuan Tran Guillaume Wenzek Yilin Yang Ethan Ye Ivan Evtimov Pierre Fernandez Robin San Roman Bokai Yu Pierre Andrews Can Balioglu Peng-Jen Chen Marta Costa-jussà Maha Elbayad Hongyu Gong Francisco Guzmán Kevin Heffernan Somya Jain Justine Kao Ann Lee Xutai Ma Benjamin Peloquin Juan Pino Sravya Popuri Holger Schwenk Anna Sun Paden Tomasello Changhan Wang Skyler Wang Mary Williamson	blog post , , , , , , , ,	14.12.2023
colab2pdf	Convert your Colab notebook to a PDF	Drengskapur		11.12.2023
Sentence Transformers	Multilingual Sentence, Paragraph, and Image Embeddings using BERT & Co	Nils Reimers Iryna Gurevych	, ,	07.12.2023
CleanRL	Deep Reinforcement Learning library that provides high-quality single-file implementation with research-friendly features	Shengyi Huang Rousslan Dossa Chang Ye Jeff Braga others Dipam Chakraborty Kinal Mehta João Araújo	, , , , , , , , , paper ,	28.11.2023
Vocos	Closing the gap between time-domain and Fourier-based neural vocoders for high-quality audio synthesis	Hubert Siuzdak	project	21.11.2023
X—LLM	Easy LLM Finetuning using the most advanced methods	Boris Zubarev	, , ,	15.11.2023
Distil-Whisper	Maintains the robustness of the Whisper model to difficult acoustic conditions, while being less prone to hallucination errors on long-form audio	Sanchit Gandhi Patrick von Platen Alexander Rush	, , , , , , , , , , ,	08.11.2023
AnimateDiff	Practical framework to animate most of the existing personalized text-to-image models once and for all, saving efforts in model-specific tuning	Yuwei Guo Ceyuan Yang Anyi Rao Yaohui Wang others Yu Qiao Dahua Lin Bo Dai	, , project , ,	30.10.2023
Intel® Neural Compressor	Aims to provide popular model compression techniques such as quantization, pruning (sparsity), distillation, and neural architecture search on mainstream frameworks such as TensorFlow, PyTorch, ONNX Runtime, and MXNet, as well as Intel extensions such as Intel Extension for TensorFlow and Intel Extension for PyTorch	intel	, ,

amrzv / awesome-colab-notebooks
