As a senior AI/ML full-stack engineer with 10 years of experience, I develop AI-driven applications and design system architectures that ensure streaming and consistency between the distributed components of an ML system.
I have solved challenging problems in building ML systems with modern tooling and have applied MLOps/DevOps best practices throughout.
In particular, I implemented an advanced chunking strategy and an FEA architecture in a Graph RAG and router system that dispatches each query to semantic search, Graph RAG, text-to-speech, or Elasticsearch (a minimal sketch of the routing idea follows below).
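As a rough sketch of that dispatch pattern (the backend names and the naive keyword rules here are placeholders standing in for the real routing model, not the production code):

```python
# Minimal sketch of a query router that picks one retrieval/answering backend.
# Backend implementations and routing rules are illustrative placeholders.
from typing import Callable, Dict

def semantic_search(q: str) -> str:   return f"[semantic search] {q}"
def graph_rag(q: str) -> str:         return f"[graph RAG] {q}"
def text_to_speech(q: str) -> str:    return f"[text-to-speech] {q}"
def elastic_search(q: str) -> str:    return f"[elasticsearch] {q}"

BACKENDS: Dict[str, Callable[[str], str]] = {
    "semantic": semantic_search,
    "graph": graph_rag,
    "tts": text_to_speech,
    "keyword": elastic_search,
}

def route(query: str) -> str:
    # Stand-in routing rule: a production router would use a trained
    # classifier or an LLM to choose the backend instead of keywords.
    q = query.lower()
    if "read aloud" in q:
        return BACKENDS["tts"](query)
    if "related" in q or "relationship" in q:
        return BACKENDS["graph"](query)
    if any(ch.isdigit() for ch in q):
        return BACKENDS["keyword"](query)
    return BACKENDS["semantic"](query)

print(route("How are these two suppliers related to each other?"))
```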
I also have hands-on experience developing AI agents with LangChain and LlamaIndex, as well as downstream tasks such as building multimodal LLMs, computer vision, and LLM fine-tuning using TensorFlow, Python, scikit-learn, and Python backend frameworks.
I have built AI agents and bots for customized conversational AI solutions, domain-specific chatbots, text classification, language translation, question answering, personalized recommendation systems, and healthcare and marketing automation applications.
With a deep understanding of Transformer architectures and speech recognition, I developed a fast, extensible multimodal LLM that understands text as well as human speech without a separate automatic speech recognition (ASR) stage. After researching approaches such as AudioLM, SeamlessM4T, Gazelle, and SpeechGPT, I extended Meta's Llama 3 with a multimodal projector that maps audio directly into the high-dimensional embedding space used by Llama 3. This direct coupling lets the model respond much more quickly than pipelines that chain separate ASR and LLM components.
Notably, this multimodal LLM can natively pick up the paralinguistic cues of timing and emotion that pervade human speech.
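To illustrate the projector idea (all module names and dimensions here are hypothetical, not the actual project code), the core component can be sketched as a small PyTorch module that maps pooled audio-encoder features into the LLM's token-embedding space:

```python
# Minimal sketch of an audio-to-LLM projector, assuming a frozen audio encoder
# that yields frame-level features and a Llama-style model with a known hidden
# size. Dimensions and stacking factor are illustrative assumptions.
import torch
import torch.nn as nn

class AudioProjector(nn.Module):
    def __init__(self, audio_dim: int = 1024, llm_dim: int = 4096, stack: int = 8):
        super().__init__()
        self.stack = stack  # stack N audio frames into one "audio token"
        self.proj = nn.Sequential(
            nn.Linear(audio_dim * stack, llm_dim),
            nn.GELU(),
            nn.Linear(llm_dim, llm_dim),
        )

    def forward(self, audio_feats: torch.Tensor) -> torch.Tensor:
        # audio_feats: (batch, frames, audio_dim) from the audio encoder
        b, t, d = audio_feats.shape
        t = (t // self.stack) * self.stack               # drop trailing frames
        x = audio_feats[:, :t].reshape(b, t // self.stack, d * self.stack)
        return self.proj(x)                              # (batch, audio_tokens, llm_dim)

# The projected "audio tokens" are concatenated with the text token embeddings
# and fed to the LLM, bypassing a separate ASR stage.
```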
I also focused on advanced RAG systems, raising answer accuracy to 97% by integrating techniques at each stage of the pipeline: Euclidean and cosine similarity for retrieval scoring, Deep Lake and knowledge graphs for storage, LangGraph for agentic RAG, plus reranking, query expansion, self-querying, and multi-vector retrieval, choosing the best-fit technique for each agent use case.
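As a simple illustration of the retrieval-scoring step (a minimal sketch with toy embeddings; the production pipeline ranks chunks inside a vector store), cosine and Euclidean scoring over document embeddings looks like this:

```python
# Minimal sketch of retrieval scoring: rank stored chunk embeddings against a
# query embedding by cosine similarity or Euclidean distance. Toy data only.
import numpy as np

def cosine_scores(query: np.ndarray, docs: np.ndarray) -> np.ndarray:
    q = query / np.linalg.norm(query)
    d = docs / np.linalg.norm(docs, axis=1, keepdims=True)
    return d @ q                                        # higher = more similar

def euclidean_scores(query: np.ndarray, docs: np.ndarray) -> np.ndarray:
    return -np.linalg.norm(docs - query, axis=1)        # negate so higher = better

rng = np.random.default_rng(0)
doc_embeddings = rng.normal(size=(5, 8))                # 5 chunks, 8-dim embeddings
query_embedding = rng.normal(size=8)

ranked = np.argsort(cosine_scores(query_embedding, doc_embeddings))[::-1]
print("chunks ranked by cosine similarity:", ranked)
```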
I thoroughly monitor A/B tests and evaluate experiments on CUDA-enabled NVIDIA A100 GPUs with Comet ML's experiment tracker, save the best model to Comet's model registry, and deploy it as a REST API on Qwak (running on AWS).
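A hedged sketch of what that tracking loop typically looks like with Comet ML (project name, parameters, and metric values are placeholders, not the real experiment):

```python
# Minimal sketch of experiment tracking with Comet ML and pushing the best
# checkpoint to the model registry, so a deployment job (e.g. the Qwak REST
# API build) can pull it later. All names and values are illustrative.
from comet_ml import Experiment

experiment = Experiment(project_name="rag-ab-tests")    # API key read from environment

experiment.log_parameters({"retriever": "multi-vector", "reranker": "rerank-v1"})
experiment.log_metric("answer_accuracy", 0.97)          # result of one A/B evaluation run

experiment.log_model("rag-llm", "./checkpoints/best")   # log the best checkpoint
experiment.register_model("rag-llm")                    # promote it to the model registry
experiment.end()
```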
Additionally, I have extensive experience managing and maintaining server infrastructure, cloud architecture, and automation for high-performance AI applications, specializing in microservices and cloud-based solutions on AWS and Azure.
Let's discuss in more detail.