Robô Educa is an innovative platform that teaches programming to children aged 6 to 14, promoting inclusion and sustainability 🤝♻️. The journey begins with the inspiring story of two siblings, Suzy and Otávio, who want to build a robot 🤖 and count on the help of their teacher, Carlos Sales 👨🏫, who encourages them to build their own humanoid robot using recycled materials, programming, and the cloud ♻️💻☁️.
It is in this context that the Robô Educa Web Application emerges. Accessible from any smartphone 📱, it becomes the robot's "brain" 🧠, interacting with the child through audio messages 🗣️, which makes it accessible even to visually impaired users. Assembling the robot and interacting with the application stimulate motor coordination 🖐️ and creativity ✨, teaching the child technological concepts in a playful and inclusive way.
All the magic performed by the application is only possible thanks to the Google GEMINI API 🤖, which allows Robô Educa to understand and answer the child's questions, explain complex concepts 🤯, and run gamified quizzes 🎉. This technology, available in Google's cloud, turns learning into a natural and fun conversation 😄, opening new perspectives for these children's futures 🚀.
Since 2018, this volunteer-run initiative has impacted hundreds of children in several underserved communities in the city of Salvador, Bahia, Brazil.
The mastermind behind this project, Carlos Sales, is a Black man from a peripheral neighborhood who holds a degree in Data Science and works as a systems developer. He tells part of his story in the documentary C0d3rs Championship, available on Amazon Prime Video.
But it was only in 2024, with the advent of Generative AI and the Google GEMINI API, that the robot gained a brain capable of responding intelligently and quickly, making the interaction much more fluid and charming 😄!
The Robô Educa platform offers a practical and creative experience for students, guiding them through the physical assembly of a humanoid robot. The robot can be made from recyclable materials such as PET bottles ♻️ or from MDF wood kits. After the physical assembly, students bring the robot to life using its "brain" 🧠: the application contained in this repository.
The application, the robot's brain, allows it to perform cognitive functions such as listening, thinking, and speaking.
The application is developed using open-source tools and hosted on Google Cloud, leveraging its robust infrastructure. The backend is developed in Python using the Flask framework, following the Service/Repository design pattern.
For data storage, the platform uses a NoSQL database, Firebase/Firestore, which offers scalability and flexibility to store conversations and user data.
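To make the structure easier to follow, here is a plausible layout of the repository, inferred from the imports and file references in this article (the exact tree may differ):
robo-educa-gemini-server/
  main.py                # creates the Flask app
  routes.py              # HTTP endpoints
  service/
    loginService.py      # user validation and guest login
    talkService.py       # conversation logic with the GEMINI API
  repository/
    db_resource.py       # Firestore connection
    messageHistory.py    # conversation history persistence
  static/js/             # mediadevices.js, login.js, display.js, talk.js
  templates/             # index.html, interaction.html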
routes.py
The routes.py file manages all available routes in the application. This is where the different endpoints that handle user interactions and data processing are defined.
# Dependencies
from main import app
from flask import render_template, request, session, redirect, url_for, make_response, jsonify
# Services
import service.loginService as loginService
import service.talkService as talkService

# Home page / Index
@app.route('/')
def home():
    return render_template('index.html')

# Message exchange between the user and the bot
@app.route('/talk', methods=['POST'])
def talk():
    # Checks whether the user is logged in
    if not session.get('userId'): return make_response(jsonify({"error": "Não autorizado"}), 401)
    # Gets the request data - the user's message
    data = request.get_json()
    userMessage = data.get('message')
    # Sends the message to the bot and waits for its response
    botResponse = talkService.talk(userMessage)
    # Returns the bot's response to the frontend
    return botResponse
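The loginService module that validates users is not shown in this article. Based on the statuses the frontend handles later (success, errorUser, errorPwd, errorGuest), a minimal sketch of the /login route might look like the following; the loginService.login signature is an assumption:
# Hypothetical sketch of the /login route; the real logic lives in service/loginService.py
@app.route('/login', methods=['POST'])
def login():
    data = request.get_json()
    # Assumed interface: validates the credentials (or creates a guest user)
    # and returns a status string plus the user's ID
    status, user_id = loginService.login(data.get('username'), data.get('password'), data.get('usertype'))
    if status == 'success':
        session['userId'] = user_id  # Stored in the session so /talk can authorize the request
    return jsonify({"status": status})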
The frontend is implemented using HTML, CSS, and JavaScript, focusing on simplicity and ease of use. When the application is launched, it checks the permissions for microphone usage; if it is the first time the user accesses the app, they will be prompted to grant permission. This process is managed by the static/js/mediadevices.js file.
// Requests microphone access and maps the result to a simple permission status
async function devices_micPrompt() {
    let permission;
    await navigator.mediaDevices
        .getUserMedia({
            audio: true
        })
        .then(function (stream) {
            permission = "granted";
        })
        .catch(function (error) {
            // The browser reports distinct error messages for a missing device and a denied permission
            if (error.message == "Requested device not found") {
                permission = "notFound";
            } else if (error.message == "Permission denied") {
                permission = "denied";
            } else {
                console.log(error.message);
                permission = 'error';
            }
        });
    return permission;
}
The login process is managed by static/js/login.js, which sends a POST request to the backend to validate the user. If the user does not have valid credentials, they can log in as a guest.
// Sends the credentials (or a guest request) to the backend and routes the user according to the returned status
async function login(usertype) {
    let username = document.getElementById('username').value;
    let password = document.getElementById('password').value;
    displayStartLogin();
    await fetch('/login', {
        method: 'POST',
        headers: {
            'Content-Type': 'application/json'
        },
        body: JSON.stringify({ username: username, password: password, usertype: usertype })
    })
        .then(response => response.json())
        .then(data => {
            displayStopLogin();
            switch (data.status) {
                case 'success':
                    goToPage("interaction");
                    break;
                case 'errorUser':
                    alert("Usuário inexistente");
                    document.getElementById("username").focus();
                    break;
                case 'errorPwd':
                    alert("Senha incorreta");
                    document.getElementById("password").focus();
                    break;
                case 'errorGuest':
                    alert("Não foi possível criar usuário temporário. Tente novamente!");
                    break;
                default:
                    break;
            }
        });
}
After a successful login, interaction begins on the frontend with the templates/interaction.html file. The visual interface, managed by static/js/display.js, is simple, with elements that symbolize listening, thinking, and speaking.
Continuous Listening and Speech Processing:
The robot starts with a greeting and invites the user to participate in a programming quiz. After speaking, the app activates the microphone in continuous mode, listening to what the user says. These tasks are performed by static/js/talk.js, which uses the Media Devices, SpeechRecognition(), and SpeechSynthesisUtterance() APIs.
recognition = new SpeechRecognition();
recognition.lang = "pt-BR";
recognition.continuous = true; // Continuous recognition in a loop
recognition.interimResults = false; // No partial results

// This event fires when speech recognition captures a result
recognition.onresult = event => {
    const transcript = event.results[event.resultIndex][0].transcript;
    talk(transcript); // Sends the transcription of the user's speech to the backend, which processes it with the AI and returns a response
};

// Checks that the user is not speaking (audio playing). After 1 minute of inactivity, stops recognition and shows the pause button
recognition.onend = () => {
    if (speakStatus == false) {
        timestampAtual = Date.now();
        var diferenca = timestampAtual - timestampParam;
        var minutosPassados = diferenca / (1000 * 60);
        if (minutosPassados < 1) {
            recognition.start(); // Restarts speech recognition
        } else {
            hideAllExceptClose();
            showElement("divPauseStart");
        }
    }
};

// Speech synthesis - makes the device play a message through its speakers/headphones
function speak(message) {
    message = removerEmojis(message);
    const utterThis = new SpeechSynthesisUtterance(message);
    utterThis.pitch = 2;
    utterThis.rate = 4;
    utterThis.onstart = function () {
        hideAllExceptClose(); // Hides any elements visible on screen
        showElement("divSpinnerWaves");
    };
    utterThis.onend = function () {
        speakStatus = false;
        hideAllExceptClose(); // Hides any elements visible on screen
        showElement("divSpinnerRipple"); // Shows a spinner simulating a listening ripple
        recognition.start(); // Starts speech recognition
        timestampParam = Date.now();
    };
    recognition.stop(); // When speech (audio playback) starts, stops speech recognition
    speakStatus = true; // Speaking: on=true, off=false
    synth.speak(utterThis); // Starts playing the message
}

// Removes emojis from the message so it can be played via speech synthesis
function removerEmojis(texto) {
    return texto
        .replace(/\p{Extended_Pictographic}/gu, '') // Removes emojis (\p{Emoji} alone would also strip digits, which carry the Unicode Emoji property)
        .replace(/\s+/g, ' ') // Removes extra whitespace
        .trim(); // Removes leading and trailing whitespace
}
When a complete sentence is detected, it is sent to the backend for cognitive processing. This task is performed by the GEMINI API, using the gemini-1.5-flash model, which produces fast and accurate responses, ensuring fluid conversations that make the robot more engaging and realistic.
We use the Zero-Shot Prompting technique combined with a GEMINI SDK feature, System Instructions, which provides a frame of reference for the model, helping it understand the task and respond appropriately without needing specific examples.
import os
import google.generativeai as genai

genai.configure(api_key=my_api_key)
system_instruction = os.environ.get("SYSTEM_INSTRUCTIONS") # Gemini - System Instructions / Describes the assistant's characteristics
model = genai.GenerativeModel(model_name=ai_model,
                              generation_config=generation_config,
                              system_instruction=system_instruction,
                              safety_settings=safety_settings)
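# Illustration only - a hypothetical example of what the SYSTEM_INSTRUCTIONS
# environment variable might contain (the project's real prompt is not public):
example_system_instruction = (
    "You are Robô Educa, a friendly humanoid robot that teaches programming "
    "to children aged 6 to 14. Always answer in Brazilian Portuguese, use short "
    "and simple sentences, be encouraging, and offer gamified quizzes."
)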
# Interaction with the Google Gemini API
def talk(userMessage):
    # Gets the logged-in user's ID
    user_id = session["userId"]
    # Gets the user's message history
    message_history = messageHistory.getById(user_id)
    message_history_gemini_format = format_messages_for_gemini(message_history)
    # Saves the user's message to the database
    role = "user" # role=user => message sent by the user
    messageHistory.store(user_id, role, userMessage)
    # Starts the interaction with Gemini AI
    try:
        convo = model.start_chat(history=message_history_gemini_format) # Starts the chat, giving the AI the conversation history as context
        convo.send_message(userMessage) # Sends the new message to be processed by the AI
        bot_response = convo.last.text # Gets the AI's response
    except Exception:
        bot_response = "error"
    # Saves the bot's response to the database
    if bot_response != "error":
        role = "model" # role=model => message sent by the AI
        messageHistory.store(user_id, role, bot_response)
    else:
        bot_response = "Desculpe, não foi possível obter resposta da Inteligência Artificial."
    response = {"status": "success", "message": bot_response}
    return response
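The format_messages_for_gemini helper called above is not shown in this article. Since each stored document carries role and parts fields (see the repository code further below), a minimal sketch could look like this; it is an assumption, not the repository's actual implementation:
# Sketch of the helper that converts Firestore documents into the history format expected by model.start_chat
def format_messages_for_gemini(messages):
    history = []
    for doc in messages:
        data = doc.to_dict()  # Each Firestore document already stores "role" and "parts"
        history.append({"role": data["role"], "parts": data["parts"]})
    return history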
The generation_config parameter is used to control the behavior of the language model during text generation. It contains several important settings that influence the creativity, focus, and length of the model's responses.
In particular, we draw attention to the temperature setting, which controls the degree of randomness in the responses. Higher values (close to 2) lead to more creative and diverse results, but they may be less predictable and may occasionally contain errors/hallucinations. Lower values (close to 0) generate more focused and conservative responses, but tend to repeat common patterns.
The max_output_tokens setting defines the maximum number of tokens (words or subwords) that the model can generate in the response. This avoids excessively long responses and helps control processing time.
generation_config = {
"temperature": 1,
"top_p": 0.95,
"top_k": 64,
"max_output_tokens": 8192
}
genai.configure(api_key=my_api_key)
model = genai.GenerativeModel(model_name=ai_model,
generation_config=generation_config,
system_instruction=system_instruction,
safety_settings=safety_settings)
The Google Gemini API offers a feature called safety_settings that allows you to control the language model's behavior in terms of safety, especially in conversations with children. When instantiating the model, it is possible to define the desired levels of protection against inappropriate or dangerous content.
safety_settings = [
{
"category": "HARM_CATEGORY_HARASSMENT",
"threshold": "BLOCK_LOW_AND_ABOVE"
},
{
"category": "HARM_CATEGORY_HATE_SPEECH",
"threshold": "BLOCK_LOW_AND_ABOVE"
},
{
"category": "HARM_CATEGORY_SEXUALLY_EXPLICIT",
"threshold": "BLOCK_LOW_AND_ABOVE"
},
{
"category": "HARM_CATEGORY_DANGEROUS_CONTENT",
"threshold": "BLOCK_LOW_AND_ABOVE"
}
]
genai.configure(api_key=my_api_key)
model = genai.GenerativeModel(model_name=ai_model,
generation_config=generation_config,
system_instruction=system_instruction,
safety_settings=safety_settings)
Where:
category: the specific category of harmful content to block. The available categories, all used above, are HARM_CATEGORY_HARASSMENT, HARM_CATEGORY_HATE_SPEECH, HARM_CATEGORY_SEXUALLY_EXPLICIT, and HARM_CATEGORY_DANGEROUS_CONTENT.
threshold: the level of strictness with which the model should block content within a given category. The selected value was:
BLOCK_LOW_AND_ABOVE: blocks any content within the category whose risk is rated "low", "medium", or "high". This is the strictest level and is appropriate for environments where child protection is a priority.
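For reference, the same configuration can also be written with enum types from the google-generativeai Python SDK instead of plain strings; this is a stylistic alternative, not the form used in the project's code above:
# Equivalent safety settings expressed with the SDK's enum types
from google.generativeai.types import HarmCategory, HarmBlockThreshold

safety_settings = {
    HarmCategory.HARM_CATEGORY_HARASSMENT: HarmBlockThreshold.BLOCK_LOW_AND_ABOVE,
    HarmCategory.HARM_CATEGORY_HATE_SPEECH: HarmBlockThreshold.BLOCK_LOW_AND_ABOVE,
    HarmCategory.HARM_CATEGORY_SEXUALLY_EXPLICIT: HarmBlockThreshold.BLOCK_LOW_AND_ABOVE,
    HarmCategory.HARM_CATEGORY_DANGEROUS_CONTENT: HarmBlockThreshold.BLOCK_LOW_AND_ABOVE,
}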
The platform stores each user's conversation in a Firestore database using NoSQL collections, which offers the scalability and flexibility needed to keep a per-user message history. The messageHistory repository below handles the reads and writes:
import time
import repository.db_resource as dbr

# Connection instance for the Google Firestore NoSQL database
db = dbr.firestore_resource()

# Gets the message history for a given user ID
def getById(user_id):
    collection = f"message_history_{user_id}"
    messages_ref = db.collection(collection).order_by("timestamp")
    messages = messages_ref.stream()
    return messages

# Saves a message so the conversation history can be retrieved to contextualize responses
def store(user_id, role, message):
    collection = f"message_history_{user_id}"
    try:
        doc_ref = db.collection(collection).document()
        doc_ref.set({
            "timestamp": int(time.time()),
            "role": role,
            "parts": [message]
        })
    except Exception as e:
        print(f"Erro ao salvar mensagem no banco de dados. Detalhes: {e}")
        return False
    return True
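The repository/db_resource.py module that creates the db connection is not shown in this article. A minimal sketch, assuming the google-cloud-firestore client library and default application credentials:
# Sketch of repository/db_resource.py (an assumption; the actual file may differ)
from google.cloud import firestore

def firestore_resource():
    # Creates a Firestore client using the project's default Google Cloud credentials
    return firestore.Client()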
As for content personalization, Google GEMINI can handle a context of up to 2 million tokens. This represents a considerable volume of data, enough to hold a significant amount of information and interactions for personalizing educational content.
Some practical applications for using this capacity:
Progress Mapping: Storing a complete history of a student's interactions, such as responses to exercises, tests, debates, feedback, time spent on each subject, etc., allows progress to be mapped in a granular and individualized manner.
Pattern Identification: Analyzing this data allows you to identify behavioral patterns, areas of difficulty, strengths, and learning styles for each student.
Intelligent Recommender: Based on history, the system can recommend specific content, activities, exercises, and resources for each student, adapting the pace and difficulty level (see the sketch after this list).
On-Demand Content: The model can generate supporting materials, additional explanations, summaries, or examples on specific topics where the student demonstrates difficulties.
Response Analysis: The model can analyze responses, identifying errors, knowledge gaps, and areas that need reinforcement.
Adaptive Feedback: Feedback can be personalized with clear explanations, examples, and specific tips for each student, increasing learning and retention.
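As a concrete illustration of the recommender idea above, here is a hypothetical sketch (not a feature of the current repository) that turns a user's stored history into a personalized study suggestion, reusing the names introduced earlier:
# Hypothetical sketch: asking Gemini for a personalized recommendation from stored history
def recommend_next_topic(user_id):
    history = format_messages_for_gemini(messageHistory.getById(user_id))
    convo = model.start_chat(history=history)  # Contextualizes the AI with everything the student has done
    convo.send_message("Com base na nossa conversa, sugira o próximo tópico de programação para eu estudar.")
    return convo.last.text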
Robô Educa combines physical creativity with cutting-edge artificial intelligence to create an interactive and educational experience for children. The platform's modular architecture and use of modern web technologies make it scalable, secure, and adaptable to diverse learning environments.
Clone the repository:
$ git clone https://github.com/Robo-Educa/robo-educa-gemini-server.git
Install dependencies:
$ cd robo-educa-gemini-server
$ pip install -r requirements.txt
Create a .ENV file from .ENV.EXAMPLE and fill in the environment variable values according to your project.
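For reference, the file might look like the sketch below. These variable names are hypothetical, listed only to illustrate the kinds of values used in this article (the API key passed to genai.configure, the model name, and the system instructions); .ENV.EXAMPLE defines the authoritative names:
# Hypothetical .ENV contents - check .ENV.EXAMPLE for the real variable names
GEMINI_API_KEY="your-google-ai-studio-api-key"
AI_MODEL="gemini-1.5-flash"
SYSTEM_INSTRUCTIONS="You are Robô Educa, a friendly robot teacher..."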
Run the project:
$ python main.py
Test in the Browser:
http://localhost:5000
Deploy to Google Cloud Run:
$ cd [path of project]
$ gcloud init
$ gcloud run deploy --source .
This project is licensed under the Apache 2.0 License. Please also observe the Terms of Service.
Contributions are welcome! Feel free to open a pull request or contribute in any other way.
💪 How about you? Did you like it? Then do your part and support this initiative so that we can expand our impact even further.