Open zepor opened 3 months ago
The task involves setting up a centralized data warehouse using Azure SQL Database and integrating it with an existing project. This includes provisioning the database, configuring performance settings, setting up secure access protocols, and storing the connection string securely. Additionally, we need to update various configuration files to ensure the connection details are securely managed and integrated.
Follow Microsoft Azure’s provisioning guide to set up the Azure SQL Database instance. Choose the appropriate service tier and compute size based on your requirements.
Ensure proper scaling and performance configurations. Configure the database for high availability and performance using Azure’s built-in tools. Refer to the performance best practices for guidance.
Apply best practices for securing Azure SQL databases, including setting up firewalls and access control. Detailed instructions can be found here.
Store the connection string in GitHub Secrets as AZURE_SQL_CONNECTION_STRING
. Follow the steps below:
Settings
> Secrets
> New repository secret
.AZURE_SQL_CONNECTION_STRING
and paste the connection string.synapse/linkedService/synapse-managed-rg-WorkspaceDefaultSqlServer.json
Update the connection details for the Azure SQL Database and ensure that the connection string is securely retrieved from environment variables or GitHub Secrets.
{
"name": "synapse-managed-rg-WorkspaceDefaultSqlServer",
"type": "Microsoft.Synapse/workspaces/linkedservices",
"properties": {
"typeProperties": {
"connectionString": {
"type": "SecureString",
"value": "@{linkedService().AzureSqlConnectionString}"
}
},
"parameters": {
"DBName": {
"type": "String"
},
"AzureSqlConnectionString": {
"type": "SecureString"
}
},
"type": "AzureSqlDW",
"connectVia": {
"referenceName": "AutoResolveIntegrationRuntime",
"type": "IntegrationRuntimeReference"
},
"annotations": []
}
}
backend-container/flaskapp/config.py
Update the configuration to include the Azure SQL Database connection details and ensure that the connection string is securely retrieved from environment variables or GitHub Secrets.
import os
import logging
from dotenv import load_dotenv
class ConfigurationError(Exception):
pass
dotenv_path = os.path.join(os.path.dirname(__file__), '../.env')
load_dotenv(dotenv_path)
class Config:
DEBUG = False
TESTING = False
connection_string = os.getenv('AZURE_SQL_CONNECTION_STRING')
if not connection_string:
raise ConfigurationError(
"AZURE_SQL_CONNECTION_STRING must be set in .env or environment variables"
)
SQLALCHEMY_DATABASE_URI = connection_string
SQLALCHEMY_TRACK_MODIFICATIONS = False
LOGGING_LEVEL = logging.ERROR
class DevelopmentConfig(Config):
DEBUG = True
TESTING = True
LOGGING_LEVEL = logging.DEBUG
class ProductionConfig(Config):
DEBUG = False
TESTING = False
LOGGING_LEVEL = logging.WARNING
backend-container/src/database/connections.py
Update the file to include the connection string for the Azure SQL Database.
import os
import redis
import pyodbc
from dotenv import load_dotenv
from src.utils.log import be_logger
load_dotenv()
def connect_to_redis(primary_host, fallback_host, port=6379):
try:
r = redis.StrictRedis(host=primary_host, port=port, decode_responses=False)
r.ping()
return r
except redis.exceptions.ConnectionError:
try:
r = redis.StrictRedis(host=fallback_host, port=port, decode_responses=False)
r.ping()
return r
except redis.exceptions.ConnectionError as ex:
be_logger.info("Could not connect to Redis: %s", str(ex))
raise
def get_azure_sql_connection():
try:
azure_sql_connection_string = os.environ.get('AZURE_SQL_CONNECTION_STRING')
if not azure_sql_connection_string:
raise ValueError("AZURE_SQL_CONNECTION_STRING environment variable not set.")
connection = pyodbc.connect(azure_sql_connection_string)
be_logger.debug("Successfully connected to Azure SQL Database.")
return connection
except Exception as e:
be_logger.error(f"Failed to connect to Azure SQL Database: {e}")
return None
redis_primary_host = 'redis'
redis_fallback_host = 'redis-service'
redis_port = 6379
r = connect_to_redis(redis_primary_host, redis_fallback_host, redis_port)
azure_sql_conn = get_azure_sql_connection()
if azure_sql_conn:
pass
README.md
Add a detailed step-by-step setup guide for setting up the Azure SQL Database.
# NFLSTATS Development Environment
This is the NFLSTATS project, designed to provide statistical insights for the NFL.
## THIS APP NEEDS A TON OF WORK
## 1. PIPELINES FROM SPORTS RADAR APIS FOR LIVE DATA
## 2. OTHER ADVANCED ANALYTICS DATASOURCES FOR ADDTIONAL ANALYSIS FOR AN AI MODEL
## 3. DATA VISUALIZATION/DISPLAY
## 4. ADDING OTHER SPORTS
## 5. MAKING AI MODELS FOR SPECIFIC GAMES AND THEIR CONFIDENCE INTERVALS IN A PARTICULAR BET
This guide will help you set up the environment using GitHub Codespaces or set it up locally using Docker and Docker Compose. Be aware the application is half built right now. It needs quite a bit of TLC. This is my first project, feel free to criticize and make recommendations as you see fit!
## NFLSTATS Development Environment Setup Guide
Welcome to the NFLSTATS project! This guide will walk you through setting up your development environment to run the application locally. Before diving into the setup process, ensure you have the following prerequisites installed on your machine:
### Prerequisites
- Git: For cloning the repository. Download and install from Git's official site.
- Docker: Required for running the application's containers. Download and install Docker from Docker's official site.
- Docker Compose: For managing multi-container Docker applications. Typically included with Docker Desktop installations.
- Node.js: The runtime environment for the application. Download and install from Node.js's official site.
- npm: Comes with Node.js and is used for managing dependencies.
- Visual Studio Code (Optional): Recommended IDE for editing and debugging the application. Download from Visual Studio Code's official site.
### Environment Setup
#### Clone the Repository
First, clone the NFLSTATS repository to your local machine using Git:
```bash
git clone https://github.com/[your_username]/nflstats.git
cd nflstats
Replace [your_username]
with your GitHub username if you have forked the repository. Otherwise, use the project's main repository URL.
Ensure Docker and Docker Compose are installed and running on your machine. The application uses Docker containers for both development (docker-compose.debug.yml) and production (docker-compose.prod.yml) environments.
After cloning the repository, navigate to the project directory and install the required npm packages:
npm install
This command installs all dependencies defined in package.json
.
The application requires several environment variables to be set. These variables can be found and modified in the .env
files located in the project root or specific directories for different environments (e.g., .env.development
, .env.production
). Ensure all necessary variables are set before proceeding.
To set up a centralized data warehouse using Azure SQL Database, follow these steps:
AZURE_SQL_CONNECTION_STRING
. Follow the steps below:
Settings
> Secrets
> New repository secret
.AZURE_SQL_CONNECTION_STRING
and paste the connection string.With all prerequisites installed and environment variables set, you can start the development server using npm scripts defined in package.json
:
npm run start:dev
This command executes the start:dev
script, which builds the application for development and starts the development server. If everything is set up correctly, the application should now be running and accessible.
Once the application is running, you can access it by navigating to http://localhost:3000
in your web browser, assuming the default port configuration.
.dockerignore
, ensure to remove them from the ignore list or manually copy them into your project directory. The Dockerfiles are crucial for building the Docker images for both development and production environments.package.json
. These scripts provide shortcuts for common tasks such as building the application, starting the development server, and cleaning the build directory.CONTRIBUTING.md
file for guidelines on how to contribute.By following this guide, you should now have a fully functional development environment for the NFLSTATS project. Happy coding!
### File: `synapse-managed-rg/TemplateForWorkspace.json`
Update the file to include the Azure SQL Database as a linked service.
```json
{
"$schema": "http://schema.management.azure.com/schemas/2015-01-01/deploymentTemplate.json#",
"contentVersion": "1.0.0.0",
"parameters": {
"workspaceName": {
"type": "string",
"metadata": "Workspace name",
"defaultValue": "synapse-managed-rg"
},
"MongoDBSQLD_connectionString": {
"type": "secureString",
"metadata": "Secure string for 'connectionString' of 'MongoDBSQLD'"
},
"MongoDBSQLD_password": {
"type": "secureString",
"metadata": "Secure string for 'password' of 'MongoDBSQLD'"
},
"synapse-managed-rg-WorkspaceDefaultSqlServer_connectionString": {
"type": "secureString",
"metadata": "Secure string for 'connectionString' of 'synapse-managed-rg-WorkspaceDefaultSqlServer'",
"defaultValue": "Integrated Security=False;Encrypt=True;Connection Timeout=30;Data Source=tcp:synapse-managed-rg.sql.azuresynapse.net,1433;Initial Catalog=@{linkedService().DBName}"
},
"AzureSQLDB_connectionString": {
"type": "secureString",
"metadata": "Secure string for 'connectionString' of 'AzureSQLDB'"
},
"MongoDBSQLD_properties_typeProperties_userName": {
"type": "string",
"defaultValue": "zepor"
},
"synapse-managed-rg-WorkspaceDefaultStorage_properties_typeProperties_url": {
"type": "string",
"defaultValue": "https://loveofdatalakes.dfs.core.windows.net"
}
},
"variables": {
"workspaceId": "[concat('Microsoft.Synapse/workspaces/', parameters('workspaceName'))]"
},
"resources": [
{
"name": "[concat(parameters('workspaceName'), '/MongoDBSQLD')]",
"type": "Microsoft.Synapse/workspaces/linkedServices",
"apiVersion": "2019-06-01-preview",
"properties": {
"annotations": [],
"type": "Odbc",
"typeProperties": {
"connectionString": {
"type": "SecureString",
"value": "[parameters('MongoDBSQLD_connectionString')]"
},
"authenticationType": "Basic",
"userName": "[parameters('MongoDBSQLD_properties_typeProperties_userName')]",
"password": {
"type": "SecureString",
"value": "[parameters('MongoDBSQLD_password')]"
}
},
"connectVia": {
"referenceName": "MongoSQLD",
"type": "IntegrationRuntimeReference"
}
},
"dependsOn": [
"[concat(variables('workspaceId'), '/integrationRuntimes/MongoSQLD')]"
]
},
{
"name": "[concat(parameters('workspaceName'), '/PowerBIWorkspace1')]",
"type": "Microsoft.Synapse/workspaces/linkedServices",
"apiVersion": "2019-06-01-preview",
"properties": {
"annotations": [],
"type": "PowerBIWorkspace",
"typeProperties": {
"workspaceID": "E90D9B82-4C5F-40B1-A6D3-2DD8D0A760C4",
"tenantID": "364661e2-3ab8-4544-a3ae-fc0b5adb4aaf"
}
},
"dependsOn": []
},
{
"name": "[concat(parameters('workspaceName'), '/synapse-managed-rg-WorkspaceDefaultSqlServer')]",
"type": "Microsoft.Synapse/workspaces/linkedServices",
"apiVersion": "2019-06-01-preview",
"properties": {
"parameters": {
"DBName": {
"type": "String"
}
},
"annotations": [],
"type": "AzureSqlDW",
"typeProperties": {
"connectionString": "[parameters('synapse-managed-rg-WorkspaceDefaultSqlServer_connectionString')]"
},
"connectVia": {
"referenceName": "AutoResolveIntegrationRuntime",
"type": "IntegrationRuntimeReference"
}
},
"dependsOn": [
"[concat(variables('workspaceId'), '/integrationRuntimes/AutoResolveIntegrationRuntime')]"
]
},
{
"name": "[concat(parameters('workspaceName'), '/AzureSQLDB')]",
"type": "Microsoft.Synapse/workspaces/linkedServices",
"apiVersion": "2019-06-01-preview",
"properties": {
"annotations": [],
"type": "AzureSqlDatabase",
"typeProperties": {
"connectionString": "[parameters('AzureSQLDB_connectionString')]"
},
"connectVia": {
"referenceName": "AutoResolveIntegrationRuntime",
"type": "IntegrationRuntimeReference"
}
},
"dependsOn": [
"[concat(variables('workspaceId'), '/integrationRuntimes/AutoResolveIntegrationRuntime')]"
]
},
{
"name": "[concat(parameters('workspaceName'), '/synapse-managed-rg-WorkspaceDefaultStorage')]",
"type": "Microsoft.Synapse/workspaces/linkedServices",
"apiVersion": "2019-06-01-preview",
"properties": {
"annotations": [],
"type": "AzureBlobFS",
"typeProperties": {
"url": "[parameters('synapse-managed-rg-WorkspaceDefaultStorage_properties_typeProperties_url')]"
},
"connectVia": {
"referenceName": "AutoResolveIntegrationRuntime",
"type": "IntegrationRuntimeReference"
}
},
"dependsOn": [
"[concat(variables('workspaceId'), '/integrationRuntimes/AutoResolveIntegrationRuntime')]"
]
},
{
"name": "[concat(parameters('workspaceName'), '/AutoResolveIntegrationRuntime')]",
"type": "Microsoft.Synapse/workspaces/integrationRuntimes",
"apiVersion": "2019-06-01-preview",
"properties": {
"type": "Managed",
"typeProperties": {
"computeProperties": {
"location": "AutoResolve",
"dataFlowProperties": {
"computeType": "General",
"coreCount": 8,
"timeToLive": 0
}
}
}
},
"dependsOn": []
},
{
"name": "[concat(parameters('workspaceName'), '/MongoSQLD')]",
"type": "Microsoft.Synapse/workspaces/integrationRuntimes",
"apiVersion": "2019-06-01-preview",
"properties": {
"type": "SelfHosted",
"typeProperties": {}
},
"dependsOn": []
},
{
"name": "[concat(parameters('workspaceName'), '/WorkspaceSystemIdentity')]",
"type": "Microsoft.Synapse/workspaces/credentials",
"apiVersion": "2019-06-01-preview",
"properties": {
"type": "ManagedIdentity",
"typeProperties": {}
},
"dependsOn": []
},
{
"name": "[concat(parameters('workspaceName'), '/OdbcTable1')]",
"type": "Microsoft.Synapse/workspaces/datasets",
"apiVersion": "2019-06-01-preview",
"properties": {
"linkedServiceName": {
"reference
[Click here to create a Pull Request with the proposed solution](https://api.codeautopilot.com/create-pr?task=29593&hash=62d4778ba6ddd82371c1cae5aea998646e9232531cb806214599feec39f95d91)
**Files used for this task:**
<details>
<summary>Changes on synapse/linkedService/synapse-managed-rg-WorkspaceDefaultSqlServer.json</summary>
To update the `synapse-managed-rg-WorkspaceDefaultSqlServer.json` file to include the connection details for the Azure SQL Database and ensure that the connection string is securely retrieved from environment variables or GitHub Secrets, follow these steps:
1. **Modify the Connection String**:
- Update the `connectionString` field to retrieve the connection string from an environment variable or GitHub Secret.
2. **Use Secure Parameters**:
- Ensure that sensitive information is not hardcoded in the JSON file.
Here is the updated version of the file:
```json
{
"name": "synapse-managed-rg-WorkspaceDefaultSqlServer",
"type": "Microsoft.Synapse/workspaces/linkedservices",
"properties": {
"typeProperties": {
"connectionString": {
"type": "SecureString",
"value": "@{linkedService().AzureSqlConnectionString}"
}
},
"parameters": {
"DBName": {
"type": "String"
},
"AzureSqlConnectionString": {
"type": "SecureString"
}
},
"type": "AzureSqlDW",
"connectVia": {
"referenceName": "AutoResolveIntegrationRuntime",
"type": "IntegrationRuntimeReference"
},
"annotations": []
}
}
connectionString
is now of type SecureString
, which ensures that the connection string is handled securely.AzureSqlConnectionString
of type SecureString
. This parameter should be populated with the actual connection string from a secure source like environment variables or GitHub Secrets.Store the Connection String Securely:
AzureSqlConnectionString
is stored securely in your environment variables or GitHub Secrets.Update Deployment Scripts:
AzureSqlConnectionString
parameter securely.Documentation:
AzureSqlConnectionString
.By following these steps, you ensure that the connection details for the Azure SQL Database are securely managed and integrated into your Synapse workspace configuration.
Set Up Azure SQL Data Warehouse
Description: Set up a centralized data warehouse using Azure SQL Database. Tasks:
AZURE_SQL_CONNECTION_STRING
. Milestone: Centralized Data Warehouse setup completed. DISCUSS IF THE NFL LAKEHOUSE CAN BE USED HERE ESPECIALLY WITH THE UPDATED SPORTS RADAR API DOCS