LoveofSportsLLC / NFL

NFL + AI
https://loveoffootball.io/
MIT License
0 stars 0 forks source link

Set up a Centralized Data Warehouse (use Azure Tools): #63

Open zepor opened 3 months ago

zepor commented 3 months ago
    ○ Time: 1 week
    ○ Tools Required: Azure SQL Database
    ○ Steps:
        1. Provision Azure SQL Database.
            □ Follow Microsoft Azure’s provisioning guide to set up the Azure SQL Database instance.
        2. Ensure proper scaling and performance configurations.
            □ Configure the database for high availability and performance using Azure’s built-in tools.
        3. Configure secure access protocols.
            □ Apply best practices for securing Azure SQL databases, including setting up firewalls and access control.
        4. Store Azure SQL connection string in GitHub Secrets.
            □ Secrets Needed: AZURE_SQL_CONNECTION_STRING
    ○ Documentation:
        § Step-by-step setup guide with screenshots.
        § Example configuration files in JSON/YAML.
    ○ Costs: The Azure SQL Database will cost approximately $200 to $1000/month based on specifications.
    ○ Major Milestone: Centralized Data Warehouse setup completed.

Set Up Azure SQL Data Warehouse

Description: Set up a centralized data warehouse using Azure SQL Database. Tasks:

codeautopilot[bot] commented 3 months ago

Potential solution

The task involves setting up a centralized data warehouse using Azure SQL Database and integrating it with an existing project. This includes provisioning the database, configuring performance settings, setting up secure access protocols, and storing the connection string securely. Additionally, we need to update various configuration files to ensure the connection details are securely managed and integrated.

How to implement

Step 1: Provision Azure SQL Database

Follow Microsoft Azure’s provisioning guide to set up the Azure SQL Database instance. Choose the appropriate service tier and compute size based on your requirements.

Step 2: Configure Performance Settings

Ensure proper scaling and performance configurations. Configure the database for high availability and performance using Azure’s built-in tools. Refer to the performance best practices for guidance.

Step 3: Set Up Secure Access Protocols

Apply best practices for securing Azure SQL databases, including setting up firewalls and access control. Detailed instructions can be found here.

Step 4: Store Azure SQL Connection String in GitHub Secrets

Store the connection string in GitHub Secrets as AZURE_SQL_CONNECTION_STRING. Follow the steps below:

  1. Go to your GitHub repository.
  2. Navigate to Settings > Secrets > New repository secret.
  3. Add a new secret with the name AZURE_SQL_CONNECTION_STRING and paste the connection string.

Step 5: Update Configuration Files

File: synapse/linkedService/synapse-managed-rg-WorkspaceDefaultSqlServer.json

Update the connection details for the Azure SQL Database and ensure that the connection string is securely retrieved from environment variables or GitHub Secrets.

{
    "name": "synapse-managed-rg-WorkspaceDefaultSqlServer",
    "type": "Microsoft.Synapse/workspaces/linkedservices",
    "properties": {
        "typeProperties": {
            "connectionString": {
                "type": "SecureString",
                "value": "@{linkedService().AzureSqlConnectionString}"
            }
        },
        "parameters": {
            "DBName": {
                "type": "String"
            },
            "AzureSqlConnectionString": {
                "type": "SecureString"
            }
        },
        "type": "AzureSqlDW",
        "connectVia": {
            "referenceName": "AutoResolveIntegrationRuntime",
            "type": "IntegrationRuntimeReference"
        },
        "annotations": []
    }
}

File: backend-container/flaskapp/config.py

Update the configuration to include the Azure SQL Database connection details and ensure that the connection string is securely retrieved from environment variables or GitHub Secrets.

import os
import logging
from dotenv import load_dotenv

class ConfigurationError(Exception):
    pass

dotenv_path = os.path.join(os.path.dirname(__file__), '../.env')
load_dotenv(dotenv_path)

class Config:
    DEBUG = False
    TESTING = False
    connection_string = os.getenv('AZURE_SQL_CONNECTION_STRING')
    if not connection_string:
        raise ConfigurationError(
            "AZURE_SQL_CONNECTION_STRING must be set in .env or environment variables"
        )
    SQLALCHEMY_DATABASE_URI = connection_string
    SQLALCHEMY_TRACK_MODIFICATIONS = False
    LOGGING_LEVEL = logging.ERROR

class DevelopmentConfig(Config):
    DEBUG = True
    TESTING = True
    LOGGING_LEVEL = logging.DEBUG

class ProductionConfig(Config):
    DEBUG = False
    TESTING = False
    LOGGING_LEVEL = logging.WARNING

File: backend-container/src/database/connections.py

Update the file to include the connection string for the Azure SQL Database.

import os
import redis
import pyodbc
from dotenv import load_dotenv 
from src.utils.log import be_logger

load_dotenv()

def connect_to_redis(primary_host, fallback_host, port=6379):
    try:
        r = redis.StrictRedis(host=primary_host, port=port, decode_responses=False)
        r.ping()
        return r
    except redis.exceptions.ConnectionError:
        try:
            r = redis.StrictRedis(host=fallback_host, port=port, decode_responses=False)
            r.ping()
            return r
        except redis.exceptions.ConnectionError as ex:
            be_logger.info("Could not connect to Redis: %s", str(ex))
            raise

def get_azure_sql_connection():
    try:
        azure_sql_connection_string = os.environ.get('AZURE_SQL_CONNECTION_STRING')
        if not azure_sql_connection_string:
            raise ValueError("AZURE_SQL_CONNECTION_STRING environment variable not set.")

        connection = pyodbc.connect(azure_sql_connection_string)
        be_logger.debug("Successfully connected to Azure SQL Database.")
        return connection
    except Exception as e:
        be_logger.error(f"Failed to connect to Azure SQL Database: {e}")
        return None

redis_primary_host = 'redis'
redis_fallback_host = 'redis-service'
redis_port = 6379
r = connect_to_redis(redis_primary_host, redis_fallback_host, redis_port)

azure_sql_conn = get_azure_sql_connection()
if azure_sql_conn:
    pass

File: README.md

Add a detailed step-by-step setup guide for setting up the Azure SQL Database.

# NFLSTATS Development Environment
This is the NFLSTATS project, designed to provide statistical insights for the NFL.

## THIS APP NEEDS A TON OF WORK 
## 1. PIPELINES FROM SPORTS RADAR APIS FOR LIVE DATA 
## 2. OTHER ADVANCED ANALYTICS DATASOURCES FOR ADDTIONAL ANALYSIS FOR AN AI MODEL
## 3. DATA VISUALIZATION/DISPLAY
## 4. ADDING OTHER SPORTS
## 5. MAKING AI MODELS FOR SPECIFIC GAMES AND THEIR CONFIDENCE INTERVALS IN A PARTICULAR BET

This guide will help you set up the environment using GitHub Codespaces or set it up locally using Docker and Docker Compose. Be aware the application is half built right now. It needs quite a bit of TLC. This is my first project, feel free to criticize and make recommendations as you see fit!

## NFLSTATS Development Environment Setup Guide
Welcome to the NFLSTATS project! This guide will walk you through setting up your development environment to run the application locally. Before diving into the setup process, ensure you have the following prerequisites installed on your machine:

### Prerequisites
- Git: For cloning the repository. Download and install from Git's official site.
- Docker: Required for running the application's containers. Download and install Docker from Docker's official site.
- Docker Compose: For managing multi-container Docker applications. Typically included with Docker Desktop installations.
- Node.js: The runtime environment for the application. Download and install from Node.js's official site.
- npm: Comes with Node.js and is used for managing dependencies.
- Visual Studio Code (Optional): Recommended IDE for editing and debugging the application. Download from Visual Studio Code's official site.

### Environment Setup
#### Clone the Repository
First, clone the NFLSTATS repository to your local machine using Git:

```bash
git clone https://github.com/[your_username]/nflstats.git
cd nflstats

Replace [your_username] with your GitHub username if you have forked the repository. Otherwise, use the project's main repository URL.

Docker Environment

Ensure Docker and Docker Compose are installed and running on your machine. The application uses Docker containers for both development (docker-compose.debug.yml) and production (docker-compose.prod.yml) environments.

Node.js and npm Packages

After cloning the repository, navigate to the project directory and install the required npm packages:

npm install

This command installs all dependencies defined in package.json.

Environment Variables

The application requires several environment variables to be set. These variables can be found and modified in the .env files located in the project root or specific directories for different environments (e.g., .env.development, .env.production). Ensure all necessary variables are set before proceeding.

Setting Up Azure SQL Database

To set up a centralized data warehouse using Azure SQL Database, follow these steps:

1. Provision Azure SQL Database

2. Configure Performance Settings

3. Set Up Secure Access Protocols

4. Store Azure SQL Connection String in GitHub Secrets

Running the Application for Development

With all prerequisites installed and environment variables set, you can start the development server using npm scripts defined in package.json:

npm run start:dev

This command executes the start:dev script, which builds the application for development and starts the development server. If everything is set up correctly, the application should now be running and accessible.

Accessing the Application

Once the application is running, you can access it by navigating to http://localhost:3000 in your web browser, assuming the default port configuration.

Additional Information

By following this guide, you should now have a fully functional development environment for the NFLSTATS project. Happy coding!


### File: `synapse-managed-rg/TemplateForWorkspace.json`
Update the file to include the Azure SQL Database as a linked service.

```json
{
    "$schema": "http://schema.management.azure.com/schemas/2015-01-01/deploymentTemplate.json#",
    "contentVersion": "1.0.0.0",
    "parameters": {
        "workspaceName": {
            "type": "string",
            "metadata": "Workspace name",
            "defaultValue": "synapse-managed-rg"
        },
        "MongoDBSQLD_connectionString": {
            "type": "secureString",
            "metadata": "Secure string for 'connectionString' of 'MongoDBSQLD'"
        },
        "MongoDBSQLD_password": {
            "type": "secureString",
            "metadata": "Secure string for 'password' of 'MongoDBSQLD'"
        },
        "synapse-managed-rg-WorkspaceDefaultSqlServer_connectionString": {
            "type": "secureString",
            "metadata": "Secure string for 'connectionString' of 'synapse-managed-rg-WorkspaceDefaultSqlServer'",
            "defaultValue": "Integrated Security=False;Encrypt=True;Connection Timeout=30;Data Source=tcp:synapse-managed-rg.sql.azuresynapse.net,1433;Initial Catalog=@{linkedService().DBName}"
        },
        "AzureSQLDB_connectionString": {
            "type": "secureString",
            "metadata": "Secure string for 'connectionString' of 'AzureSQLDB'"
        },
        "MongoDBSQLD_properties_typeProperties_userName": {
            "type": "string",
            "defaultValue": "zepor"
        },
        "synapse-managed-rg-WorkspaceDefaultStorage_properties_typeProperties_url": {
            "type": "string",
            "defaultValue": "https://loveofdatalakes.dfs.core.windows.net"
        }
    },
    "variables": {
        "workspaceId": "[concat('Microsoft.Synapse/workspaces/', parameters('workspaceName'))]"
    },
    "resources": [
        {
            "name": "[concat(parameters('workspaceName'), '/MongoDBSQLD')]",
            "type": "Microsoft.Synapse/workspaces/linkedServices",
            "apiVersion": "2019-06-01-preview",
            "properties": {
                "annotations": [],
                "type": "Odbc",
                "typeProperties": {
                    "connectionString": {
                        "type": "SecureString",
                        "value": "[parameters('MongoDBSQLD_connectionString')]"
                    },
                    "authenticationType": "Basic",
                    "userName": "[parameters('MongoDBSQLD_properties_typeProperties_userName')]",
                    "password": {
                        "type": "SecureString",
                        "value": "[parameters('MongoDBSQLD_password')]"
                    }
                },
                "connectVia": {
                    "referenceName": "MongoSQLD",
                    "type": "IntegrationRuntimeReference"
                }
            },
            "dependsOn": [
                "[concat(variables('workspaceId'), '/integrationRuntimes/MongoSQLD')]"
            ]
        },
        {
            "name": "[concat(parameters('workspaceName'), '/PowerBIWorkspace1')]",
            "type": "Microsoft.Synapse/workspaces/linkedServices",
            "apiVersion": "2019-06-01-preview",
            "properties": {
                "annotations": [],
                "type": "PowerBIWorkspace",
                "typeProperties": {
                    "workspaceID": "E90D9B82-4C5F-40B1-A6D3-2DD8D0A760C4",
                    "tenantID": "364661e2-3ab8-4544-a3ae-fc0b5adb4aaf"
                }
            },
            "dependsOn": []
        },
        {
            "name": "[concat(parameters('workspaceName'), '/synapse-managed-rg-WorkspaceDefaultSqlServer')]",
            "type": "Microsoft.Synapse/workspaces/linkedServices",
            "apiVersion": "2019-06-01-preview",
            "properties": {
                "parameters": {
                    "DBName": {
                        "type": "String"
                    }
                },
                "annotations": [],
                "type": "AzureSqlDW",
                "typeProperties": {
                    "connectionString": "[parameters('synapse-managed-rg-WorkspaceDefaultSqlServer_connectionString')]"
                },
                "connectVia": {
                    "referenceName": "AutoResolveIntegrationRuntime",
                    "type": "IntegrationRuntimeReference"
                }
            },
            "dependsOn": [
                "[concat(variables('workspaceId'), '/integrationRuntimes/AutoResolveIntegrationRuntime')]"
            ]
        },
        {
            "name": "[concat(parameters('workspaceName'), '/AzureSQLDB')]",
            "type": "Microsoft.Synapse/workspaces/linkedServices",
            "apiVersion": "2019-06-01-preview",
            "properties": {
                "annotations": [],
                "type": "AzureSqlDatabase",
                "typeProperties": {
                    "connectionString": "[parameters('AzureSQLDB_connectionString')]"
                },
                "connectVia": {
                    "referenceName": "AutoResolveIntegrationRuntime",
                    "type": "IntegrationRuntimeReference"
                }
            },
            "dependsOn": [
                "[concat(variables('workspaceId'), '/integrationRuntimes/AutoResolveIntegrationRuntime')]"
            ]
        },
        {
            "name": "[concat(parameters('workspaceName'), '/synapse-managed-rg-WorkspaceDefaultStorage')]",
            "type": "Microsoft.Synapse/workspaces/linkedServices",
            "apiVersion": "2019-06-01-preview",
            "properties": {
                "annotations": [],
                "type": "AzureBlobFS",
                "typeProperties": {
                    "url": "[parameters('synapse-managed-rg-WorkspaceDefaultStorage_properties_typeProperties_url')]"
                },
                "connectVia": {
                    "referenceName": "AutoResolveIntegrationRuntime",
                    "type": "IntegrationRuntimeReference"
                }
            },
            "dependsOn": [
                "[concat(variables('workspaceId'), '/integrationRuntimes/AutoResolveIntegrationRuntime')]"
            ]
        },
        {
            "name": "[concat(parameters('workspaceName'), '/AutoResolveIntegrationRuntime')]",
            "type": "Microsoft.Synapse/workspaces/integrationRuntimes",
            "apiVersion": "2019-06-01-preview",
            "properties": {
                "type": "Managed",
                "typeProperties": {
                    "computeProperties": {
                        "location": "AutoResolve",
                        "dataFlowProperties": {
                            "computeType": "General",
                            "coreCount": 8,
                            "timeToLive": 0
                        }
                    }
                }
            },
            "dependsOn": []
        },
        {
            "name": "[concat(parameters('workspaceName'), '/MongoSQLD')]",
            "type": "Microsoft.Synapse/workspaces/integrationRuntimes",
            "apiVersion": "2019-06-01-preview",
            "properties": {
                "type": "SelfHosted",
                "typeProperties": {}
            },
            "dependsOn": []
        },
        {
            "name": "[concat(parameters('workspaceName'), '/WorkspaceSystemIdentity')]",
            "type": "Microsoft.Synapse/workspaces/credentials",
            "apiVersion": "2019-06-01-preview",
            "properties": {
                "type": "ManagedIdentity",
                "typeProperties": {}
            },
            "dependsOn": []
        },
        {
            "name": "[concat(parameters('workspaceName'), '/OdbcTable1')]",
            "type": "Microsoft.Synapse/workspaces/datasets",
            "apiVersion": "2019-06-01-preview",
            "properties": {
                "linkedServiceName": {
                    "reference

[Click here to create a Pull Request with the proposed solution](https://api.codeautopilot.com/create-pr?task=29593&hash=62d4778ba6ddd82371c1cae5aea998646e9232531cb806214599feec39f95d91)

**Files used for this task:**

<details>
  <summary>Changes on synapse/linkedService/synapse-managed-rg-WorkspaceDefaultSqlServer.json</summary>

  To update the `synapse-managed-rg-WorkspaceDefaultSqlServer.json` file to include the connection details for the Azure SQL Database and ensure that the connection string is securely retrieved from environment variables or GitHub Secrets, follow these steps:

1. **Modify the Connection String**:
   - Update the `connectionString` field to retrieve the connection string from an environment variable or GitHub Secret.

2. **Use Secure Parameters**:
   - Ensure that sensitive information is not hardcoded in the JSON file.

Here is the updated version of the file:

```json
{
    "name": "synapse-managed-rg-WorkspaceDefaultSqlServer",
    "type": "Microsoft.Synapse/workspaces/linkedservices",
    "properties": {
        "typeProperties": {
            "connectionString": {
                "type": "SecureString",
                "value": "@{linkedService().AzureSqlConnectionString}"
            }
        },
        "parameters": {
            "DBName": {
                "type": "String"
            },
            "AzureSqlConnectionString": {
                "type": "SecureString"
            }
        },
        "type": "AzureSqlDW",
        "connectVia": {
            "referenceName": "AutoResolveIntegrationRuntime",
            "type": "IntegrationRuntimeReference"
        },
        "annotations": []
    }
}

Explanation:

Next Steps:

  1. Store the Connection String Securely:

    • Ensure that the AzureSqlConnectionString is stored securely in your environment variables or GitHub Secrets.
  2. Update Deployment Scripts:

    • If you have deployment scripts, make sure they are updated to pass the AzureSqlConnectionString parameter securely.
  3. Documentation:

    • Update your documentation to reflect these changes, ensuring that anyone setting up the environment knows how to securely provide the AzureSqlConnectionString.

By following these steps, you ensure that the connection details for the Azure SQL Database are securely managed and integrated into your Synapse workspace configuration.

Changes on backend-container/flaskapp/config.py To update the `backend-container/flaskapp/config.py` file to include the Azure SQL Database connection details and ensure that the connection string is securely retrieved from environment variables or GitHub Secrets, follow these steps: 1. **Import Required Modules**: Ensure you have the necessary imports for handling environment variables and logging. 2. **Load Environment Variables**: Use `dotenv` to load environment variables from a `.env` file. 3. **Define Configuration Classes**: Update the configuration classes to include the Azure SQL Database connection string. Here is the updated code for `backend-container/flaskapp/config.py`: ```python import os import logging from dotenv import load_dotenv # Custom exception class for configuration errors class ConfigurationError(Exception): pass # Load the environment variables from the .env file dotenv_path = os.path.join(os.path.dirname(__file__), '../.env') load_dotenv(dotenv_path) # Base Configuration class Config: DEBUG = False TESTING = False # Retrieve the Azure SQL Database connection string from environment variables connection_string = os.getenv('AZURE_SQL_CONNECTION_STRING') if not connection_string: raise ConfigurationError( "AZURE_SQL_CONNECTION_STRING must be set in .env or environment variables" ) SQLALCHEMY_DATABASE_URI = connection_string SQLALCHEMY_TRACK_MODIFICATIONS = False LOGGING_LEVEL = logging.ERROR class DevelopmentConfig(Config): DEBUG = True TESTING = True LOGGING_LEVEL = logging.DEBUG class ProductionConfig(Config): DEBUG = False TESTING = False LOGGING_LEVEL = logging.WARNING ``` ### Explanation: 1. **Imports**: The necessary modules (`os`, `logging`, `load_dotenv`) are imported. 2. **Custom Exception**: A custom exception `ConfigurationError` is defined for handling configuration errors. 3. **Load Environment Variables**: The `dotenv` module is used to load environment variables from a `.env` file. 4. **Base Configuration Class**: The `Config` class is defined with the Azure SQL Database connection string retrieved from environment variables. If the connection string is not set, a `ConfigurationError` is raised. 5. **Development and Production Configurations**: Separate classes for development (`DevelopmentConfig`) and production (`ProductionConfig`) configurations are defined, inheriting from the base `Config` class and setting appropriate logging levels. This update ensures that the Azure SQL Database connection string is securely retrieved from environment variables, making the application configuration more secure and flexible.
Changes on backend-container/src/database/connections.py To update the `backend-container/src/database/connections.py` file to include the connection string for the Azure SQL Database, follow these steps: 1. **Import Necessary Libraries**: Ensure you have the necessary libraries imported for connecting to Azure SQL Database. 2. **Retrieve Connection String**: Retrieve the connection string from environment variables or GitHub Secrets. 3. **Establish Connection**: Use the connection string to establish a connection to the Azure SQL Database. Here's the updated code: ```python import os import redis import pyodbc # Import pyodbc for Azure SQL Database connection from dotenv import load_dotenv from src.utils.log import be_logger load_dotenv() def connect_to_redis(primary_host, fallback_host, port=6379): """Establishes a connection to Redis. Args: primary_host: The primary Redis host. port: The Redis port (default is 6379). Returns: A Redis connection object. Raises: redis.exceptions.ConnectionError: If the Redis host is unavailable. """ try: # Try connecting to the primary Redis host r = redis.StrictRedis(host=primary_host, port=port, decode_responses=False) r.ping() return r except redis.exceptions.ConnectionError: # If the primary host is unavailable, try the fallback host try: r = redis.StrictRedis(host=fallback_host, port=port, decode_responses=False) r.ping() return r except redis.exceptions.ConnectionError as ex: # If both connections fail, handle the exception appropriately be_logger.info("Could not connect to Redis: %s", str(ex)) # You may want to raise an exception or handle it in some other way here raise def get_azure_sql_connection(): """Establishes a connection to Azure SQL Database. Returns: A pyodbc connection object representing the connection to Azure SQL Database, or None if the connection could not be established. Raises: pyodbc.Error: If the connection to Azure SQL Database fails. """ try: azure_sql_connection_string = os.environ.get('AZURE_SQL_CONNECTION_STRING') if not azure_sql_connection_string: raise ValueError("AZURE_SQL_CONNECTION_STRING environment variable not set.") connection = pyodbc.connect(azure_sql_connection_string) be_logger.debug("Successfully connected to Azure SQL Database.") return connection except Exception as e: be_logger.error(f"Failed to connect to Azure SQL Database: {e}") return None redis_primary_host = 'redis' # The host used in the development environment redis_fallback_host = 'redis-service' # The host used in the Kubernetes environment redis_port = 6379 # Default Redis port r = connect_to_redis(redis_primary_host, redis_fallback_host, redis_port) # Example usage of the Azure SQL Database connection azure_sql_conn = get_azure_sql_connection() if azure_sql_conn: # Perform database operations here pass ``` ### Explanation: 1. **Import `pyodbc`**: This library is used for connecting to Azure SQL Database. 2. **`get_azure_sql_connection` Function**: This function retrieves the connection string from the environment variable `AZURE_SQL_CONNECTION_STRING` and establishes a connection to the Azure SQL Database. 3. **Error Handling**: Proper error handling is implemented to log any issues that occur during the connection process. ### Next Steps: - Ensure that the `AZURE_SQL_CONNECTION_STRING` environment variable is set in your deployment environment or GitHub Secrets. - Test the connection to verify that it works correctly. This update ensures that the connection string for the Azure SQL Database is securely retrieved and used to establish a connection, aligning with the task requirements.
Changes on README.md To address the task for the `README.md` file, we need to add a detailed step-by-step setup guide for setting up the Azure SQL Database. This guide should include sections for provisioning the database, configuring performance settings, setting up secure access protocols, and storing the connection string in GitHub Secrets. Below is a proposed update to the `README.md` file: ### Updated README.md ```markdown # NFLSTATS Development Environment This is the NFLSTATS project, designed to provide statistical insights for the NFL. ## THIS APP NEEDS A TON OF WORK ## 1. PIPELINES FROM SPORTS RADAR APIS FOR LIVE DATA ## 2. OTHER ADVANCED ANALYTICS DATASOURCES FOR ADDTIONAL ANALYSIS FOR AN AI MODEL ## 3. DATA VISUALIZATION/DISPLAY ## 4. ADDING OTHER SPORTS ## 5. MAKING AI MODELS FOR SPECIFIC GAMES AND THEIR CONFIDENCE INTERVALS IN A PARTICULAR BET This guide will help you set up the environment using GitHub Codespaces or set it up locally using Docker and Docker Compose. Be aware the application is half built right now. It needs quite a bit of TLC. This is my first project, feel free to criticize and make recommendations as you see fit! ## NFLSTATS Development Environment Setup Guide Welcome to the NFLSTATS project! This guide will walk you through setting up your development environment to run the application locally. Before diving into the setup process, ensure you have the following prerequisites installed on your machine: ### Prerequisites - Git: For cloning the repository. Download and install from Git's official site. - Docker: Required for running the application's containers. Download and install Docker from Docker's official site. - Docker Compose: For managing multi-container Docker applications. Typically included with Docker Desktop installations. - Node.js: The runtime environment for the application. Download and install from Node.js's official site. - npm: Comes with Node.js and is used for managing dependencies. - Visual Studio Code (Optional): Recommended IDE for editing and debugging the application. Download from Visual Studio Code's official site. ### Environment Setup #### Clone the Repository First, clone the NFLSTATS repository to your local machine using Git: ```bash git clone https://github.com/[your_username]/nflstats.git cd nflstats ``` Replace `[your_username]` with your GitHub username if you have forked the repository. Otherwise, use the project's main repository URL. #### Docker Environment Ensure Docker and Docker Compose are installed and running on your machine. The application uses Docker containers for both development (docker-compose.debug.yml) and production (docker-compose.prod.yml) environments. #### Node.js and npm Packages After cloning the repository, navigate to the project directory and install the required npm packages: ```bash npm install ``` This command installs all dependencies defined in `package.json`. #### Environment Variables The application requires several environment variables to be set. These variables can be found and modified in the `.env` files located in the project root or specific directories for different environments (e.g., `.env.development`, `.env.production`). Ensure all necessary variables are set before proceeding. ### Setting Up Azure SQL Database To set up a centralized data warehouse using Azure SQL Database, follow these steps: #### 1. Provision Azure SQL Database - Follow Microsoft Azure’s provisioning guide to set up the Azure SQL Database instance. You can find the guide [here](https://docs.microsoft.com/en-us/azure/azure-sql/database/sql-database-get-started-portal). - Choose the appropriate service tier and compute size based on your requirements. #### 2. Configure Performance Settings - Ensure proper scaling and performance configurations. Configure the database for high availability and performance using Azure’s built-in tools. Refer to the [performance best practices](https://docs.microsoft.com/en-us/azure/azure-sql/database/performance-best-practices) for guidance. #### 3. Set Up Secure Access Protocols - Apply best practices for securing Azure SQL databases, including setting up firewalls and access control. Detailed instructions can be found [here](https://docs.microsoft.com/en-us/azure/azure-sql/database/security-best-practices). #### 4. Store Azure SQL Connection String in GitHub Secrets - Store the connection string in GitHub Secrets as `AZURE_SQL_CONNECTION_STRING`. Follow the steps below: 1. Go to your GitHub repository. 2. Navigate to `Settings` > `Secrets` > `New repository secret`. 3. Add a new secret with the name `AZURE_SQL_CONNECTION_STRING` and paste the connection string. ### Running the Application for Development With all prerequisites installed and environment variables set, you can start the development server using npm scripts defined in `package.json`: ```bash npm run start:dev ``` This command executes the `start:dev` script, which builds the application for development and starts the development server. If everything is set up correctly, the application should now be running and accessible. ### Accessing the Application Once the application is running, you can access it by navigating to `http://localhost:3000` in your web browser, assuming the default port configuration. ### Additional Information - **Dockerfiles**: If the Dockerfiles are on `.dockerignore`, ensure to remove them from the ignore list or manually copy them into your project directory. The Dockerfiles are crucial for building the Docker images for both development and production environments. - **Scripts**: For detailed information on available npm scripts and their purposes, refer to the scripts section in `package.json`. These scripts provide shortcuts for common tasks such as building the application, starting the development server, and cleaning the build directory. - **Contributing**: Contributions to the NFLSTATS project are welcome! Please refer to the project's `CONTRIBUTING.md` file for guidelines on how to contribute. By following this guide, you should now have a fully functional development environment for the NFLSTATS project. Happy coding! ``` This update includes a comprehensive guide for setting up the Azure SQL Database, ensuring that all necessary steps are covered.
Changes on synapse-managed-rg/TemplateForWorkspace.json To update the `TemplateForWorkspace.json` file to include the Azure SQL Database as a linked service, follow these steps: 1. **Add Parameters for Azure SQL Database Connection**: - Add new parameters to capture the Azure SQL Database connection string and other necessary details. 2. **Define the Linked Service for Azure SQL Database**: - Add a new resource definition for the Azure SQL Database linked service. Here's the updated `TemplateForWorkspace.json` file with the necessary changes: ```json { "$schema": "http://schema.management.azure.com/schemas/2015-01-01/deploymentTemplate.json#", "contentVersion": "1.0.0.0", "parameters": { "workspaceName": { "type": "string", "metadata": "Workspace name", "defaultValue": "synapse-managed-rg" }, "MongoDBSQLD_connectionString": { "type": "secureString", "metadata": "Secure string for 'connectionString' of 'MongoDBSQLD'" }, "MongoDBSQLD_password": { "type": "secureString", "metadata": "Secure string for 'password' of 'MongoDBSQLD'" }, "synapse-managed-rg-WorkspaceDefaultSqlServer_connectionString": { "type": "secureString", "metadata": "Secure string for 'connectionString' of 'synapse-managed-rg-WorkspaceDefaultSqlServer'", "defaultValue": "Integrated Security=False;Encrypt=True;Connection Timeout=30;Data Source=tcp:synapse-managed-rg.sql.azuresynapse.net,1433;Initial Catalog=@{linkedService().DBName}" }, "AzureSQLDB_connectionString": { "type": "secureString", "metadata": "Secure string for 'connectionString' of 'AzureSQLDB'" }, "MongoDBSQLD_properties_typeProperties_userName": { "type": "string", "defaultValue": "zepor" }, "synapse-managed-rg-WorkspaceDefaultStorage_properties_typeProperties_url": { "type": "string", "defaultValue": "https://loveofdatalakes.dfs.core.windows.net" } }, "variables": { "workspaceId": "[concat('Microsoft.Synapse/workspaces/', parameters('workspaceName'))]" }, "resources": [ { "name": "[concat(parameters('workspaceName'), '/MongoDBSQLD')]", "type": "Microsoft.Synapse/workspaces/linkedServices", "apiVersion": "2019-06-01-preview", "properties": { "annotations": [], "type": "Odbc", "typeProperties": { "connectionString": { "type": "SecureString", "value": "[parameters('MongoDBSQLD_connectionString')]" }, "authenticationType": "Basic", "userName": "[parameters('MongoDBSQLD_properties_typeProperties_userName')]", "password": { "type": "SecureString", "value": "[parameters('MongoDBSQLD_password')]" } }, "connectVia": { "referenceName": "MongoSQLD", "type": "IntegrationRuntimeReference" } }, "dependsOn": [ "[concat(variables('workspaceId'), '/integrationRuntimes/MongoSQLD')]" ] }, { "name": "[concat(parameters('workspaceName'), '/PowerBIWorkspace1')]", "type": "Microsoft.Synapse/workspaces/linkedServices", "apiVersion": "2019-06-01-preview", "properties": { "annotations": [], "type": "PowerBIWorkspace", "typeProperties": { "workspaceID": "E90D9B82-4C5F-40B1-A6D3-2DD8D0A760C4", "tenantID": "364661e2-3ab8-4544-a3ae-fc0b5adb4aaf" } }, "dependsOn": [] }, { "name": "[concat(parameters('workspaceName'), '/synapse-managed-rg-WorkspaceDefaultSqlServer')]", "type": "Microsoft.Synapse/workspaces/linkedServices", "apiVersion": "2019-06-01-preview", "properties": { "parameters": { "DBName": { "type": "String" } }, "annotations": [], "type": "AzureSqlDW", "typeProperties": { "connectionString": "[parameters('synapse-managed-rg-WorkspaceDefaultSqlServer_connectionString')]" }, "connectVia": { "referenceName": "AutoResolveIntegrationRuntime", "type": "IntegrationRuntimeReference" } }, "dependsOn": [ "[concat(variables('workspaceId'), '/integrationRuntimes/AutoResolveIntegrationRuntime')]" ] }, { "name": "[concat(parameters('workspaceName'), '/AzureSQLDB')]", "type": "Microsoft.Synapse/workspaces/linkedServices", "apiVersion": "2019-06-01-preview", "properties": { "annotations": [], "type": "AzureSqlDatabase", "typeProperties": { "connectionString": "[parameters('AzureSQLDB_connectionString')]" }, "connectVia": { "referenceName": "AutoResolveIntegrationRuntime", "type": "IntegrationRuntimeReference" } }, "dependsOn": [ "[concat(variables('workspaceId'), '/integrationRuntimes/AutoResolveIntegrationRuntime')]" ] }, { "name": "[concat(parameters('workspaceName'), '/synapse-managed-rg-WorkspaceDefaultStorage')]", "type": "Microsoft.Synapse/workspaces/linkedServices", "apiVersion": "2019-06-01-preview", "properties": { "annotations": [], "type": "AzureBlobFS", "typeProperties": { "url": "[parameters('synapse-managed-rg-WorkspaceDefaultStorage_properties_typeProperties_url')]" }, "connectVia": { "referenceName": "AutoResolveIntegrationRuntime", "type": "IntegrationRuntimeReference" } }, "dependsOn": [ "[concat(variables('workspaceId'), '/integrationRuntimes/AutoResolveIntegrationRuntime')]" ] }, { "name": "[concat(parameters('workspaceName'), '/AutoResolveIntegrationRuntime')]", "type": "Microsoft.Synapse/workspaces/integrationRuntimes", "apiVersion": "2019-06-01-preview", "properties": { "type": "Managed", "typeProperties": { "computeProperties": { "location": "AutoResolve", "dataFlowProperties": { "computeType": "General", "coreCount": 8, "timeToLive": 0 } } } }, "dependsOn": [] }, { "name": "[concat(parameters('workspaceName'), '/MongoSQLD')]", "type": "Microsoft.Synapse/workspaces/integrationRuntimes", "apiVersion": "2019-06-01-preview", "properties": { "type": "SelfHosted", "typeProperties": {} }, "dependsOn": [] }, { "name": "[concat(parameters('workspaceName'), '/WorkspaceSystemIdentity')]", "type": "Microsoft.Synapse/workspaces/credentials", "apiVersion": "2019-06-01-preview", "properties": { "type": "ManagedIdentity", "typeProperties": {} }, "dependsOn": [] }, { "name": "[concat(parameters('workspaceName'), '/OdbcTable1')]", "type": "Microsoft.Synapse/workspaces/datasets", "apiVersion": "2019-06-01-preview", "properties": { "linkedServiceName": { "referenceName": "MongoDBSQLD", "type": "LinkedServiceReference" }, "annotations": [], "type": "OdbcTable", "schema": [], "typeProperties": {} }, "dependsOn": [ "[concat(variables('workspaceId'), '/linkedServices/MongoDBSQLD')]" ] } ] } ``` ### Explanation of Changes: 1. **Parameters**: - Added a new parameter `AzureSQLDB_connectionString` to capture the connection string for the Azure SQL Database. 2. **Resources**: - Added a new resource definition for `AzureSQLDB` under `Microsoft.Synapse/workspaces/linkedServices`. This includes the connection string and the integration runtime reference. These changes ensure that the Azure SQL Database is included as a linked service in the Synapse workspace, with the necessary configurations for connecting to it.