Time Series Forecasting

soroush-ziaeinejad commented 1 year ago

@Sharjeeliv

Hello Sharjeel, this issue page is created for the task of time series forecasting for user similarity matrices. Here is the detailed explanation for this task: When UML (user modeling layer) is done, we have T different U*U matrices that keep the similarity between U users for T time intervals. Suppose we have 60 time intervals and 1000 users. So we have 60 matrices with shape1000*1000. Your task is to predict the user similarity matrix for time interval T+1 (61 in our case).

Motivation: Our current approach is to generate T graphs from these T matrices (we consider the matrices as adjacency matrices) and then we build T embedded matrices with shape U*d (d=embedding_dimension) such that matrix t is calculated based on matrices 1 to t-1. This is called temporal graph embedding which is done in the graph embedding layer (GEL) in SEERa. The main problems with this approach are:

generating graphs of adjacency matrices is time consuming.
saving the generated graphs in one 3D matrix (T*U*U) is memory and time consuming.
SEERa is going to be a one-stop-shop framework. So we want to add different variations and approaches (baselines) for each task for the sake of comparison and comprehensiveness of our framework.
now we use the latest embedded matrix. Going over the generated embedded matrices may increase the model performance.

Your subtasks:

generate toy datasets: Please write a code that takes T, U, and scenario as inputs and generates T files each containing a U*U matrix as the user similarity matrix, and save them in a folder as well as their corresponding heatmap. The scenario can be one of these three:

increasing similarity -> similarity must be continuously increasing from t=1 till t=T.
decreasing similarity -> similarity must be continuously decreasing from t=1 till t=T.
increasing decreasing similarity -> similarity must be continuously increasing (for half of the users) and continuously decreasing (for the other half of users) from t=1 till t=T.

implement the simplest possible LSTM using Pytorch to be trained on the first k toy time series and predicts the _k+1_th and calculate the prediction accuracy (or error) using the real _k+1_th matrix as the ground truth. In this way, we can gradually see the performance of the LSTM model.
eventually extend the model (in terms of parameters and the number of layers and neurons) to see the effects of each parameter in order to achieve the best model for this task.
train the model on our original dataset (which I will provide later).
tune the model for our dataset to achieve the best results.

Please start with subtask 1 (toy dataset generating) and we will set regular meetings to discuss your findings and make changes if required. If you have any questions about your first subtask (or the whole task) please do not hesitate to contact me or ask for help.

@hosseinfani, Please share your comments and thoughts on this task. It will be really appreciated.

Thanks in advance!

soroush-ziaeinejad commented 1 year ago

Hey @Sharjeeliv ,

Any updates on toy dataset generating? Please let me know once this task is finished. By the way, I forgot to mention that keep the user similarities between 0 and 1.

Thanks :)

Sharjeeliv commented 1 year ago

Hi Soroush, I'm still working on the task (needed to spend time reviewing SEERa's code again). A few initial questions:

Is the method in the main file?
Do you have any suggested readings on libraries or concepts I can do to understand the topic more (I understand the task, but I need to know more about how I can implement it)?

I will have more consistent and frequent updates starting next week.

soroush-ziaeinejad commented 1 year ago

Hi @Sharjeeliv

Yes, reviewing SEERa's code will definitely help. However, the first steps of your task are not directly related to SEERa. Let's focus on your first subtask which is toy dataset generating. We need data to trace the models that we are going to implement in SEERa and we want to generate it in this step.

scenario.1: increasing similarity -> similarity must be continuously increasing from day.1 till day.T
scenario.2: decreasing similarity -> similarity must be continuously decreasing from day.1 till day.T
scenario.3: increasing decreasing similarity -> similarity must be continuously increasing (for half of the users) and continuously decreasing (for the other half of users) from day.1 till day.T

And this is how the data (user similarity matrices) should be look like for 3 users (U=3) for scenario.1 (increasing): day.1.npy = [[1,0.1,0.2] [0.1,1,0.3] [0.2,0.3,1]]

day.2.npy = [[1,0.3,0.4] [0.3,1,0.5] [0.4,0.5,1]]

day.3.npy = [[1,0.5,0.6] [0.5,1,0.7] [0.6,0.7,1]]

day.4.npy = [[1,0.8,0.9] [0.8,1,1] [0.9,1,1]]

Your task is to write a function in Python that takes T, U, and scenario number, and generates T different user similarity matrices as T .npy files and saves them. Please let me know if you need more clarifications.

Sharjeeliv commented 1 year ago

Perfect, thank you. I was confused on how task 1 was linked to SEERa's code, this helped clear it up. I'll have this done asap.

soroush-ziaeinejad commented 1 year ago

Thanks @Sharjeeliv

Also, you can use the simplest way to increase and decrease user similarities. You can generate a random matrix for the first day, then increase it by (1-initial_value)/T for each day. You can use any other approaches that you may prefer. This was just a suggestion.

soroush-ziaeinejad commented 1 year ago

Hello @Sharjeeliv

I am writing to kindly request your prompt attention to the task at hand. We are in need of the output of your first subtask (toy dataset) in order to work on another issue (#71). If you could complete the task as soon as possible, it would be greatly appreciated. Please keep me posted.

Thank you for your cooperation.

Sharjeeliv commented 1 year ago

Yep, I am almost done -just working on some kinks, it will be complete and posted by tonight.

Sharjeeliv commented 1 year ago

Hi @soroush-ziaeinejad,

This is the completed task one, since it's not part of SEERa I posted the code here but I can make a PR with just this file if everything looks good to you. Also for scenario 3, I split the combinations and incremented half and decremented the other half.

This is how the saved files and the dataset looks when it's generated

import os

import numpy as np
from math import ceil

ROUNDING_FACTOR: int = 2

def generate_dataset(time_interval: int, users: int, scenario: int):
    dataset = np.round(np.random.uniform(0.0, 1.0, size=(users, users)), ROUNDING_FACTOR)
    mask = inc_and_dec_scenario_mask(users)
    save_dataset(dataset, scenario, 1)
    print("Original dataset:\n", dataset, "\n")

    for day in range(2, time_interval + 1):  # The loop begins at day 2, as day 1 is the random dataset
        temp = generate_dateset_change(dataset, day)
        # print(f"Change on iteration {day}:\n", temp, "\n")

        if scenario == 1:
            dataset += temp
        elif scenario == 2:
            dataset -= temp
        elif scenario == 3:
            dataset += temp * mask
        else:
            print("Scenario entry must be between 1-3")
            return

        dataset = dataset.clip(0, 1)  # Value must be between 0 and 1
        save_dataset(dataset, scenario, day)
        # print(f"New dataset on iteration {day}:\n", dataset, "\n")

def generate_dateset_change(np_array: np.ndarray, day: int) -> np.ndarray:
    return np.round((1 - np_array[:]) / day, ROUNDING_FACTOR)

def inc_and_dec_scenario_mask(users: int) -> np.ndarray:
    size = users * users
    temp = np.ones(size)
    temp[ceil(size / 2):] *= -1
    return temp.reshape([users, users])

def save_dataset(dataset: np.ndarray, scenario: int, day: int):
    dest = f'../data.toy/scenario.{scenario}'
    if os.path.exists(dest):
        np.save(dest + f'/day.{day:03d}', dataset)
    else:
        os.makedirs(dest)
        np.save(dest + f'/day.{day:03d}', dataset)

if __name__ == '__main__':
    generate_dataset(3, 3, 2)

soroush-ziaeinejad commented 1 year ago

Hey @Sharjeeliv

Thank you for the clean and clear code! I just tested your code and it is exactly what I asked for. It seems we are done with the first subtask. I will provide a description for the second one tomorrow so you can start working on it.

soroush-ziaeinejad commented 1 year ago

@Sharjeeliv ,

There is a bug in the code. These matrices are user similarity matrices and should be symmetric. I think if you initialize symmetric matrices, they remain symmetric until the end. Can you please address this issue and update the code?

Thank you :)

Sharjeeliv commented 1 year ago

@soroush-ziaeinejad,

I changed it to generate like this -this is a quick fix until I can figure out how to natively generate a symmetric matrix. How would you expect it to look for scenario 3? Since the matrix is not evenly divided in half if it is odd.

Scenario 3 - Current

soroush-ziaeinejad commented 1 year ago

Thank you @Sharjeeliv

Do you mean splitting them in "half"? If the problem is this, it's not a strict half, I expect increasing similarity for some users and decreasing for others.

By the way, please keep the diameter of all matrices as 1 because the similarity between a user and herself is always 1 for us. Sorry, I forgot to mention it before.

Please share the code when it's finished.

Sharjeeliv commented 1 year ago

@soroush-ziaeinejad,

So in scenario three if the matrix is no longer symmetric that is ok?

I implemented the changes, here is the new code. Please let me know if anything else needs to be changed.

This is how the a generated matrix looks:

import os

import numpy as np
from math import ceil

ROUNDING_FACTOR: int = 2

def generate_dataset(time_interval: int, users: int, scenario: int):
    dataset = generate_user_similarity_matrix(users)
    mask = inc_and_dec_scenario_mask(users)
    save_dataset(dataset, scenario, 1)
    print("Original dataset:\n", dataset, "\n")

    for day in range(2, time_interval + 1):  # The loop begins at day 2, as day 1 is the random dataset
        temp = generate_dateset_change(dataset, day)
        # print(f"Change on iteration {day}:\n", temp, "\n")

        if scenario == 1:
            dataset += temp
        elif scenario == 2:
            dataset -= temp
        elif scenario == 3:
            dataset += temp * mask
        else:
            print("Scenario entry must be between 1-3")
            return

        dataset = dataset.clip(0, 1)  # Value must be between 0 and 1
        save_dataset(dataset, scenario, day)
        print(f"New dataset on iteration {day}:\n", dataset, "\n")

def generate_user_similarity_matrix(users: int) -> np.ndarray:
    dataset = symmetrize(np.random.uniform(0.0, 1.0, size=(users, users)))
    np.fill_diagonal(dataset, 1)  # A User is similar to themselves
    return np.round(dataset, ROUNDING_FACTOR)

def symmetrize(np_array: np.ndarray)-> np.ndarray:
    return (np_array + np_array.transpose()) / 2

def generate_dateset_change(np_array: np.ndarray, day: int) -> np.ndarray:
    return np.round((1 - np_array[:]) / day, ROUNDING_FACTOR)

def inc_and_dec_scenario_mask(users: int) -> np.ndarray:
    size = users * users
    temp = np.ones(size)
    temp[ceil(size / 2):] *= -1
    return temp.reshape([users, users])

def save_dataset(dataset: np.ndarray, scenario: int, day: int):
    dest = f'../data.toy/scenario.{scenario}'
    if os.path.exists(dest):
        np.save(dest + f'/day.{day:03d}', dataset)
    else:
        os.makedirs(dest)
        np.save(dest + f'/day.{day:03d}', dataset)

if __name__ == '__main__':
    generate_dataset(3, 3, 1)

soroush-ziaeinejad commented 1 year ago

@Sharjeeliv

Hello Sharjeel, this issue page is created for the task of time series forecasting for user similarity matrices. Here is the detailed explanation for this task: When UML (user modeling layer) is done, we have T different UU matrices that keep the similarity between U users for T time intervals. Suppose we have 60 time intervals and 1000 users. So we have 60 matrices with shape10001000. Your task is to predict the user similarity matrix for time interval T+1 (61 in our case).

implement the simplest possible LSTM using Pytorch to be trained on the first k toy time series and predicts the _k+1_th and calculate the prediction accuracy (or error) using the real _k+1_th matrix as the ground truth. In this way, we can gradually see the performance of the LSTM model.

@Sharjeeliv As we move forward, I would like you to focus on the second subtask. You can make use of the code you have already created to generate user similarity matrices for a period of 30 days for 1000 users across all scenarios. This generated dataset will be your starting point.

Next, I would like you to implement a basic LSTM model with a single hidden layer using Pytorch. Train the model using the dataset, and then generate a prediction of another user similarity matrix for the upcoming time interval (31st day in this case).

Please note that I do not expect you to complete this entire subtask within a week, as progress is of utmost importance. Take your time to familiarize yourself with the concepts of Neural Networks and LSTM if you are not already familiar with them. Your findings and progress updates are important to us, so please keep us informed of your progress.

Please let me know if you need help or face any issues during this step.

Thanks

Sharjeeliv commented 1 year ago

Since I'm not familiar with any of the topics in-depth, I'll spend the weekend studying the topics and learning PyTorch. I have found a tutorial on LSTM model with PyTorch so I'll start with that soon after.

Sharjeeliv commented 1 year ago

Hi Soroush, I'll need to revise my original statement -it's going to take me longer than a weekend. I found a good course from Facebook which covers an introduction to neural networks with a focus on recurrent networks (has a section for LSTM) and PyTorch. I'm working through the course and doing the practice they have for PyTorch and NN's.

Course: https://www.udacity.com/course/deep-learning-pytorch--ud188

soroush-ziaeinejad commented 1 year ago

Hi @Sharjeeliv ,

Thanks for the update. That's perfect this course will definitely help to improve your knowledge about NNs and PyTorch. However, we usually don't need the deep theory behind NNs and their variations. Whenever you feel you are dealing with too much information it probably means you are focusing on theoretical details which can be skipped at this step.

Please do not hesitate to let us know if you face any issues or questions. We will try our best to help.

Have an adventurous journey into the NN world!

soroush-ziaeinejad commented 1 year ago

Hi @Sharjeeliv ,

I know this task may be taking longer than the previous one, but I want to encourage you to keep track and report any small progress that you have. Reporting your activities, even if you haven't achieved the desired results, is an essential skill that will benefit you greatly. Don't hesitate to share your understanding, every small experience or experiment you've tried, or any obstacles you've faced. So please keep us updated on your progress, and let us know how we can help.

Thanks :)

Sharjeeliv commented 1 year ago

Whoops, I will post updates more frequently now. This is what I've done/am doing so far and my current understanding:

I covered classification problems and the basics of NN, starting with linear boundaries and how essentially the goal is to separate the points.
Covered perceptrons and how they chain together to create neural networks. Moreover, how perceptrons can be generated and corrected (i.e., learning) by starting at a random place and using a learning rate to move the line towards better accuracy (+ python algorithm for such step)
Moved on to non-linear regions, where the main takeaway was that with the help of an error function, we can divide more complex and realistic "graphs". Where the error function tells us how far we are from the ideal solution -so we take steps to move toward a better solution (gradient descent).
This meant that representation had to change from discrete to continuous so that the algorithm does not get stuck. The method presented was to apply weights and use the total sum of the weights to determine accuracy (where incorrectly classified points have large weights). This continuous error function works with gradient descent now.
Covered discrete and continuous predictions, where continuous is a probability and discrete is like true/false.
Covered how multi-class classification -so if the result can be more than just two things, properties: (1) must add to 1, (2) relationship between classes maintained. This is a softmax function and I coded this in python as per practice.
Other topics: one-hot encoding, max likelihood, reviewed numPy
Covered key formulas throughout (linear, sigmoid, softmax)

For each of the topics, I've been taking notes, doing practice coding, writing formulas, etc. I also ended up learning Latex because writing math notes is awful otherwise.

The main issue I've had is balancing everything with my assignments/courses, so I've been making progress much slower than I would like, but it's been steady, and SEERa is making much more sense to me now. I intend to try to finish Pytorch section before Monday and take a crack at the task, this course does have a topic for LSTM and a section on time-series forecasting, so worst case I'll be able to do the task as I finish that section.

soroush-ziaeinejad commented 1 year ago

@Sharjeeliv

Awesome! This is a perfect sample of a desired report of progress. Please keep going and keep us updated about your new steps. Thank you :)

Sharjeeliv commented 1 year ago

A short pre-update: I'm finishing up what I set out in my last update. Since I mostly make progress on the weekend I'll be *giving my updates every Monday.

*trying to

Sharjeeliv commented 1 year ago

An update on my progress, this content was slower to get through than the earlier section:

This week was mostly spent on cross-entropy, which from my understanding is the connection between probabilities and error functions
We sum the negative natural log of each probability to get us a total probability; if, for example, we have four independent events, this will tell us the error of the total.
Covered several variations of formulas and python implementation.
Moved on to multi-class entropy, for when there are more results than just two
On to logistic regression, which in essence goes like this: take data, pick a random model, calculate the error, and get a better model.
While working on this, I also started to look at tutorials on LSTM for time series forecasting.

The goal is still the same, to finish this content and get started with the task. I am probably going to start on the task in parallel since from the tutorial it just seems like an application.

soroush-ziaeinejad commented 1 year ago

It seems you are getting ready for hands-on experience in deep neural networks! Does the course contain any coding assignments or practicing materials? @Sharjeeliv

Sharjeeliv commented 1 year ago

Updated for brevity and accuracy

Yep, the course has several coding assignments focused on implementing various models (Default, CNN, RNN, etc.) with PyTorch.

As for an update, I’ve finished the core theory, particularly the following topics.

Logistic regression which includes gradient descendent
Neural network architecture
- which is done by combining different models to create more advanced models.
- Can be adjusted based on weights, layers, and input-outputs (By using Softmax we can implement multi-classification)
Other model terms such as feedforward and backward propagation (which is how the learning is done)
I was introduced to testing and training models (both are separate to ensure generality)
Relating to training is over/underfitting; where the former is too specific and the latter not enough.
Epoch is one pass of a training dataset. But instead of running the entire dataset (expensive), we can use sub-sections (stochastic gradient descent) of a dataset.
Some methods to handle fitting are early-stopping and regularization (L1 for feature selection and sparse vectors, L2 for training models)
By using dropout specific parts of the model can be trained
To avoid the local minima issue we can use the following concepts: learning rate decay or momentum

This is a brief summary of the core topics I’ve covered. I’m currently working on the hands-on PyTorch section.

Sharjeeliv commented 1 year ago

Update 1/3 - PyTorch

The following is a final compilation of key points learned or accomplished during the course of the task.

The following are key points learned from the PyTorch course section:

PyTorch provides a high degree of abstraction when working with models. For example, it includes activation functions, simplifies creating models, loss functions, and supplementary features such as saving/loading and testing/training.
Typically models are created as a class derived from nn.module. They can also be created by passing in components to nn.sequential.
The class must have an init and forward function. The former is used to define the model (its structure and inputs like hidden layers), and the latter is used to run inputs through the model -it needs to return an output.
The general training process is a forward pass, using the output to get loss, perform a backward pass + calculate the gradient, and use an optimizer to update weights.
To perform back-propagation the module autograd is used. It automatically calculates gradients by keeping track of operations and basically applying chain rule (not super clear on this)
The criterion in PyTorch is set to a loss function (which is given)
The optim package provides an optimizer which updates the weights of gradients
Epoch is a full pass of the dataset, if it is 5 then the dataset will be run through 5 times. To prevent previous gradients (because they accumulate) we use zero_grad to clear.
Inference is to make a prediction from the model, validation is to test if it is general enough to work on other data. The accuracy of the output compared to the label is used as the measurement metric.
To address this issue, a dropout can be set in the model’s init, to turn it off during testing we use the eval mode.
PyTorch provides a way to save, load to program, and load to model. However, the user must manually store info regarding the structure (e.g., hidden layers) otherwise it will not work.

Sharjeeliv commented 1 year ago

Update 2/3 - RNN & LSTM

The following are key points learned from the RNN course section:

Artificial intelligence > Machine Learning > Deep Learning == Neural Networks > Recurrent Neural Networks > Long Short-Term Memory
The main differentiating factor of an RNN is that it can take and learn from a sequence of data. For example, time-series data can be fully utilized (like that of a stock market)
RNNs are composed of the same parts: weights, biases, activation functions, etc. However, they also have a feedback loop (which lets it take sequential data)
The RNN upon an input returns an output (which can be ignored) and updates the previous part. Hence if we unroll the network it it gives us M “updates” and M+1 inputs and outputs. The weights and biases are shared (for each unroll), so the amount to train does not change regardless of the number of inputs.
It can cause a vanishing or exploding gradient issue, small would become extremely small and large becomes extremely large on each recursion.
The LSTM can avoid this problem by using separate paths for predictions: long and short memory paths.
The LSTM consists of three gates and three lines. The top line is the long-term memory, the middle line is the short-term memory, and the bottom line is input. As inputs are passed they go through different gates which impacts the composition of the memories (e.g., remember the inputs? use it for calculation, etc.). The three gates are called the forget gate, input gate, and output gate. It will reuse the same weights and biases so it can handle sequences of different lengths.
Using a combination of the gates and memories from the lines the LSTM operates the two memories. The practicality of the model is in its ability to consider different inputs differently. For example, the LSTM can use knowledge of day one directly to predict day five instead of relying on day four only.
Whereas the RNN effectively only has short-term memory, it can’t remember from n days in the past. In other words, the preceding data point will have the largest impact on the prediction.

Sharjeeliv commented 1 year ago

Update 3/3 - LSTM Draft Model

This is a refactored and significantly cleaned-up model, the output is still far off from what is expected. Although I was able to understand certain parts of making the model (those that were covered in the course) other parts like the LSTM hidden and cell states were confusing. Moreover getting the proper shapes and arrangement also ended up being trial and error. The training and testing were familiar but still time-consuming to get it to work.

import torch
import torch.nn as nn
import numpy as np

PATH = '/Users/sharjeelmustafa/Documents/02 Work/01 Research/Y3-22F/SEERa/data.toy/scenario.1/day.'

def get_input(t: int):
    sequence = []
    for i in range(1, t + 1):
        data = np.load(f"{PATH}{i:03d}.npy").astype(np.float32)
        # print(f"{torch.tensor(data).view(-1)}\n")
        sequence.append(torch.tensor(data).view(-1))
    return torch.stack(sequence).unsqueeze(1)

if __name__ == '__main__':

    t = 8
    input_dim, n_layers, batch_size, hidden_dim = 9, 1, 1, 9  # Extensive trial and error needed
    num_epochs = 500  # Epochs

    # Define model -shorthand
    lstm = nn.LSTM(input_dim, hidden_dim, n_layers)
    inputs = get_input(t)
    hidden_state = torch.randn(n_layers, batch_size, hidden_dim)
    cell_state = torch.randn(n_layers, batch_size, hidden_dim)

    criterion = nn.MSELoss()  # Error loss function
    optimizer = torch.optim.Adam(lstm.parameters(), lr=0.0001)

    for epoch in range(num_epochs):
        # Prepare for training
        lstm.train()
        optimizer.zero_grad()

        # Needed t odo this otherwise it crashed
        hidden_state = hidden_state.detach()
        cell_state = cell_state.detach()

        # Forward pass - Still unclear on this
        out, hidden = lstm(inputs, (hidden_state, cell_state))
        hidden_state, cell_state = hidden

        # Computing the loss
        target = torch.tensor(np.load(f"{PATH}{t + 1:03d}.npy").astype(np.float32)).view(-1)
        loss = criterion(out[-1, :, :], target)

        # The backpropagation and updating step
        loss.backward()
        optimizer.step()

        print(f"Epoch [{epoch + 1}/{num_epochs}], Loss: {loss.item()}")

    # Model for inference i.e., evaluation
    lstm.eval()
    out, _ = lstm(inputs, (hidden_state, cell_state))
    final_output = out[-1, :, :].view(3, 3)
    print(f"Prediction\n{final_output}")

fani-lab / SEERa

Time Series Forecasting #66

Your subtasks: