XanaduAI / QHack2023

QHack 2023

[Done] Quantum Temporal Convolutional Networks and Their Application to Weather Forecasting #89

Open cyx617 opened 1 year ago

cyx617 commented 1 year ago

Project Name:

Quantum Temporal Convolutional Networks and Their Application to Weather Forecasting

Team Name:

cyx

Which challenges would you like to submit your project for?

- Hybrid Quantum-Classical Computing Challenge
- Quantum computing today! NVIDIA Challenge
- Amazon Braket Challenge

Project Description:

Weather forecasting is becoming increasingly important for national economies and people's daily lives, and there is growing demand for information on weather changes across many industries (e.g. wind power generation). However, weather forecasting is a very challenging task due to the complexity of the atmospheric system and the noisiness of weather data.

To tackle this problem, we propose a novel quantum machine learning model called the quantum temporal convolutional network (QTCN), inspired by the classical temporal convolutional network (TCN) [1]. The core element of our QTCN model is the quantum dilated convolution (QDC) [2], which combines the concept of dilated convolution with variational quantum circuits. Compared to a standard quantum convolution with the same kernel size, QDC can capture a larger receptive field without introducing more learnable parameters, and it generally produces a smaller feature map, reducing the number of quantum circuit executions. In addition, we propose a quantum Squeeze-and-Excitation (QSE) module to further improve model performance. Based on channel-wise attention, the QSE module helps our QTCN model focus on the more important channels.

We perform experiments on an internal, desensitized dataset of 1,000 time series covering five meteorological indicators observed from 2018-01-01 to 2020-12-31, and we show that our model achieves promising performance in terms of the pre-defined accuracy (see our presentation). In particular, we find that residual blocks, which are adopted by most state-of-the-art deep learning models, are also essential for QML models: we observe a huge performance gap between our models with and without this architecture. This observation illustrates one of the core themes of QML, namely bringing insightful ideas from classical machine learning into quantum computing. Moreover, we observe faster model convergence when training with the "lightning.qubit" simulator than with the "default.qubit" simulator. To the best of our knowledge, the model proposed in this project is the first quantum version of a TCN, and it has enormous potential to handle a wide range of time series problems.
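To make the quantum dilated convolution concrete, here is a minimal PennyLane sketch of the idea. The encoding (`AngleEmbedding`), ansatz (`StronglyEntanglingLayers`), kernel size and dilation rate are illustrative assumptions, not the exact circuit used in the project.

```python
import pennylane as qml
import numpy as np

kernel_size, dilation = 2, 2          # hypothetical hyperparameters
n_qubits = kernel_size                # one qubit per kernel tap (assumption)
dev = qml.device("default.qubit", wires=n_qubits)

@qml.qnode(dev)
def qdc_circuit(window, weights):
    # encode one dilated window of the series, then apply a small variational ansatz
    qml.AngleEmbedding(window, wires=range(n_qubits))
    qml.StronglyEntanglingLayers(weights, wires=range(n_qubits))
    return [qml.expval(qml.PauliZ(w)) for w in range(n_qubits)]

def quantum_dilated_conv(series, weights):
    """Slide a dilated quantum kernel over a 1D sequence (sketch)."""
    receptive = (kernel_size - 1) * dilation + 1
    outputs = []
    for start in range(len(series) - receptive + 1):
        # take every `dilation`-th sample: larger receptive field, same number of parameters
        window = series[start : start + receptive : dilation]
        outputs.append(qdc_circuit(window, weights))
    return np.array(outputs)

weights = np.random.random(qml.StronglyEntanglingLayers.shape(n_layers=1, n_wires=n_qubits))
feature_map = quantum_dilated_conv(np.random.random(16), weights)
```

Each kernel position corresponds to one circuit execution, which is why dilation (fewer positions for the same receptive field) also reduces the number of executions.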

TCN-based models with small kernel sizes (e.g. 2 or 3) are typically used for time series problems, because small kernels help the model learn short-range dependencies, while long-range dependencies can be captured by stacking dilated convolution layers with different dilation rates. The same applies to our QTCN model. One advantage of this design is that QTCN only requires small quantum circuits, which can be run reliably on NISQ devices and fit on local quantum simulators without memory explosion.

However, this also gives rise to a drawback of the QTCN model. We trained the model with both remote simulators (e.g. SV1) and real devices (e.g. Lucy from Oxford Quantum Circuits), but the training time was unrealistically long, even when using the adjoint differentiation method with SV1. This is mainly due to the network latency incurred by the massive number of small quantum circuits required by the quantum convolution process. We also trained our model via Cyxtera Run:ai with the "lightning.gpu" simulator, which can leverage powerful NVIDIA GPUs. Unfortunately, the training time was still too long, even much longer than with local simulators; this stems from the memory-transfer overhead between the CPU and GPU. One possible way to mitigate this drawback is to reduce the number of quantum convolution layers in the model (e.g. replacing the first QDC layer with a classical dilated convolution layer). In addition, in the second stage of the project we optimized our code by removing all for loops from the quantum convolution computation and leveraging the quantum parameter broadcasting supported by PennyLane. The optimized code worked perfectly (at least with PennyLane version 0.24.0) and helped reduce the model training time, as shown in our presentation.
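As a hedged illustration of the for-loop removal described above, the snippet below stacks all dilated windows into a single batch and evaluates the circuit once via PennyLane's parameter broadcasting. The specific encoding and ansatz are again assumptions, not necessarily those used in the project.

```python
import pennylane as qml
import numpy as np

kernel_size, dilation, n_qubits = 2, 2, 2     # hypothetical hyperparameters
dev = qml.device("default.qubit", wires=n_qubits)

@qml.qnode(dev)
def qdc_circuit(windows, weights):
    # `windows` has shape (batch, n_qubits): every kernel position is encoded at once
    qml.AngleEmbedding(windows, wires=range(n_qubits))
    qml.BasicEntanglerLayers(weights, wires=range(n_qubits))
    return [qml.expval(qml.PauliZ(w)) for w in range(n_qubits)]

def quantum_dilated_conv(series, weights):
    receptive = (kernel_size - 1) * dilation + 1
    # gather every dilated window first, then run one broadcasted circuit call
    windows = np.stack(
        [series[s : s + receptive : dilation] for s in range(len(series) - receptive + 1)]
    )
    return np.stack(qdc_circuit(windows, weights), axis=-1)   # (batch, n_qubits)

weights = np.random.random((1, n_qubits))   # BasicEntanglerLayers: (n_layers, n_wires)
features = quantum_dilated_conv(np.random.random(32), weights)
```

The circuit is executed once per layer call instead of once per kernel position, which is the kind of saving that the optimized code exploits.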

[1] S. Bai, J. Z. Kolter, and V. Koltun, "An empirical evaluation of generic convolutional and recurrent networks for sequence modeling," arXiv preprint arXiv:1803.01271, 2018.
[2] Y. Chen, "Quantum dilated convolutional neural network," IEEE Access, 2022.

Project Link:

https://github.com/cyx617/QTCN/tree/dbf649283d53b53cdef7a775e5c9162891477a63

JordanAWS commented 1 year ago

Hi cyx, it looks like you didn't use Amazon Braket in your project -- is that correct?

cyx617 commented 1 year ago

Hi @JordanAWS, thanks for asking. I actually did use Amazon Braket for the project. However, I received the AWS credits a bit late (on Feb 28, 2023 GMT+8), so I did not have enough time for all the experiments I would have liked to conduct. For example, I did not manage to try training my models with Amazon Braket Hybrid Jobs. I trained the model using both remote simulators (e.g. SV1) and real devices (e.g. Lucy from Oxford Quantum Circuits) via an Amazon Braket notebook instance. As I mentioned in the project description, the training time was very long (even for one epoch), so I did not obtain those results or add them to the presentation file. I suspect the long training time was due to the large number of small quantum circuit executions involved in training the model. I plan to try Amazon Braket Hybrid Jobs this weekend and hope to see some exciting results!
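For context, switching the same PennyLane circuits to Braket only requires swapping the device. Below is a minimal sketch assuming the amazon-braket-pennylane-plugin is installed and AWS credentials plus a results bucket are configured for the account; the ARNs are the public ones for SV1 and Lucy as of early 2023, and the circuit body is purely illustrative.

```python
import pennylane as qml

# Public Braket ARNs (as of early 2023) for the devices mentioned above
SV1_ARN = "arn:aws:braket:::device/quantum-simulator/amazon/sv1"
LUCY_ARN = "arn:aws:braket:eu-west-2::device/qpu/oqc/Lucy"

# Assumes the amazon-braket-pennylane-plugin is installed and AWS credentials
# (plus a default results bucket) are already configured for the account.
dev = qml.device("braket.aws.qubit", device_arn=SV1_ARN, wires=2)

@qml.qnode(dev)
def circuit(x, weights):
    # illustrative circuit body, not the project's exact QTCN kernel;
    # the adjoint differentiation mentioned above is selected via the QNode's
    # diff_method, whose exact value depends on the plugin version
    qml.AngleEmbedding(x, wires=range(2))
    qml.BasicEntanglerLayers(weights, wires=range(2))
    return qml.expval(qml.PauliZ(0))
```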

[Screenshot 2023-03-04 23:25:49 attached]