Nixtla / nixtla

TimeGPT-1: production-ready pre-trained Time Series Foundation Model for forecasting and anomaly detection. Generative pretrained transformer for time series trained on over 100B data points. It can produce accurate forecasts across domains such as retail, electricity, finance, and IoT with just a few lines of code 🚀.
https://docs.nixtla.io

[Feature Request: Support for Non-Tabular Data (Images, Text, Sound) in Time-Series Foundation Model] #455

Open linkedlist771 opened 2 weeks ago

linkedlist771 commented 2 weeks ago

Description

I would like to request that the Time-Series Foundation Model be extended to support non-tabular data types, such as RGB images, text, and sound. This would allow the model to accept a broader range of inputs than the tabular DataFrame format it handles today.

Background: Currently, the Time-Series Foundation Model is designed to work with data in the form of a DataFrame, which typically represents tabular data with time-series characteristics. While this is suitable for many applications, there are scenarios where the input data is not inherently tabular but could still be valuable for prediction tasks, particularly in the context of video prediction, natural language processing, or audio analysis.

Proposed Solution:

  1. Embedding Transformation Layer: Introduce a preprocessing layer within the model to handle non-tabular data formats. This layer would apply appropriate embedding transformations to convert images, text, or sound into a tensor format compatible with the model's existing architecture.

  2. Flexible Input Interface: Extend the input interface to accept a broader range of data types, including but not limited to:

    • RGB images (e.g., 3D tensors for video frames)
    • Text sequences (e.g., tokenized and embedded representations)
    • Sound (e.g., spectrograms or raw waveforms)

  3. Documentation and Examples: Update the documentation to include examples of how to preprocess and feed non-tabular data into the model, along with best practices for handling different data types.
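
To make the embedding idea in step 1 concrete, here is a minimal sketch using only the Python standard library. It is not how TimeGPT works internally and it does not use any Nixtla API. The function names (`embed_frame`, `frames_to_rows`), the column names (`unique_id`, `ds`, `mean_r`, ...), and the toy "embedding" (mean intensity per colour channel) are all assumptions chosen for illustration. A real embedding layer would presumably be a learned encoder. The point is only that each non-tabular observation is mapped to a fixed-length vector attached to a timestamp, which yields rows that fit a tabular layout.

```python
from datetime import datetime, timedelta
from statistics import fmean


def embed_frame(frame):
    """Toy 'embedding' (assumption): mean intensity per RGB channel.

    `frame` is an H x W grid of (r, g, b) tuples, i.e. a 3D tensor
    written as nested Python sequences.
    """
    pixels = [pixel for row in frame for pixel in row]
    return [fmean(pixel[c] for pixel in pixels) for c in range(3)]


def frames_to_rows(frames, start, step, series_id):
    """Map each frame to one tabular row keyed by series id and timestamp.

    The column names are hypothetical and only mimic a long-format
    time-series table.
    """
    rows = []
    for i, frame in enumerate(frames):
        r, g, b = embed_frame(frame)
        rows.append({
            "unique_id": series_id,
            "ds": start + i * step,
            "mean_r": r,
            "mean_g": g,
            "mean_b": b,
        })
    return rows


# Two tiny 1 x 2 RGB frames standing in for consecutive video frames.
frames = [
    [[(0, 0, 0), (255, 255, 255)]],
    [[(10, 20, 30), (30, 40, 50)]],
]
rows = frames_to_rows(
    frames,
    start=datetime(2024, 1, 1),
    step=timedelta(seconds=1),
    series_id="camera_0",
)
```

The resulting list of dictionaries could be turned into a DataFrame with `pandas.DataFrame(rows)`. Text or sound would follow the same pattern with a different `embed_*` function, for example one that summarises a spectrogram window.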

Benefits:

Conclusion: Implementing support for non-tabular data types in the Time-Series Foundation Model would significantly broaden its applicability and utility across different domains. This enhancement aligns with the growing need for models capable of handling diverse data formats, especially in complex prediction tasks involving video, text, and sound.

Thank you for considering this request.

Use case

elephaint commented 2 weeks ago

Thanks for raising the request @linkedlist771. We are continuously thinking about how to improve the workings of TimeGPT and its potential successors, and supporting multi-modality in some form might be a relevant improvement to make. We'll let you know more once we can be more specific on this.

linkedlist771 commented 2 weeks ago

Thank you for your response, @elephaint. I'm glad to hear that multi-modality support is on your radar for potential improvements to TimeGPT and its successors. I understand that implementing such a feature involves careful consideration of technical feasibility, performance impacts, and integration challenges. I appreciate you taking the time to log this request and consider it in your roadmap discussions. Please let me know if you'd like me to elaborate on any specific aspects of the proposal. I'm looking forward to any updates you can share as your plans in this direction develop. Thank you for your openness to community input and for considering this enhancement.