[Feature Request: Support for Non-Tabular Data (Images, Text, Sound) in Time-Series Foundation Model]

linkedlist771 commented 2 months ago

Description

I would like to request the extension of the Time-Series Foundation Model to support non-tabular data types, such as RGB images, text, and sound. This would allow the model to handle a broader range of data inputs beyond traditional tabular DataFrame formats.

Background: Currently, the Time-Series Foundation Model is designed to work with data in the form of a DataFrame, which typically represents tabular data with time-series characteristics. While this is suitable for many applications, there are scenarios where the input data is not inherently tabular but could still be valuable for prediction tasks, particularly in the context of video prediction, natural language processing, or audio analysis.

Proposed Solution:

- Embedding Transformation Layer: Introduce a preprocessing layer within the model to handle non-tabular data formats. This layer would apply a modality-specific embedding to convert images, text, or sound into tensors compatible with the model's existing architecture.
- Flexible Input Interface: Extend the input interface to accept a broader range of data types, including but not limited to:
  - RGB images (e.g., one 3D height x width x channel tensor per video frame)
  - Text sequences (e.g., tokenized and embedded representations)
  - Sound (e.g., spectrograms or raw waveforms)
- Documentation and Examples: Update the documentation to include examples of how to preprocess and feed non-tabular data into the model, along with best practices for handling different data types.
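
To make the embedding layer idea more concrete, here is a minimal sketch of what such a preprocessing step could look like. It is an illustration only, not TimeGPT's architecture or API: the function names, `EMBED_DIM`, and the fixed random projections (stand-ins for learned embeddings) are all assumptions. The point is that each modality ends up as one fixed-width vector per time step, i.e. an array of shape `(T, EMBED_DIM)`.

```python
import numpy as np

EMBED_DIM = 8  # hypothetical embedding width, chosen only for illustration


def _project(features: np.ndarray, seed: int) -> np.ndarray:
    """Map (T, F) features to (T, EMBED_DIM) with a fixed random matrix.

    This is a stand-in for a learned embedding layer.
    """
    rng = np.random.default_rng(seed)
    n_features = features.shape[1]
    weights = rng.standard_normal((n_features, EMBED_DIM)) / np.sqrt(n_features)
    return features @ weights


def embed_frames(frames: np.ndarray) -> np.ndarray:
    """RGB video of shape (T, H, W, 3) -> one vector per frame, (T, EMBED_DIM)."""
    if frames.ndim != 4 or frames.shape[-1] != 3:
        raise ValueError("expected frames of shape (T, H, W, 3)")
    # Flatten each frame and scale 8-bit pixel values to [0, 1].
    flat = frames.reshape(frames.shape[0], -1).astype(np.float64) / 255.0
    return _project(flat, seed=0)


def embed_tokens(token_ids: np.ndarray, vocab_size: int) -> np.ndarray:
    """Token ids of shape (T, L) -> (T, EMBED_DIM) by mean-pooling a lookup table."""
    rng = np.random.default_rng(1)
    table = rng.standard_normal((vocab_size, EMBED_DIM))
    # table[token_ids] has shape (T, L, EMBED_DIM); average over the L tokens.
    return table[token_ids].mean(axis=1)


def embed_audio(waveform: np.ndarray, frame_len: int) -> np.ndarray:
    """1-D waveform -> (T, EMBED_DIM), one row per non-overlapping frame."""
    samples = np.asarray(waveform, dtype=np.float64)
    n_frames = len(samples) // frame_len
    if n_frames == 0:
        raise ValueError("waveform is shorter than one frame")
    framed = samples[: n_frames * frame_len].reshape(n_frames, frame_len)
    # Magnitude spectrum of each frame, i.e. a very simple spectrogram.
    spectrum = np.abs(np.fft.rfft(framed, axis=1))
    return _project(spectrum, seed=2)
```

With something like this in place, a 5-frame video, 5 short texts, and 5 audio frames would all become `(5, EMBED_DIM)` arrays that could be aligned on a shared time index.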

Benefits:

Expanded Use Cases: Enabling support for diverse data types would allow the model to be applied in a wider range of applications, including video prediction, NLP, and audio processing tasks.
Streamlined Workflow: By directly supporting images, text, and sound, users would be able to leverage the model more effectively without needing to manually transform these data types into a tabular format.
Improved Flexibility: This feature would make the model more versatile, accommodating various data modalities in time-series prediction tasks.

Conclusion: Implementing support for non-tabular data types in the Time-Series Foundation Model would significantly broaden its applicability and utility across different domains. This enhancement aligns with the growing need for models capable of handling diverse data formats, especially in complex prediction tasks involving video, text, and sound.

Thank you for considering this request.

Use case

RGB Images: In video prediction tasks, input frames are often RGB images. While these images can be flattened or summarized into numeric columns, direct support for image tensors would be more efficient and flexible.
Text: Natural language data could benefit from being processed directly by the model, especially when considering sequential or time-dependent text data.
Sound: Audio signals are another form of time-series data that could be more naturally handled by the model if direct support for waveform or spectrogram inputs were available.
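
For reference, the manual workaround today looks roughly like the sketch below: each frame is reduced to a few summary numbers and written out as one row per time step. The helper name `frames_to_rows`, the choice of mean brightness as the target, and the feature columns are hypothetical, and the `unique_id` / `ds` / `y` column names are assumed here to match the long-format layout the library expects.

```python
from datetime import datetime, timedelta

import numpy as np


def frames_to_rows(frames, start, step, series_id="video_0"):
    """Reduce each RGB frame (H, W, 3) to per-channel means, one row per frame."""
    rows = []
    for i, frame in enumerate(frames):
        # Average over all pixels, keeping the three color channels separate.
        mean_r, mean_g, mean_b = frame.reshape(-1, 3).mean(axis=0)
        rows.append(
            {
                "unique_id": series_id,       # assumed series identifier column
                "ds": start + i * step,       # assumed timestamp column
                "y": float(frame.mean()),     # hypothetical target: mean brightness
                "mean_r": float(mean_r),      # hypothetical exogenous features
                "mean_g": float(mean_g),
                "mean_b": float(mean_b),
            }
        )
    return rows


# Three tiny 2x2 frames, one per second.
frames = np.full((3, 2, 2, 3), 10, dtype=np.uint8)
frames[..., 0] = 20  # make the red channel brighter
rows = frames_to_rows(frames, datetime(2024, 1, 1), timedelta(seconds=1))
```

The resulting list of dicts can be passed to `pandas.DataFrame(rows)`. This works, but it discards almost all spatial information, which is the main reason direct tensor input would be preferable.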

elephaint commented 2 months ago

Thanks for raising the request @linkedlist771. We are continuously thinking about how to improve the workings of TimeGPT and its potential successors, and supporting multi-modality in some form might be a relevant improvement to make. We'll let you know more once we can be more specific on this.

linkedlist771 commented 2 months ago

Thank you for your response, @elephaint. I'm glad to hear that multi-modality support is on your radar for potential improvements to TimeGPT and its successors. I understand that implementing such a feature involves careful consideration of technical feasibility, performance impacts, and integration challenges. I appreciate you taking the time to log this request and consider it in your roadmap discussions. Please let me know if you'd like me to elaborate on any specific aspects of the proposal. I'm looking forward to any updates you can share as your plans in this direction develop. Thank you for your openness to community input and for considering this enhancement.

Nixtla / nixtla

[Feature Request: Support for Non-Tabular Data (Images, Text, Sound) in Time-Series Foundation Model] #455
