Nike-Inc / timeseries-generator

A library to generate synthetic time series data by easy-to-use factors and generator
Apache License 2.0

[Feature request] Customizable feature combinations #3

Open athewsey opened 2 years ago

athewsey commented 2 years ago

Hi team, Thanks for the useful library! I wonder if you'd be open to this idea:

I would like to be able to:

Today, it seems like Generator.generate() hard-codes the assumption that time-series should be generated for the product of all provided feature values.

It'd be helpful if, instead, we had the option of customizing this join to limit which combinations are generated.
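To make the scale of the problem concrete, here is a minimal sketch of how the number of series grows when one is generated for every combination of feature values. It uses only the standard library and does not call timeseries-generator; the feature names and values are hypothetical, and the way `Generator.generate()` builds its product internally is assumed from the description above rather than confirmed.

```python
from itertools import product
from math import prod

# Hypothetical feature values, not taken from the library or this issue.
features = {
    "country": ["US", "NL", "JP"],
    "store": ["store_1", "store_2", "store_3", "store_4"],
    "product": ["shoe", "shirt", "sock", "hat", "bag"],
}

# Full outer (Cartesian) product: one series per combination of values.
all_combinations = list(product(*features.values()))

# The count is the product of the per-feature cardinalities: 3 * 4 * 5.
expected_count = prod(len(values) for values in features.values())

print(len(all_combinations), expected_count)
```

Adding one more feature multiplies the count again by that feature's number of values, which is why sparse use cases pay for many combinations they never need.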

Some options I can think of:

  1. Leave the library as-is: Users generate full outer product and limit down what they want in post-processing
    • This seems possible already, but very RAM-intensive if your desired combinations are sparse?
  2. Accept an optional dataframe of factor combinations as parameter to the generate() method
    • Gives full flexibility over which combinations are kept / ignored, without assuming any particular rigid hierarchies between features
    • ...But might need to do a bit of validation to protect against user errors? May not be super easy to use without some documented examples / functions to generate the dataframe
  3. Some more complex API for feature configuration that accommodates specifying valid/invalid feature combinations
    • Might be nicer for usability, but difficult to make general: E.g. a straightforward hierarchy could be represented as a nested dict, but in practice many applications have multiple intersecting views of product category information e.g. brand, type, target segment, etc.
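As a rough illustration of the difference between options 1 and 2, the sketch below compares filtering a full product against enumerating only an explicit list of combinations, with the kind of basic validation option 2 mentions. It is plain Python, not the library's API: `validate_combinations`, the feature values, and the `wanted` rows are all hypothetical. In practice a dataframe with one column per feature could play the role of `wanted`.

```python
from itertools import product

# Hypothetical features and values, for illustration only.
features = {
    "country": ["US", "NL"],
    "product": ["shoe", "shirt", "sock"],
}

# Hypothetical sparse set of valid combinations, as (country, product) rows.
wanted = [("US", "shoe"), ("NL", "shirt"), ("NL", "sock")]


def validate_combinations(combinations, features):
    """Raise ValueError if a row has the wrong width or an unknown value."""
    names = list(features)
    for row in combinations:
        if len(row) != len(names):
            raise ValueError(f"expected {len(names)} values, got {row!r}")
        for name, value in zip(names, row):
            if value not in features[name]:
                raise ValueError(f"unknown {name} value: {value!r}")
    # Drop duplicate rows while keeping the user's order.
    return list(dict.fromkeys(combinations))


# Option 1: materialize the full product, then filter in post-processing.
full = list(product(*features.values()))
wanted_set = set(wanted)
option_1 = [row for row in full if row in wanted_set]

# Option 2: validate the user-supplied rows and use them directly,
# so the full product is never built.
option_2 = validate_combinations(wanted, features)

print(len(full), len(option_1), len(option_2))
```

Both routes end with the same rows, but option 1 has to hold every combination in memory first, while option 2 only ever touches the rows the user asked for.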

ymwdalex commented 2 years ago

@athewsey thanks for your great feature request, and some implementation suggestions! Personally, I like option 3, but as you said, it is not an easy one to make general.

However, I have been quite busy recently and will not have time to work on it in the next few months. Feel free to improve the package if you have ideas and time.