AlfredCYL / gplearn_cross_factor

Enhances the gplearn package to support genetic programming (GP) on three-dimensional structured data, with a particular focus on enabling cross-sectional factor analysis within the package.
MIT License

why you have extracted so many duplicated equations #3

Open fusying-hwang opened 7 months ago

fusying-hwang commented 7 months ago

Why have you extracted so many duplicated equations? Is this expected behavior?

AlfredCYL commented 6 months ago

I repeated these equations with a few fixed parameters (for example, computing means, standard deviations, and so on over 5-, 10-, and 20-day windows) to reduce the risk of overfitting. To be honest, this approach is crude and heavy-handed: it bloats the project and skews the probability of each operator being chosen, since an operator that is registered several times is more likely to be picked. A more flexible option is to modify the function template so that each operator accepts additional parameters. You can then define a set of fixed time windows and let the program choose one at random.
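A minimal sketch of that idea, assuming a stand-alone operator template rather than gplearn's own function class. `WindowedOperator`, `rolling_mean`, and the `ts_mean` label are hypothetical names used only for illustration; they are not taken from gplearn or from this repository.

```python
import random
from dataclasses import dataclass
from typing import Callable, Tuple

import numpy as np


@dataclass(frozen=True)
class WindowedOperator:
    """Hypothetical template: one operator plus the windows it may use."""
    name: str
    func: Callable[[np.ndarray, int], np.ndarray]
    windows: Tuple[int, ...] = (5, 10, 20)

    def sample(self, rng: random.Random):
        """Pick one allowed window at random; return (label, bound function)."""
        window = rng.choice(self.windows)
        return f"{self.name}_{window}", lambda x: self.func(x, window)


def rolling_mean(x: np.ndarray, window: int) -> np.ndarray:
    """Trailing mean along axis 0; the first window - 1 rows are NaN."""
    x = np.asarray(x, dtype=float)
    out = np.full(x.shape, np.nan)
    # csum[k] holds the sum of the first k rows of x.
    csum = np.cumsum(np.insert(x, 0, 0.0, axis=0), axis=0)
    out[window - 1:] = (csum[window:] - csum[:-window]) / window
    return out


# The function set holds a single entry for the operator, not one per window.
ts_mean = WindowedOperator("ts_mean", rolling_mean)

rng = random.Random(0)
label, fn = ts_mean.sample(rng)   # e.g. "ts_mean_10" and its bound function
prices = np.arange(1.0, 31.0)     # made-up data for the demo
result = fn(prices)
```

Because the operator appears once in the function set, its chance of being selected no longer depends on how many windows it supports; the window is drawn in a separate step.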

By the way, modifying the function template is quite necessary in any case. Besides solving the problem you mentioned, it enables further functionality. For example, you can check the types of the input data to decide whether an operator is applicable.
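One way such an input-type check could look, as a rough sketch under stated assumptions: the `DataKind` categories, the `TypedOperator` class, and the rule that a logarithm applies only to prices and volumes are all hypothetical and chosen for illustration, not part of gplearn or this repository.

```python
from dataclasses import dataclass
from enum import Enum
from typing import FrozenSet, Sequence, Tuple


class DataKind(Enum):
    """Hypothetical categories of input data."""
    PRICE = "price"
    VOLUME = "volume"
    RATIO = "ratio"


ANY_KIND = frozenset(DataKind)


@dataclass(frozen=True)
class TypedOperator:
    """Hypothetical template: an operator plus the kinds each argument accepts."""
    name: str
    accepts: Tuple[FrozenSet[DataKind], ...]

    def is_applicable(self, arg_kinds: Sequence[DataKind]) -> bool:
        """True if the argument count matches and every kind is allowed."""
        if len(arg_kinds) != len(self.accepts):
            return False
        return all(kind in allowed
                   for kind, allowed in zip(arg_kinds, self.accepts))


# Assumed rule for the demo: log only makes sense for strictly positive inputs.
log_op = TypedOperator("log", (frozenset({DataKind.PRICE, DataKind.VOLUME}),))
corr_op = TypedOperator("ts_corr", (ANY_KIND, ANY_KIND))

ok = log_op.is_applicable([DataKind.PRICE])
rejected = log_op.is_applicable([DataKind.RATIO])
```

During tree construction, a check like `is_applicable` could be used to filter the candidate operators before one is drawn, so that inapplicable combinations are never generated.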

I hope this helps. I am still improving the project and plan to add these capabilities in a future version. Stay tuned.

fusying-hwang commented 6 months ago

Hi Alfred, thanks for the reply. It gave me some useful ideas. I am also trying to add more basic operators. Maybe we can connect if you are interested.