AlfredCYL / gplearn_cross_factor

Enhances the gplearn package to support genetic programming (GP) on three-dimensional structured data, with a particular focus on enabling cross-sectional factor analysis within the package.
MIT License

how to produce X.npy and Y.npy files #2

Open lyw2000china opened 8 months ago

lyw2000china commented 8 months ago

Hi @AlfredCYL. I like your project because I am planning to do something similar. It would be helpful if you could share some of the data preprocessing code, for example how to produce the X.npy and Y.npy files. Thank you.

AlfredCYL commented 6 months ago

Suppose your only inputs are the open, close, high, and low prices. For each of them, you have a DataFrame covering n days and p stocks, and the forward returns have the same shape. For Y.npy, you just need to convert the forward-return DataFrame into a numpy array (simply use the `DataFrame.values` attribute or `DataFrame.to_numpy()`). For X.npy, convert each input DataFrame the same way as Y, then merge the resulting arrays into the final numpy array, either with np.concatenate after giving each 2D array a new feature axis, or with np.stack, which adds that axis for you.
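As a rough illustration of those steps, here is a minimal sketch. The toy data, the symbol names, the one-day forward-return definition, and the feature order are all assumptions made for the example, not something taken from the repository. It assumes every DataFrame has dates as the index and stocks as the columns, and that the feature axis sits in the middle.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: one (n_dates x n_stocks) DataFrame per price field.
rng = np.random.default_rng(0)
dates = pd.date_range("2024-01-01", periods=5, freq="D")
symbols = ["AAA", "BBB", "CCC"]  # made-up tickers


def make_frame():
    # Random positive prices, rows = dates, columns = stocks.
    values = rng.random((len(dates), len(symbols))) + 1.0
    return pd.DataFrame(values, index=dates, columns=symbols)


open_, high, low, close = (make_frame() for _ in range(4))

# Assumed target: one-day forward return of the close price.
# The last row is NaN because there is no next day.
fwd_ret = close.shift(-1) / close - 1.0

# Y: plain 2D array, shape (n_dates, n_stocks).
Y = fwd_ret.to_numpy()

# X: stack the feature arrays along a new middle axis,
# shape (n_dates, n_features, n_stocks).
X = np.stack([df.to_numpy() for df in (open_, high, low, close)], axis=1)

np.save("X.npy", X)
np.save("Y.npy", Y)
```

With this layout, `X[:, k, :]` is the k-th feature for every date and stock, so it lines up element by element with `Y`.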

I hope it helps. Feel free to ask if you still have problems.

leonyvon commented 4 months ago

For Y, are the symbols on the index (rows) or on the columns?

AlfredCYL commented 1 month ago

Sorry for the late response. Y is (n_dates, n_stocks); X is (n_dates, n_features, n_stocks).
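
In case it helps, here is a small sketch of how those shapes could be checked. Everything in it is hypothetical (the tickers, the two toy features, the one-day forward return); the only part taken from this thread is the layout itself. It aligns every DataFrame to the same dates and symbols first, so the stock axis means the same thing in X and Y, and uses np.concatenate with an explicit new feature axis.

```python
import numpy as np
import pandas as pd

dates = pd.date_range("2024-01-01", periods=4, freq="D")
symbols = ["AAA", "BBB", "CCC"]  # made-up tickers

# Rows are dates, columns are symbols.
close = pd.DataFrame(
    np.arange(1.0, 13.0).reshape(4, 3), index=dates, columns=symbols
)
# A second toy feature whose columns arrive in a different order.
open_ = (close - 0.5)[["CCC", "AAA", "BBB"]]

# Align every frame to the same index and column order before converting.
features = [f.reindex(index=dates, columns=symbols) for f in (open_, close)]

# Give each 2D array a middle feature axis, then concatenate along it:
# (n_dates, 1, n_stocks) pieces -> (n_dates, n_features, n_stocks).
X = np.concatenate([f.to_numpy()[:, np.newaxis, :] for f in features], axis=1)

# Assumed target: one-day forward return, shape (n_dates, n_stocks).
Y = (close.shift(-1) / close - 1.0).to_numpy()

print(X.shape)  # (4, 2, 3)
print(Y.shape)  # (4, 3)
```

So in Y each row is a date and each column is a symbol, and `X[t, :, s]` holds all features for date `t` and stock `s`.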