fastnlp / fastNLP

fastNLP: A Modularized and Extensible NLP Framework. Currently still in incubation.
https://gitee.com/fastnlp/fastNLP
Apache License 2.0
3.06k stars, 450 forks

Add support for exporting a dataset to a pandas DataFrame #276

Closed onebula closed 4 years ago

onebula commented 4 years ago

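Is your feature request related to a problem? Please describe. DataSet is convenient to use, but it cannot support the more complex analysis that a pandas DataFrame allows. This feature would be useful once intermediate data or prediction results have been generated.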

Describe the solution you'd like. Add a new API to DataSet: dataset.to_df. After a quick look at the code, it would be roughly along these lines:

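import pandas as pd

def to_df(dataset, columns=None):
    # columns: names of the fields to export; None exports every field
    data_dict = dataset.get_all_fields()
    if columns is not None:
        # Keep only the requested fields, in the order given
        data_dict = {name: data_dict[name] for name in columns}
    # Each field stores its values in .content; use them as DataFrame columns
    df = pd.DataFrame({name: field.content for name, field in data_dict.items()})
    return df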

Describe alternatives you've considered. None.

Additional context. Add any other context or screenshots about the feature request here.

yhcc commented 4 years ago

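We did consider this early on, and at one point even thought about heavily modifying DataFrame to serve directly as DataSet. But because we did not want to depend on too many Python packages, the idea was shelved. Thank you for the suggestion and the reference implementation; we will think about how to incorporate it. Thanks!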
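
One possible way to reconcile the proposed to_df with the concern about extra dependencies is to import pandas only when the export is actually called, so that it stays an optional dependency. The following is a minimal sketch under that assumption, not how fastNLP implements it. It relies only on get_all_fields() and .content as used in the proposal above; select_columns, StubField, and StubDataSet are hypothetical names introduced here so the example runs without fastNLP installed, and the toy values are made up for illustration.

```python
def select_columns(dataset, columns=None):
    """Return {field_name: list_of_values} for the requested fields."""
    fields = dataset.get_all_fields()
    names = list(fields) if columns is None else list(columns)
    missing = [name for name in names if name not in fields]
    if missing:
        raise KeyError(f"unknown field(s): {missing}")
    return {name: list(fields[name].content) for name in names}


def to_df(dataset, columns=None):
    """Export to a pandas DataFrame; pandas is imported only when called."""
    try:
        import pandas as pd  # deferred import keeps pandas optional
    except ImportError as exc:
        raise ImportError("to_df requires pandas to be installed") from exc
    return pd.DataFrame(select_columns(dataset, columns))


# Hypothetical stand-ins that mimic only the two attributes used above.
class StubField:
    def __init__(self, content):
        self.content = content


class StubDataSet:
    def __init__(self, **fields):
        self._fields = {name: StubField(values) for name, values in fields.items()}

    def get_all_fields(self):
        return self._fields


# Made-up toy data: two instances with three fields each.
toy = StubDataSet(words=["a b", "c d e"], seq_len=[2, 3], pred=[0, 1])
print(select_columns(toy, columns=["seq_len", "pred"]))
```

With this layout, code that never calls to_df never touches pandas, and a missing or misspelled column name fails early with a KeyError instead of surfacing later inside the DataFrame constructor.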