Dataset problem in the detailed tutorial "Quickly Building Custom Models with Modules and Models"

jwc19890114 commented 4 years ago

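Describe the bug: Hello. While working through the tutorial "Quickly Building Custom Models with Modules and Models", I found that after the dataset is processed, the split operation cannot be performed. The code in question is: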

train_dev_data, test_data = dataset.split(0.1)
train_data, dev_data = train_dev_data.split(0.1)

This raises an error saying that data_bundle has no split method. I then switched to my own dataset and loaded it with IMDBLoader, but at the IMDBPipe step I get:

process() missing 1 required positional argument: 'data_bundle'

Here is my code. Did I write something wrong?

from fastNLP.io import CSVLoader, IMDBLoader
from fastNLP import Vocabulary, CrossEntropyLoss, AccuracyMetric
# loader = CSVLoader(headers=('raw_sentence', 'label'), sep='\t')
data_bundle=IMDBLoader().load(r'my imdb data path')
print(data_bundle)
print(data_bundle.get_dataset('train')[:3])

from fastNLP.io import IMDBPipe
data_bundle=IMDBPipe.process(data_bundle)

A sample of the data:

A series of escapades demonstrating the adage that what is good for the goose is also good for the gander , some of which occasionally amuses but none of which amounts to much of a story .    1
This quiet , introspective and entertaining independent is worth seeking .  4
Even fans of Ismail Merchant 's work , I suspect , would have a hard time sitting through this one .    1
A positively thrilling combination of ethnography and all the intrigue , betrayal , deceit and murder of a Shakespearean tragedy or a juicy soap opera .    3
Aggressive self-glorification and a manipulative whitewash .    1
A comedy-drama of nearly epic proportions rooted in a sincere performance by the title character undergoing midlife crisis .    4
Narratively , Trouble Every Day is a plodding mess .    1

Thank you for providing this library.

yhcc commented 4 years ago

from fastNLP.io import CSVLoader, IMDBLoader
from fastNLP import Vocabulary, CrossEntropyLoss, AccuracyMetric
# loader = CSVLoader(headers=('raw_sentence', 'label'), sep='\t')
data_bundle=IMDBLoader().load(r'my imdb data path')
print(data_bundle)
print(data_bundle.get_dataset('train')[:3])

from fastNLP.io import IMDBPipe
# Perhaps this Pipe was not instantiated?
data_bundle=IMDBPipe().process(data_bundle)

Could you instantiate the Pipe first and try again?
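
For anyone hitting the same message, here is a minimal sketch of why the error appears. `DemoPipe` is a hypothetical stand-in written for this illustration, not a fastNLP class; the only assumption is that `process` is an ordinary instance method. Calling it on the class binds the bundle to `self`, so Python reports `data_bundle` as missing.

```python
class DemoPipe:
    """Hypothetical stand-in for a Pipe; NOT part of fastNLP."""

    def process(self, data_bundle):
        # A real Pipe would tokenize and index the data; this one returns it unchanged.
        return data_bundle


bundle = {"train": []}  # placeholder object standing in for a DataBundle

# Called on the class: `bundle` is bound to `self`, so `data_bundle` is missing.
try:
    DemoPipe.process(bundle)
    error_message = None
except TypeError as exc:
    error_message = str(exc)

# Called on an instance: `self` is supplied automatically and `bundle` fills `data_bundle`.
result = DemoPipe().process(bundle)

print(error_message)
```

The same reasoning would apply to `IMDBPipe.process(data_bundle)` versus `IMDBPipe().process(data_bundle)`.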

jwc19890114 commented 4 years ago

That works now, thanks. The code in the tutorial still has the problem, though, so could you please take a look?

xuyige commented 4 years ago

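The tutorial code was written when the loader still returned a DataSet. The loader has since been updated to return a DataBundle, which has no split function. Thank you for raising this; we will revise the corresponding tutorial code.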
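
Until the tutorial is updated, one possible workaround is to take a DataSet out of the DataBundle and split that instead. The helper below is only a sketch: `split_train_dev` is a hypothetical name, and it assumes that `get_dataset` (used earlier in this thread) returns a DataSet whose `split(ratio)` returns the larger part first, as the tutorial's usage suggests.

```python
def split_train_dev(data_bundle, dev_ratio=0.1, name="train"):
    """Split one DataSet held by a DataBundle into two parts.

    Hypothetical helper, not part of fastNLP. Assumes the object returned by
    `data_bundle.get_dataset(name)` has a `split(ratio)` method that returns
    a pair of datasets, the larger part first.
    """
    dataset = data_bundle.get_dataset(name)  # DataBundle itself has no split
    train_data, dev_data = dataset.split(dev_ratio)
    return train_data, dev_data
```

With the bundle from this thread it might be called as `train_data, dev_data = split_train_dev(data_bundle, 0.1)`.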

jwc19890114 commented 4 years ago

Thanks for the reply, and thank you for making this library available to everyone.

fastnlp / fastNLP

Dataset problem in the detailed tutorial "Quickly Building Custom Models with Modules and Models" #271