fastnlp / fastNLP

fastNLP: A Modularized and Extensible NLP Framework. Currently still in incubation.
https://gitee.com/fastnlp/fastNLP
Apache License 2.0

Dataset problem in the detailed tutorial "Quickly Building Custom Models with Modules and Models" #271

Closed jwc19890114 closed 4 years ago

jwc19890114 commented 4 years ago

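Describe the bug Hello. While working through the tutorial "Quickly Building Custom Models with Modules and Models", I found that after processing the dataset I could not run the split step. The code in question is: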

train_dev_data, test_data = dataset.split(0.1)
train_data, dev_data = train_dev_data.split(0.1)

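This fails with a message saying that data_bundle has no split method. I then switched to my own dataset and loaded it with IMDBLoader, but when I reach the IMDBPipe step I get: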

process() missing 1 required positional argument: 'data_bundle'

Here is my code. Have I written something wrong?

from fastNLP.io import CSVLoader, IMDBLoader
from fastNLP import Vocabulary, CrossEntropyLoss, AccuracyMetric
# loader = CSVLoader(headers=('raw_sentence', 'label'), sep='\t')
data_bundle=IMDBLoader().load(r'path/to/my/imdb/data')
print(data_bundle)
print(data_bundle.get_dataset('train')[:3])

from fastNLP.io import IMDBPipe
data_bundle=IMDBPipe.process(data_bundle)

A sample of the data:

A series of escapades demonstrating the adage that what is good for the goose is also good for the gander , some of which occasionally amuses but none of which amounts to much of a story .    1
This quiet , introspective and entertaining independent is worth seeking .  4
Even fans of Ismail Merchant 's work , I suspect , would have a hard time sitting through this one .    1
A positively thrilling combination of ethnography and all the intrigue , betrayal , deceit and murder of a Shakespearean tragedy or a juicy soap opera .    3
Aggressive self-glorification and a manipulative whitewash .    1
A comedy-drama of nearly epic proportions rooted in a sincere performance by the title character undergoing midlife crisis .    4
Narratively , Trouble Every Day is a plodding mess .    1

Thank you for providing this library.

yhcc commented 4 years ago
from fastNLP.io import CSVLoader, IMDBLoader
from fastNLP import Vocabulary, CrossEntropyLoss, AccuracyMetric
# loader = CSVLoader(headers=('raw_sentence', 'label'), sep='\t')
data_bundle=IMDBLoader().load(r'path/to/my/imdb/data')
print(data_bundle)
print(data_bundle.get_dataset('train')[:3])

from fastNLP.io import IMDBPipe
# Perhaps the Pipe was never instantiated?
data_bundle=IMDBPipe().process(data_bundle)

Could you instantiate the Pipe first and then try again?
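
The error in the original report is what Python raises whenever an instance method is called on the class itself: the single argument gets bound to `self`, and the real parameter is left unfilled. A minimal sketch of that behaviour follows. `DemoPipe` is a hypothetical stand-in for `IMDBPipe` and the dictionary is a stand-in for a data bundle, so nothing here depends on fastNLP.

```python
class DemoPipe:
    """Hypothetical stand-in for a Pipe; not part of fastNLP."""

    def process(self, data_bundle):
        # A real Pipe would transform the data; this one returns it unchanged.
        return data_bundle


bundle = {"train": ["a good movie", "a bad movie"]}

# Called on the class: `bundle` is bound to `self`, so `data_bundle` is missing.
error_message = None
try:
    DemoPipe.process(bundle)
except TypeError as err:
    error_message = str(err)
    print(error_message)

# Called on an instance: `self` is supplied automatically, so the call succeeds.
processed = DemoPipe().process(bundle)
```

The only difference between the failing and the working call is the pair of parentheses after the class name, which is the same change suggested above for `IMDBPipe`.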

jwc19890114 commented 4 years ago
That works now, thank you. The problem with the code in the tutorial still needs a look, though, if you don't mind.

xuyige commented 4 years ago

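The tutorial code dates from when the loader loaded a dataset and returned a DataSet. The loader has since been updated to return a DataBundle, which has no split function. Thank you for raising this; we will revise the corresponding tutorial code.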
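
The distinction described here, a bundle that only holds named datasets versus a dataset that can split itself, can be sketched without fastNLP. `ToyDataSet` and `ToyBundle` below are hypothetical stand-ins written for illustration; their `split` and `get_dataset` methods only mimic the calls that appear in this thread. In fastNLP the analogous step would presumably be to call `split` on the result of `data_bundle.get_dataset('train')` rather than on `data_bundle`, but that is an assumption and is not confirmed in this thread.

```python
import random


class ToyDataSet:
    """Hypothetical stand-in for a dataset that knows how to split itself."""

    def __init__(self, rows):
        self.rows = list(rows)

    def __len__(self):
        return len(self.rows)

    def split(self, ratio, seed=0):
        # Returns (larger part, smaller part); the smaller part holds
        # `ratio` of the rows, chosen after a seeded shuffle.
        indices = list(range(len(self.rows)))
        random.Random(seed).shuffle(indices)
        cut = int(ratio * len(indices))
        held_out = [self.rows[i] for i in indices[:cut]]
        kept = [self.rows[i] for i in indices[cut:]]
        return ToyDataSet(kept), ToyDataSet(held_out)


class ToyBundle:
    """Hypothetical stand-in for a bundle: named datasets, no split()."""

    def __init__(self, datasets):
        self.datasets = dict(datasets)

    def get_dataset(self, name):
        return self.datasets[name]


bundle = ToyBundle({"train": ToyDataSet(range(100))})

# The bundle has no split(); fetch the dataset first, then split it.
dataset = bundle.get_dataset("train")
train_dev_data, test_data = dataset.split(0.1)
train_data, dev_data = train_dev_data.split(0.1)

print(len(train_data), len(dev_data), len(test_data))
```

With 100 rows, the two calls leave 81 rows for training, 9 for development, and 10 for testing.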

jwc19890114 commented 4 years ago

Thanks for the reply, and thank you for making this library available for everyone to use.