liudonglei opened this issue 6 years ago
(castor) [ldl@402 sm_cnn 15:15:35] $ python train.py --mode static --no_cuda
Dataset TREC Mode static
VOCAB num 13
LABEL.target_class: 13
LABELS: ['
Hey @liudonglei To my understanding, you are using your own dataset, right? Can you post your dataset format in this thread? It will be easier for me to understand this issue.
@Impavidity Not my own dataset. I just tried the sm_cnn model on the TrecQA dataset in your Castor-data repo; all my steps followed Castor/README.md and Castor/sm_cnn/README.md.
Hi @liudonglei, were you able to resolve this issue? I am facing the same issue.
Hi @liudonglei, were you able to resolve this issue? I am facing the same issue.
Sorry, I couldn't. I am unfamiliar with the torchtext package this repo uses.
@rosequ @SawanKumar28 Hi, today I tried this repo again and fixed the problem. It comes from the way trec_dataset.py uses torchtext.data.TabularDataset; it may be a bug in how the class inheritance is set up. After half a day of debugging, I located the issue in trec_dataset.py and borrowed similar code from the blog post http://mlexplained.com/2018/02/08/a-comprehensive-tutorial-to-torchtext to make the repo work.
You can simply replace trec_dataset.py with the code below:
---- the corrected trec_dataset.py ----

from torchtext import data


class TrecDataset:
    dirname = 'data'

    @classmethod
    def splits(cls, question_id, question_field, answer_field, external_field, label_field):
        tv_datafields = [('qid', question_id), ('label', label_field), ('question', question_field),
                         ('answer', answer_field), ('ext_feat', external_field)]
        train, dev, test = data.TabularDataset.splits(
            path='data',  # the root directory where the data lies
            train='trecqa.train.tsv', validation='trecqa.dev.tsv', test='trecqa.test.tsv',
            format='tsv',
            # skip_header=True,  # if your file has a header row, pass this so it isn't processed as data
            fields=tv_datafields)
        return train, dev, test
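For reference, here is a minimal, self-contained sketch (plain Python, with a hypothetical sample row that is not taken from Castor-data) of how a TrecQA-style TSV row maps onto the five columns declared in tv_datafields above:

```python
import csv
import io

# Hypothetical one-row TSV; the column order (qid, label, question,
# answer, ext_feat) is the order assumed by tv_datafields above.
sample = ("32.1\t0\thow are glacier caves formed ?\t"
          "A glacier cave is a cave formed within the ice .\t"
          "0.2 0.0 0.1 0.0\n")

reader = csv.reader(io.StringIO(sample), delimiter='\t')
row = next(reader)
example = dict(zip(['qid', 'label', 'question', 'answer', 'ext_feat'], row))

print(example['qid'])       # '32.1'
print(example['label'])     # '0'
print(example['ext_feat'])  # '0.2 0.0 0.1 0.0'
```

If your rows do not split into exactly these five tab-separated columns, torchtext will assign fields to the wrong columns, which is one way to end up with character-level label vocabularies like the LABELS output above.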
$ python train.py --mode static --gpu
LABELS: ['1', '2', '0', '7', '3', '1', '8', '4', '5', '9', '6', '\t', '.']
Train instance 53417
Dev instance 1148
Test instance 1517
Shift model to GPU
Time Epoch Iteration Progress (%Epoch) Loss Dev/Loss Accuracy Dev/Accuracy
Traceback (most recent call last):
  File "train.py", line 147, in <module>
    for batch_idx, batch in enumerate(train_iter):
  File "/home/dm/anaconda3/envs/theano.3/lib/python3.6/site-packages/torchtext/data/iterator.py", line 151, in __iter__
    self.train)
  File "/home/dm/anaconda3/envs/theano.3/lib/python3.6/site-packages/torchtext/data/batch.py", line 27, in __init__
    setattr(self, name, field.process(batch, device=device, train=train))
  File "/home/dm/anaconda3/envs/theano.3/lib/python3.6/site-packages/torchtext/data/field.py", line 188, in process
    tensor = self.numericalize(padded, device=device, train=train)
  File "/home/dm/anaconda3/envs/theano.3/lib/python3.6/site-packages/torchtext/data/field.py", line 308, in numericalize
    arr = self.postprocessing(arr, None, train)
  File "/home/dm/anaconda3/envs/theano.3/lib/python3.6/site-packages/torchtext/data/pipeline.py", line 37, in __call__
    x = pipe.call(x, *args)
  File "/home/dm/anaconda3/envs/theano.3/lib/python3.6/site-packages/torchtext/data/pipeline.py", line 52, in call
    return [self.convert_token(tok, *args) for tok in x]
  File "/home/dm/anaconda3/envs/theano.3/lib/python3.6/site-packages/torchtext/data/pipeline.py", line 52, in <listcomp>
    return [self.convert_token(tok, *args) for tok in x]
  File "train.py", line 62, in <lambda>
    postprocessing=data.Pipeline(lambda arr, _, train: [float(y) for y in arr]))
ValueError: could not convert string to float: ''
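The ValueError itself is easy to reproduce in isolation: the postprocessing pipeline on the external-feature field (train.py line 62) maps float over every token, so any empty string in that column raises. A minimal sketch in plain Python, with no torchtext needed (the helper name is mine):

```python
# Mirrors the postprocessing lambda from train.py line 62:
#   lambda arr, _, train: [float(y) for y in arr]
def postprocess(arr):
    return [float(y) for y in arr]

print(postprocess(['0.2', '0.0', '0.1']))  # [0.2, 0.0, 0.1]

try:
    # An empty token in the ext_feat column -- e.g. from a blank or
    # mis-aligned TSV field -- triggers the crash seen in the traceback.
    postprocess([''])
except ValueError as err:
    print(err)  # could not convert string to float: ''
```

So the crash is a symptom: the fields were bound to the wrong columns, and an empty string reached the float conversion.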
Note: You are using GPU for training
Dataset TREC Mode static
VOCAB num 13
LABEL.target_class: 13
LABELS: ['