chho33 / LAMOL

Code for LAMOL: LAnguage MOdeling for Lifelong Language Learning
MIT License

Among the metrics produced by the code (metrics.py) at test time, which are used in the paper? #1

Closed willanxywc closed 4 years ago

willanxywc commented 4 years ago

The code returns nf1, em, nem, joint_goal_em, turn_request_em, turn_goal_em, and avg_dialogue. But which of these are used to produce the results in Table 3? I notice there are explanations in the appendix, but I can't quite match them up. For example, which one does turn-based dialogue state EM (dsEM) correspond to? avg_dialogue?

chho33 commented 4 years ago

Hi, the metric part is taken from the DecaNLP code. Lines 230-239 in https://github.com/salesforce/decaNLP/blob/c096a1f5bac0f80308dfec103985b15078d1c394/predict.py indicate which dataset corresponds to which metric:

```python
{'cnn_dailymail': 'avg_rouge', 'iwslt.en.de': 'bleu', 'multinli.in.out': 'em',
 'squad': 'nf1', 'srl': 'nf1', 'sst': 'em', 'wikisql': 'lfem',
 'woz.en': 'joint_goal_em', 'zre': 'corpus_f1', 'schema': 'em'}
```

Sorry for the inconvenience!
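For convenience, here is a small sketch of how that mapping could be applied to the dict of scores returned by metrics.py. The `scores` argument and the `headline_score` helper are illustrative assumptions, not part of the repo:

```python
# Dataset -> headline metric, as listed in DecaNLP's predict.py (lines 230-239).
TASK_TO_METRIC = {
    'cnn_dailymail': 'avg_rouge',
    'iwslt.en.de': 'bleu',
    'multinli.in.out': 'em',
    'squad': 'nf1',
    'srl': 'nf1',
    'sst': 'em',
    'wikisql': 'lfem',
    'woz.en': 'joint_goal_em',
    'zre': 'corpus_f1',
    'schema': 'em',
}

def headline_score(task_name, scores):
    """Pick the single metric reported for `task_name`, given all metrics for that task."""
    return scores[TASK_TO_METRIC[task_name]]

# Example: for WOZ the reported number would be the joint-goal EM.
# headline_score('woz.en', {'joint_goal_em': 0.84, 'turn_goal_em': 0.91, ...}) -> 0.84
```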

willanxywc commented 4 years ago

Thanks for your timely reply. I have another question: why is the output converted back to FP32 in the forward function of the FP16 module?

```python
class FP16_Module(nn.Module):
    def __init__(self, module):
        super(FP16_Module, self).__init__()
        self.add_module('module', module.half())

    def forward(self, *inputs, **kwargs):
        return fp16_to_fp32(self.module(*(fp32_to_fp16(inputs)), **kwargs))
```
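For context, this FP16_Module pattern (common in NVIDIA's pre-AMP mixed-precision utilities) keeps the model weights and activations in half precision, but casts tensors at the module boundary so that the loss and optimizer updates can stay in FP32 for numerical stability. A rough sketch of what the two conversion helpers typically look like; this is an assumption about their implementation, not necessarily the repo's exact code:

```python
import torch

def conversion_helper(val, conversion):
    """Recursively apply `conversion` to tensors inside (nested) tuples/lists."""
    if isinstance(val, (tuple, list)):
        return type(val)(conversion_helper(v, conversion) for v in val)
    return conversion(val)

def fp32_to_fp16(val):
    """Cast float32 tensors to float16 on the way into the half-precision module."""
    def half_conversion(v):
        if isinstance(v, torch.Tensor) and v.dtype == torch.float32:
            return v.half()
        return v
    return conversion_helper(val, half_conversion)

def fp16_to_fp32(val):
    """Cast float16 outputs (e.g. logits) back to float32 before the loss is computed."""
    def float_conversion(v):
        if isinstance(v, torch.Tensor) and v.dtype == torch.float16:
            return v.float()
        return v
    return conversion_helper(val, float_conversion)
```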

Worth noting that I repeatedly get a CUDA out-of-memory error on the line above, fp16_to_fp32(self.module(*(fp32_to_fp16(inputs)), **kwargs)). I have to reduce the memory size and the training batch size to avoid the error. Data parallelism doesn't seem to help: I run the code on 4 GeForce RTX 2080 cards and the out-of-memory error is still thrown.
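One common workaround (not specific to this repo) is to cut the per-step batch size and accumulate gradients over several steps, so the effective batch size stays the same while peak memory drops. A minimal PyTorch sketch, assuming a generic `model`, `optimizer`, and `loader` whose forward pass returns the loss (both are hypothetical placeholders here):

```python
accum_steps = 4  # effective batch size = loader batch size * accum_steps

optimizer.zero_grad()
for step, (inputs, labels) in enumerate(loader):
    loss = model(inputs, labels=labels)   # hypothetical forward pass returning a scalar loss
    (loss / accum_steps).backward()       # scale so accumulated gradients average correctly
    if (step + 1) % accum_steps == 0:
        optimizer.step()
        optimizer.zero_grad()
```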