NTMC-Community / MatchZoo

Facilitating the design, comparison and sharing of deep text matching models.
Apache License 2.0

Preprocessor.fit_transform does not initialise preprocessor.context #764

Open MathildaSu opened 5 years ago

MathildaSu commented 5 years ago

Describe the bug

After calling

preprocessor = mz.preprocessors.DSSMPreprocessor()
train_processed = preprocessor.fit_transform(train_pack)

preprocessor.context is still empty. It is only populated when fit is called on its own:

preprocessor.fit(train_pack)

To Reproduce

import matchzoo as mz
import pandas as pd

path = "/results/DPH_3.res"  # any file
table = pd.read_csv(path, sep='\t')
df = pd.DataFrame({  # any format
    'text_left': table['q'],
    'text_right': table['doc'],
    'id_left': table['q_id'],
    'id_right': table['doc_id'],
    'label': table['label']
})

pack = mz.pack(df)

train_pack = pack[:10000]
valid_pack = pack[10000:15000]
predict_pack = pack[15000:20000]

preprocessor = mz.preprocessors.DSSMPreprocessor()
preprocessor.fit_transform(train_pack)
print(preprocessor.context)  # output is {}

preprocessor.fit(train_pack)
print(preprocessor.context)  # output is not empty, all params are initialised

train_processed = preprocessor.transform(train_pack)
valid_processed = preprocessor.transform(valid_pack)
predict_processed = preprocessor.transform(predict_pack)

Describe your attempts

Current workaround: call preprocessor.fit() and preprocessor.transform() separately instead of preprocessor.fit_transform().
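
The workaround can be wrapped in a small helper. This is only a minimal sketch and not part of MatchZoo: fit_then_transform is a hypothetical name, and it assumes nothing beyond what the snippet above uses, namely that the preprocessor has fit(), transform() and a dict-like context attribute.

```python
def fit_then_transform(preprocessor, train_pack, *other_packs):
    """Fit on train_pack, then transform every pack with the fitted state.

    Hypothetical helper (not a MatchZoo API). It works with any object
    that has fit(), transform() and a dict-like `context` attribute.
    """
    # Fit explicitly instead of relying on fit_transform().
    preprocessor.fit(train_pack)

    # Fail early if fitting did not populate the context.
    if not preprocessor.context:
        raise RuntimeError("context is still empty after fit()")

    # Transform the training pack first, then any additional packs, in order.
    return [preprocessor.transform(pack) for pack in (train_pack, *other_packs)]
```

With the packs from the reproduction script, this would be called as train_processed, valid_processed, predict_processed = fit_then_transform(preprocessor, train_pack, valid_pack, predict_pack).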


uduse commented 5 years ago

Since I don't have your data, I tested it with our toy data. I could not reproduce the bug you are reporting.

Here is what I tried:

import matchzoo as mz
pp = mz.preprocessors.DSSMPreprocessor()
dp = mz.datasets.toy.load_data()
pp.fit_transform(dp)
print(pp.context)  # actually prints correctly fitted context
pp.fit(dp)
print(pp.context)  # prints the same thing
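
To rule out state carried over between calls, the same comparison can be run on two fresh preprocessor instances. The helper below is a hedged sketch rather than anything MatchZoo provides: compare_contexts is a hypothetical name, and it assumes only that the object returned by make_preprocessor() has fit(), fit_transform() and a dict-like context attribute.

```python
def compare_contexts(make_preprocessor, data_pack):
    """Return (context after fit_transform, context after fit).

    Hypothetical diagnostic helper (not a MatchZoo API). Each context
    comes from its own freshly created preprocessor, so neither call can
    benefit from state left behind by the other.
    """
    via_fit_transform = make_preprocessor()
    via_fit_transform.fit_transform(data_pack)

    via_fit = make_preprocessor()
    via_fit.fit(data_pack)

    # Shallow copies, so the results stay valid if the instances are reused.
    return dict(via_fit_transform.context), dict(via_fit.context)
```

For example, ft_ctx, fit_ctx = compare_contexts(mz.preprocessors.DSSMPreprocessor, dp) followed by print(sorted(ft_ctx), sorted(fit_ctx)) shows which keys each path populated. Comparing keys rather than values avoids depending on how the stored objects implement equality.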