attardi / deepnl

Deep Learning for Natural Language Processing
GNU General Public License v3.0
457 stars, 116 forks

NER Tagger object serialization issue #11

Closed: kiran-surya closed this issue 9 years ago

kiran-surya commented 9 years ago

Hi,

The NerTagger object cannot be serialized with pickle, jsonpickle, or dill. Is this a known issue?

kiran-surya commented 9 years ago

NerTagger has a load API. Is there any way to dump the NerTagger object, since the trained model will not change? This is needed so that we do not have to load the tagger object for every test example that arrives online.

kiran-surya commented 9 years ago

I'm using the save API, but I get the following error:

```
Traceback (most recent call last):
  File "bin/dl-ner.py", line 390, in <module>
    main()
  File "bin/dl-ner.py", line 348, in main
    tagger.save(args.outputfile)
  File "deepnl/tagger.pyx", line 142, in deepnl.tagger.Tagger.save (deepnl/tagger.cpp:3863)
    self.nn.save(file)
  File "deepnl/network.pyx", line 302, in deepnl.network.Network.save (deepnl/network.cpp:7254)
    self.p.save(file)
  File "deepnl/networkseq.pyx", line 64, in deepnl.networkseq.SeqParameters.save (deepnl/networkseq.cpp:3300)
    pickle.dump([self.hidden_weights, self.hidden_bias,
TypeError: argument must have 'write' attribute
```
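
The last frame shows `pickle.dump` receiving an argument with no `write` method, which suggests that `tagger.save()` was handed the output path (`args.outputfile`) rather than an open file object. A minimal sketch of the difference using plain `pickle`; the list of weights is a made-up stand-in for the network parameters, and the file name `model.dnn` is hypothetical:

```python
import os
import pickle
import tempfile

# Hypothetical stand-in for the parameters that SeqParameters.save() pickles.
params = [[0.1, 0.2], [0.3, 0.4]]

path = os.path.join(tempfile.mkdtemp(), "model.dnn")  # hypothetical file name

# Passing a path string fails: pickle.dump() needs an object with .write().
try:
    pickle.dump(params, path)
    failed = False
except TypeError:
    failed = True

# Passing an open file object, in binary mode, works.
with open(path, "wb") as f:
    pickle.dump(params, f)

# Reading it back also takes a file object, not a path.
with open(path, "rb") as f:
    restored = pickle.load(f)

print(failed, restored == params)  # True True
```

By analogy, and assuming `args.outputfile` holds a path string, the call in `dl-ner.py` would become something like `with open(args.outputfile, "wb") as f: tagger.save(f)`.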

kiran-surya commented 9 years ago

The save function serializes the NerTagger object. How do I deserialize it? Pickle is not working.

attardi commented 9 years ago

Try Tagger.load(file).
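
The save/load pair writes the trained parameters to a file and rebuilds a tagger from them, so the live object never needs to be pickled. A minimal sketch of that shape, assuming only the `save(file)` / `load(file)` signatures seen in this thread; `ToyTagger` and its tagging rule are hypothetical and are not deepnl's implementation:

```python
import io
import pickle


class ToyTagger:
    """Hypothetical stand-in for a deepnl tagger: only the save/load shape
    (file object in, file object out) mirrors the thread."""

    def __init__(self, weights, labels):
        self.weights = weights
        self.labels = labels

    def save(self, file):
        # Write the trained parameters, not the live object.
        pickle.dump((self.weights, self.labels), file)

    @classmethod
    def load(cls, file):
        # Rebuild a ready-to-use tagger from the saved parameters.
        weights, labels = pickle.load(file)
        return cls(weights, labels)

    def tag(self, tokens):
        # Toy rule: capitalized tokens get the first label, everything else "O".
        return [(t, self.labels[0] if t[:1].isupper() else "O") for t in tokens]


buf = io.BytesIO()  # any binary file object works, e.g. open(path, "wb")
ToyTagger([0.5], ["B-PER"]).save(buf)
buf.seek(0)
tagger = ToyTagger.load(buf)
print(tagger.tag(["Mario", "runs"]))  # [('Mario', 'B-PER'), ('runs', 'O')]
```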

kiran-surya commented 9 years ago

You mean NerTagger.load(file)? I want to avoid calling it for every test example.

attardi commented 9 years ago

Once you have loaded the tagger, you can call it many times with new sentences. If you want a service to which you can send sentences for tagging, you need to wrap it in a web service. I do it using Tornado.
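
Inside a long-running process such as a Tornado or Flask app, "load once, call many times" can be as simple as caching the loaded object. A minimal sketch using `functools.lru_cache`; `FakeTagger`, `get_tagger`, and the model name `ner.dnn` are hypothetical, and the comment marks where the real `NerTagger.load(file)` call would go (an assumption, not deepnl code):

```python
import functools

LOAD_COUNT = 0  # counts how often the (expensive) load actually runs


class FakeTagger:
    """Hypothetical stand-in for a loaded NerTagger."""

    def tag(self, tokens):
        return [(t, "O") for t in tokens]


@functools.lru_cache(maxsize=None)
def get_tagger(model_path):
    # Runs once per model_path for the life of the process; later calls
    # return the same in-memory object. With deepnl this body would open
    # model_path and call NerTagger.load(file) (assumption).
    global LOAD_COUNT
    LOAD_COUNT += 1
    return FakeTagger()


for sentence in (["Rome", "is", "old"], ["Pisa", "too"]):
    print(get_tagger("ner.dnn").tag(sentence))

print(LOAD_COUNT)  # 1
```

The cache only helps while the process stays alive; a fresh interpreter starts with an empty cache.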

kiran-surya commented 9 years ago

I'm using Flask. I want to call the tagger for every sentence, but load it only once. The sentences arrive online. I tried to serialize the tagger, but none of pickle, jsonpickle, or dill works. This is the code:

```python
with open(args.model) as file:
    tagger = NerTagger.load(file)
reader = ConllReader()
for sent in reader:
    sent = [x[args.formField] for x in sent]  # extract form
    ConllWriter.write(tagger.tag(sent))
```

I want to call load only the first time. In subsequent calls to dl-ner, the tagger object should be deserialized and reused. But deserializing the tagger object with dill, pickle, or jsonpickle does not work. Any ideas?

attardi commented 9 years ago

It looks right. Create a new reader to read the next sentences. You can reuse the tagger object, calling tagger.tag() as many times as you want.

kiran-surya commented 9 years ago

But how do I save the tagger object for subsequent calls? My problem is that the tagger object cannot be serialized.

attardi commented 9 years ago

You don't need to serialize it. It is already in memory.

kiran-surya commented 9 years ago

Actually, I'm calling dl-ner from the web server using subprocess.
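
Launching `dl-ner` as a subprocess per request starts a fresh interpreter each time, so the model is reloaded on every call no matter how it is stored. Hosting the tagger inside the server process avoids that. A minimal sketch, using the standard library's `http.server` in place of Flask or Tornado so it runs without extra packages; `FakeTagger` and the JSON-list request format are hypothetical, and the handler answers a POST to any path:

```python
import json
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


class FakeTagger:
    """Hypothetical stand-in for NerTagger; replace with the real loaded model."""

    def tag(self, tokens):
        return [[t, "B-LOC" if t == "Pisa" else "O"] for t in tokens]


# Loaded once, when the server process starts, instead of once per request.
TAGGER = FakeTagger()


class TagHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        tokens = json.loads(self.rfile.read(length))  # expects a JSON list of tokens
        body = json.dumps(TAGGER.tag(tokens)).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the demo quiet


server = ThreadingHTTPServer(("127.0.0.1", 0), TagHandler)  # port 0: any free port
threading.Thread(target=server.serve_forever, daemon=True).start()

url = "http://127.0.0.1:%d/" % server.server_address[1]
req = urllib.request.Request(
    url,
    data=json.dumps(["Pisa", "is", "old"]).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    result = json.loads(resp.read())

server.shutdown()
server.server_close()
print(result)  # [['Pisa', 'B-LOC'], ['is', 'O'], ['old', 'O']]
```

In a Flask app the same idea applies: load the model at module level or at startup, and have each request handler only call `tag()`.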

kiran-surya commented 9 years ago

Thanks for your help. The issue is resolved now.