IntelLabs / nlp-architect

A model library for exploring state-of-the-art deep learning topologies and techniques for optimizing Natural Language Processing neural networks
https://intellabs.github.io/nlp-architect
Apache License 2.0
2.94k stars 447 forks source link

Runtime errors when running with the ABSA model #167

Closed stevesolun closed 4 years ago

stevesolun commented 4 years ago

I am trying to run the ABSA solution on my local Ubuntu 18.04 even with the provided train reviews dataset but getting the following error (using the exact instructions + virtualenv): 'file_name': ['tripadvisor_co_uk-travel_restaurant_reviews_sample_2000_train.csv']}}], 'references': []}: OSError('Unable to load model. Filepath is not an hdf5 file (or h5py is not available) or SavedModel.',) When I am trying to load a bunch of reviews, I get the following error:

'references': []}: ParserError('Expected 2 fields in line 2, saw 3')

Please advise how to fix it.

danielkorat commented 4 years ago

Hi, I'm testing it now on Ubuntu 18.04, will let you know

stevesolun commented 4 years ago

@danielkorat I have done some debugging and found that the problem/issue is in the following function:

def train_file_callback(attr, old, new):
        print("point17 - train_file_callback")
        global train_data
        SENTIMENT_OUT.mkdir(parents=True, exist_ok=True)
        train = TrainSentiment(parse=True, rerank_model=None)
        if len(train_src.data['file_contents']) == 1:
            print("point17.1 - train_file_callback")
            train_data = read_csv(train_src, index_cols=0)
            print("point17.1.1 - train_file_callback")
            file_name = train_src.data['file_name'][0]
            print("point17.1.2 - train_file_callback")
            raw_data_path = SENTIMENT_OUT / file_name
            print("point17.1.3 - train_file_callback")
            train_data.to_csv(raw_data_path, header=False)
            print(f'Running_SentimentTraining on data...')
            print(raw_data_path)
            train.run(data=raw_data_path)
            print("point17.2 - train_file_callback")

It can't proceed to point17.2 and it fails on the train.run(data=raw_data_path)

I am also getting the following error from Bokeh:

ERROR:bokeh.server.protocol_handler:error handling message Message 'PATCH-DOC' (revision 1) content: {'events': [{'kind': 'ModelChanged', 'model': {'type': 'ColumnDataSource', 'id': '1049'}, 'attr': 'data', 'new': {'file_contents': ['data:text/csv;base64,Ii1GYW50YXN0aWMhIC1CZWVuIG1lYW5pbmcgdG8gdmlzaXQgRWxsaW90J3MgZm9yIGFnZXMsIGl0cyBlaXRoZXIgY2xvc2VkIG9yIGZ1bGwgZXZlcnkgdGltZSBJIGhhdmUgdHJpZWQgdG8gZ28gaW4gdGhlIHBhc3QuIC1CZWluZyBKYW51YXJ5IGFuZCBsdW5jaHRpb......

If we will dig deeper the actual problem is here:

def run(self, data: Union[str, PathLike] = None, parsed_data: Union[str, PathLike] = None,
            out_dir: Union[str, PathLike] = TRAIN_OUT):
        print("point17 - train.run")
        if not parsed_data:
            print("point17.1 - train.run")
            if not self.parser:
                raise RuntimeError("Parser not initialized (try parse=True at init)")
            print("point17.2 - train.run")
            parsed_dir = Path(out_dir) / 'parsed' / Path(data).stem
            print("point17.3 -train.run")
            parsed_data = self.parse_data(data, parsed_dir)
            print("point17.4 -train.run")
        generated_aspect_lex = self.acquire_lexicon.acquire_lexicons(parsed_data)
        print("point17.5 - train.run")
        _write_aspect_lex(parsed_data, generated_aspect_lex, LEXICONS_OUT)
        print("point17.6 - train.run")
        generated_opinion_lex_reranked = self.rerank.predict(AcquireTerms.acquired_opinion_terms_path,
                                                             AcquireTerms.generic_opinion_lex_path)
        print("point17.7 - train.run")
        _write_opinion_lex(parsed_data, generated_opinion_lex_reranked, LEXICONS_OUT)
        print("point17.8 - train.run")
        return generated_opinion_lex_reranked, generated_aspect_lex

The following line

generated_opinion_lex_reranked = self.rerank.predict(AcquireTerms.acquired_opinion_terms_path,
                                                             AcquireTerms.generic_opinion_lex_path)

doesn't work, it doesn't proceed to print("point17.7 - train.run") in ABSA solution train.py file (from nlp_architect.models.absa.train.train import TrainSentiment)

stevesolun commented 4 years ago

Another potentially useful info is that the rerank_model.h5 file contains:

<?xml version="1.0" encoding="UTF-8"?>
<Error><Code>AccessDenied</Code><Message>Access Denied</Message><RequestId>921D8DBE191CE539</RequestId><HostId>XtEQOlCo9t/moznF/tUc7wlhJKPpXxNvGhcATdr8369njD8mJOBM9RfUI+4SQaXJEY2CzqogT60=</HostId></Error>

Path to file: /home/my_user/nlp-architect/cache/absa/train/reranking_model

peteriz commented 4 years ago

Please update your branch/working copy to the latest from master branch. We had to update our public URLs (models weight files). Thanks


From: Steve Solun notifications@github.com Sent: Sunday, September 20, 2020 2:33:51 PM To: NervanaSystems/nlp-architect nlp-architect@noreply.github.com Cc: Subscribed subscribed@noreply.github.com Subject: Re: [NervanaSystems/nlp-architect] Runtime errors when running with the ABSA model (#167)

Another potentially useful info is that the rerank_model.h5 file contains:

<?xml version="1.0" encoding="UTF-8"?>

AccessDeniedAccess Denied921D8DBE191CE539XtEQOlCo9t/moznF/tUc7wlhJKPpXxNvGhcATdr8369njD8mJOBM9RfUI+4SQaXJEY2CzqogT60=

Path to file: /home/my_user/nlp-architect/cache/absa/train/reranking_model

— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHubhttps://github.com/NervanaSystems/nlp-architect/issues/167#issuecomment-695776689, or unsubscribehttps://github.com/notifications/unsubscribe-auth/AABYYTAHE6775GJL4XEALW3SGXSB7ANCNFSM4ROSOZHA.

Intel Israel (74) Limited

This e-mail and any attachments may contain confidential material for the sole use of the intended recipient(s). Any review or distribution by others is strictly prohibited. If you are not the intended recipient, please contact the sender and delete all copies.

stevesolun commented 4 years ago

@peteriz I am sorry, still doesn't work. Please provide a step by step tutorial how can I do this.

I have cloned the project from master, tried to run ui.py in solutions/absa_solution but getting:

user@user: my_path/nlp-architect$ python3 solutions/absa_solution/ui.py 
Opening Bokeh application on http://localhost:5006/
INFO:bokeh.server.server:Starting Bokeh server version 1.2.0 (running on Tornado 6.0.4)
INFO:bokeh.server.tornado:Torndado websocket_max_message_size set to 5242880000 bytes (5000.00 MB)
2020-09-20 16:23:50,942 ERROR Uncaught exception GET / (::1)
HTTPServerRequest(protocol='http', host='localhost:5006', method='GET', uri='/', version='HTTP/1.1', remote_ip='::1')
Traceback (most recent call last):
  File "/usr/local/lib/python3.7/dist-packages/tornado/web.py", line 1703, in _execute
    result = await result
  File "/usr/local/lib/python3.7/dist-packages/tornado/gen.py", line 742, in run
    yielded = self.gen.throw(*exc_info)  # type: ignore
  File "/usr/local/lib/python3.7/dist-packages/bokeh/server/views/doc_handler.py", line 55, in get
    session = yield self.get_session()
  File "/usr/local/lib/python3.7/dist-packages/tornado/gen.py", line 735, in run
    value = future.result()
  File "/usr/local/lib/python3.7/dist-packages/tornado/gen.py", line 742, in run
    yielded = self.gen.throw(*exc_info)  # type: ignore
  File "/usr/local/lib/python3.7/dist-packages/bokeh/server/views/session_handler.py", line 77, in get_session
    session = yield self.application_context.create_session_if_needed(session_id, self.request)
  File "/usr/local/lib/python3.7/dist-packages/tornado/gen.py", line 735, in run
    value = future.result()
  File "/usr/local/lib/python3.7/dist-packages/tornado/gen.py", line 748, in run
    yielded = self.gen.send(value)
  File "/usr/local/lib/python3.7/dist-packages/bokeh/server/contexts.py", line 215, in create_session_if_needed
    self._application.initialize_document(doc)
  File "/usr/local/lib/python3.7/dist-packages/bokeh/application/application.py", line 178, in initialize_document
    h.modify_document(doc)
  File "/usr/local/lib/python3.7/dist-packages/bokeh/application/handlers/function.py", line 133, in modify_document
    self._func(doc)
  File "solutions/absa_solution/ui.py", line 55, in _doc_modifier
    grid = _create_ui_components()
  File "solutions/absa_solution/ui.py", line 216, in _create_ui_components
    with open(join(SOLUTION_DIR, "dropdown.js")) as f:
FileNotFoundError: [Errno 2] No such file or directory: '/my_path/nlp-architect/nlp_architect/solutions/absa_solution/dropdown.js'
stevesolun commented 4 years ago

Finally fixed this, needed to copy solutions dir to nlp-architect/nlp-architect/

Please make it easier to future users.

I am testing the solution now, please don't close the issue. Thanks.

danielkorat commented 4 years ago

@stevesolun Thanks for noticing the issue I'm working on merging the absa branch to the master branch to solve it

danielkorat commented 4 years ago

@stevesolun PR-168 fixes the issues you mentioned and will be merged soon

stevesolun commented 4 years ago

@danielkorat thanks :)

I will be happy to check it out when merged.

stevesolun commented 4 years ago

And please update the docker to apply all the changes. I am sure people will be happy to work with a docker solution.