danielhers / tupa

Transition-based UCCA Parser
https://danielhers.github.io/tupa
GNU General Public License v3.0

Calling parser within code with a pre-trained model cannot find vocab file #65

Closed feralvam closed 5 years ago

feralvam commented 5 years ago

Hi, I want to parse some text with TUPA from within my own code. Based on the examples I found, I am using this:

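import ucca.convert
from tupa.parse import Parser

PARSER_PATH = "resources/ucca/models/ucca-bilstm"  # model file prefix, without extension
PARSER = None  # cached parser instance, created on first use

def get_parser():
    """Load the pre-trained parser once and reuse it on later calls."""
    global PARSER
    if PARSER is None:
        PARSER = Parser(PARSER_PATH)
    return PARSER

def ucca_parse_text(text):
    # Convert the raw text into unannotated passages, one passage per line.
    passages = list(ucca.convert.from_text(text, one_per_line=True))
    parser = get_parser()
    # Keep only the first element of each result yielded by the parser.
    ucca_passages = [passage for (passage, *_) in parser.parse(passages)]
    return ucca_passages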

However, I get the following error when testing this code:

....
filename = 'vocab/en_core_web_lg.csv'

    def load_enum(filename):
        if filename == "-":
            return IdentityVocab()
        else:
>           with open(filename, encoding="utf-8") as f:
E           FileNotFoundError: [Errno 2] No such file or directory: 'vocab/en_core_web_lg.csv'

Now, that file does exist. My folder structure is:

resources
`-- ucca
    |-- models
    |   |-- ucca-bilstm.data
    |   |-- ucca-bilstm.enum
    |   |-- ucca-bilstm.json
    |   |-- ucca-bilstm.meta
    |   `-- ucca-bilstm.nlp.json
    `-- vocab
        `-- en_core_web_lg.csv

Maybe it cannot find the file because the path is relative? What would be the best way to fix this? I imagine one way would be to pass the vocabulary path directly, but how can I do that? Should it go inside the config parameter of Parser?

Thanks for the help.

danielhers commented 5 years ago

Hi @feralvam. The vocab file path is specified in <MODEL_FILE>.nlp.json. In your case, that is resources/ucca/models/ucca-bilstm.nlp.json. You're right, though: the path is resolved relative to your current working directory. It would definitely be better if it were relative to the model, but I haven't fixed that yet.
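
To illustrate the point about relative paths, here is a minimal sketch in plain Python. It is not TUPA code: resolve_against is a hypothetical helper, and using resources/ucca as the base directory is an assumption taken from the folder layout posted above. It only shows that a relative path such as vocab/en_core_web_lg.csv is looked up under the current working directory unless it is explicitly joined onto another base.

```python
import os
from pathlib import Path


def resolve_against(path, base_dir):
    """Hypothetical helper: keep absolute paths, join relative ones onto base_dir."""
    path = Path(path)
    return path if path.is_absolute() else Path(base_dir) / path


# Where open("vocab/en_core_web_lg.csv") effectively looks:
relative_to_cwd = resolve_against("vocab/en_core_web_lg.csv", os.getcwd())

# Where the file sits in the layout from this issue (base directory is an assumption):
relative_to_resources = resolve_against("vocab/en_core_web_lg.csv", "resources/ucca")

print(relative_to_cwd)
print(relative_to_resources)
```

Until the path is resolved relative to the model, the value stored in the .nlp.json file has to match the directory the script is started from, or be an absolute path.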

feralvam commented 5 years ago

Hi Daniel, thanks for the quick reply. Unfortunately, changing the path in <MODEL_FILE>.nlp.json didn't solve the problem.

I think model.load() causes this. Lines 287-290 read the correct path (resources/ucca/vocab/en_core_web_lg.csv) and set self.config.args.vocab accordingly. However, the loop in lines 293-295 then overwrites the value with vocab/en_core_web_lg.csv. Any idea why this happens?

danielhers commented 5 years ago

Oh right! That's just because the saved feature param also includes the vocab path as an attribute. Switching the order should fix it. Thanks.
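
The ordering problem described here can be sketched in a few lines of standalone Python. This is an illustration only, not the actual model.load() code: the function names, the dictionaries, and the dim attribute are all hypothetical. It shows how copying saved attributes after setting the vocab path restores the stale value, and how swapping the two steps lets the value from the .nlp.json file win.

```python
from types import SimpleNamespace


def load_buggy(args, nlp_config, saved_param):
    # Step 1: apply the path read from the .nlp.json file.
    args.vocab = nlp_config["vocab"]
    # Step 2: copy every saved attribute, which overwrites vocab again.
    for name, value in saved_param.items():
        setattr(args, name, value)
    return args


def load_fixed(args, nlp_config, saved_param):
    # Same two steps in the opposite order, so the .nlp.json path is applied last.
    for name, value in saved_param.items():
        setattr(args, name, value)
    args.vocab = nlp_config["vocab"]
    return args


# Hypothetical stand-ins for the edited config file and the saved feature param.
nlp_config = {"vocab": "resources/ucca/vocab/en_core_web_lg.csv"}
saved_param = {"vocab": "vocab/en_core_web_lg.csv", "dim": 50}

buggy = load_buggy(SimpleNamespace(), nlp_config, saved_param)
fixed = load_fixed(SimpleNamespace(), nlp_config, saved_param)

print(buggy.vocab)
print(fixed.vocab)
```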

feralvam commented 5 years ago

It seems to be working now. Thanks for the quick fix :)