bjascob / amrlib

A python library that makes AMR parsing, generation and visualization simple.
MIT License

ModuleNotFoundError: No module named 'setup_run_dir' #58

Closed: YanzeJeremy closed this 1 year ago

YanzeJeremy commented 1 year ago

Hi, Happy New Year! I want to try training a model. After running 20_Assemble_LDC2020T02/10_CollateData.py and 20_Assemble_LDC2020T02/12_TestLoadPenman.py, I get:

image

Then I tried to run 30_Model_Parse_GSII/10_Annotate_Corpus.py but got a ModuleNotFoundError: No module named 'setup_run_dir' error:

image

This error repeats as in the image above, even though setup_run_dir.py is present in 30_Model_Parse_GSII. When I comment out this code:

for fn in ('test.txt', 'dev.txt', 'train.txt'):
    annotate_file(indir, fn, outdir, fn + '.features')

The error disappears, but nothing happens. Did I make a mistake in one of my steps, or do you have any possible solutions?

Thanks a lot in advance!

bjascob commented 1 year ago

I don't know why you're getting that error. The best thing to do would be to move the script you're running (i.e., 10_xx) up 2 directories and comment out the line at the top that imports setup_run_dir. As the comment on that line states, all the import of that module does is force Python to run the script as if it were 2 levels up. This is just so I can keep things organized in separate directories. If you manually move the file up 2 directories, the paths for amrlib and the data should all be in the right locations without having to do the import.
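For readers following along, here is a minimal sketch of the general technique described above. It is not the actual contents of amrlib's setup_run_dir.py; the function name run_from_ancestor and the levels_up parameter are hypothetical, and the sketch assumes the goal is simply to change the working directory and the import path to a directory two levels above the script.

```python
import os
import sys
from pathlib import Path


def run_from_ancestor(script_path, levels_up=2):
    """Make the process behave as if the script lived `levels_up` directories higher.

    Hypothetical helper for illustration; not part of amrlib.
    """
    script_dir = Path(script_path).resolve().parent
    root = script_dir.parents[levels_up - 1]
    os.chdir(root)                      # relative data paths now resolve from root
    if str(root) not in sys.path:
        sys.path.insert(0, str(root))   # packages located under root become importable
    return root
```

A script would call run_from_ancestor(__file__) before its other imports. Moving the script up by hand, as suggested above, achieves the same effect without any helper.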

YanzeJeremy commented 1 year ago

Thank you very much for your reply. I moved the script up 2 directories and commented out the line at the top that imports setup_run_dir. It still fails, now with the following multiprocessing.pool.RemoteTraceback error, even though I am only running the script locally.

image

How can I deal with this?

bjascob commented 1 year ago

Are you using Windows? Multiprocessing and Windows don't play well together, so that might be the issue. Try this: go to annotator.py, un-comment line 26, then comment out lines 25, 27, 29 and 30. This removes all the multiprocessing.Pool() code and replaces it with a single-threaded map call.

Let me know if that works and if so, I'll try to figure out how to if/else these statements so if anyone else is using Windows they don't run into this again.
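The swap described above can be sketched generically as follows. This is an illustration under assumptions, not the code in annotator.py: word_count is a hypothetical stand-in for the per-item annotation work, and map_items with its use_pool flag is an invented helper that makes the pool path and the single-threaded path interchangeable.

```python
from multiprocessing import Pool


def word_count(line):
    # Hypothetical stand-in for the real per-item annotation work
    return len(line.split())


def map_items(func, items, use_pool=False, processes=2):
    """Apply func to every item, with or without a process pool."""
    if use_pool:
        # Parallel path: func and the items must be picklable
        with Pool(processes=processes) as pool:
            return pool.map(func, items)
    # Single-threaded path: same results, no child processes involved
    return list(map(func, items))


if __name__ == '__main__':
    print(map_items(word_count, ['the boy wants to go', 'hello world']))
```

Both paths return results in input order, so the single-threaded call is a drop-in replacement that trades speed for robustness.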

YanzeJeremy commented 1 year ago

No, I am using a Mac. I will try that now. By the way, in load_spacy('en'), does the en model mean the en_core_web_sm model from spaCy, or is it a model that should be downloaded following the instructions in SpotlightDBServer.sh? Do I need to set up a Java environment in addition to installing requirements.txt?

YanzeJeremy commented 1 year ago

Oh, it works.

image

bjascob commented 1 year ago

On the multiprocessing failure... Interesting. Per the docs... "Changed in version 3.8: On macOS, the spawn start method is now the default. The fork start method should be considered unsafe as it can lead to crashes of the subprocess."

Note to self: use multiprocessing.get_start_method() == 'fork' to check for this and disable the pool code if it is not equal. There are also os.name and sys.platform if the above causes any issues.
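A minimal sketch of that check, using only standard-library calls. The function name pool_is_safe is hypothetical, and treating every non-fork start method as a reason to skip the pool is the assumption taken from the note above rather than a general rule.

```python
import multiprocessing
import sys


def pool_is_safe():
    """Return True only when child processes are created with 'fork'.

    With 'spawn', each child starts a fresh interpreter and re-imports
    the main module, which is where scripts that alter paths at import
    time can break.
    """
    return multiprocessing.get_start_method() == 'fork'


if __name__ == '__main__':
    print('platform:', sys.platform)
    print('start method:', multiprocessing.get_start_method())
    print('use pool:', pool_is_safe())
```

The printed values depend on the operating system and Python version, so checking the start method directly is likely more reliable than branching on the platform name.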

bjascob commented 1 year ago

On spaCy loading... The en model used to be the default model, and it could change depending on how you set up your system (i.e., it was a pointer to the actual model). You might want to just change this to en_core_web_sm so you know which model it's using.
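One way to make the model choice explicit is sketched below. The LEGACY_SHORTCUTS table and both function names are hypothetical; the mapping of en to en_core_web_sm is an assumption taken from the suggestion above. Only spacy.load is a real spaCy call, and it requires spaCy and the named model package to be installed.

```python
# Assumed mapping from the old shortcut name to an explicit model package
LEGACY_SHORTCUTS = {'en': 'en_core_web_sm'}


def explicit_model_name(name):
    """Translate a legacy shortcut into a full model name; pass others through."""
    return LEGACY_SHORTCUTS.get(name, name)


def load_explicit(name):
    """Load a spaCy pipeline by its explicit package name."""
    import spacy  # imported lazily so the name lookup works without spaCy installed
    return spacy.load(explicit_model_name(name))
```

As far as I know, newer spaCy releases no longer resolve shortcut names such as en at all, so passing the full package name is the safer choice either way.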

The SpotlightDBServer is a whole different thing. It's only needed if you want to add the wiki tags to the graphs (done in 32_Add_Wiki.py). This is a separate post-processing operation that most people skip. If you really need this, you'll have to follow the instructions to set it up, including the Java setup. Be aware that this is an old system and it may be a pain to get working.

If you really need the wiki tags, I suggest using the newer BLINK system. The script for that can be found in 33_Model_Parse_XFM/30_Add_Wiki_With_Blink.py. It still takes some work to set up, but it gives better results.

YanzeJeremy commented 1 year ago

Thank you very much! I understand now.