PPH3 / Eldritch


WordNet #10

Open ebeshero opened 7 years ago

ebeshero commented 7 years ago

David Birnbaum's info on accessing and working with WordNet and synsets: https://github.com/djbpitt/wordnet/blob/master/Wordnet.ipynb

ebeshero commented 7 years ago

@etj27 @mjb232 Evan, here's the info you need to install Python and NLTK (Natural Language Toolkit). Matt, I'm cc:ing you on this because I think it'll help if two of you have this installed. You can back each other up.

  1. Install Python: We want Anaconda's latest version of Python 3.5, available here. (Be sure you're getting 3.5 and not the older and not-so-useful-to-us two.oh version.) https://www.continuum.io/downloads If you're on Windows, find the "Download for Windows" tab. (If you're not sure whether to go with the 64-bit or 32-bit installation, read this: https://support.microsoft.com/en-us/help/15056/windows-7-32-64-bit-faq )

  2. Run Python: Here's how to get started: https://opentechschool.github.io/python-beginners/en/getting_started.html Notes:

    • You can actually use oXygen to write a Python program: Go to open a New File and look up "Py". But I usually work with the (free) Sublime Text editor for this.
    • You'll be writing Python scripts that should live in this GitHub repo, and you'll write them to pull information from WordNet based on the words you feed in. You'll output the results as XML, probably saved in the same directory where you write your Python.
    • Basic overview of the process: When you work with Python, first open up a command line prompt. (In Windows, use the Windows Logo Button + R, type cmd or cmd.exe, and press Enter or click/tap OK.) Navigate into the Eldritch repo and then invoke Python just by typing at the command prompt: python. When you write a script, you'll run it from that same location just by typing its filename after python: python EldritchPyScript.py.
  3. Download and install NLTK Packages and WordNet: The Anaconda package will come with the toolkit part of NLTK inside it, but you'll still need to import some NLTK data for WordNet. Here's what we need to do:
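
A minimal sketch of the data step, assuming NLTK's standard downloader and its "wordnet" package name (see the NLTK data page linked below). The function name ensure_wordnet is made up for illustration, and this may not match the exact steps from the original instructions:

```python
def ensure_wordnet():
    """Fetch the WordNet data if needed and return NLTK's WordNet reader."""
    # Imported inside the function so this file still loads on a machine
    # where NLTK is not installed yet.
    import nltk

    # Downloads into NLTK's default data folder; if the data is already
    # there and current, NLTK skips the download.
    nltk.download("wordnet")

    from nltk.corpus import wordnet
    return wordnet
```

To try it, start python, run wn = ensure_wordnet(), and then wn.synsets("dog"). If the data landed where NLTK expects it, you should get back a list of synsets.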

Here's some backup NLTK reference for us, though it's not geared to the Anaconda installation: http://www.nltk.org/data.html

And here's a book of stuff you can do with NLTK and Python 3.5: http://www.nltk.org/book/

It'll be great if you can install this and start experimenting before we meet again on Friday. I'll go set all this up on my Windows machine at my house so I can help with troubleshooting if needed.

djbpitt commented 7 years ago

As Elisa writes, you can develop Python programs in Sublime Text or <oXygen/> or other editing environments. The default IDE (integrated development environment) that comes with most Python distributions is IDLE, but it isn’t the best free IDE available, and I don’t know many people who use it. The IDE I use and recommend is PyCharm (from JetBrains); you can either use the free “community” edition or apply for a free academic license to use the professional edition.

mjb232 commented 7 years ago

Downloading now!

ebeshero commented 7 years ago

Yay! I'm doing the same on my Windows machine here. Please report any trouble--I'll keep an eye on GitHub...

mjb232 commented 7 years ago

Alrighty, I'm good to go for Python. I'm gonna work on some markup before fiddling around with the program, though.

ebeshero commented 7 years ago

We're going to need this: http://www.nltk.org/data.html

ebeshero commented 7 years ago

@mjb232 @etj27 I've installed Anaconda Python 3.5 on my Windows computer and discovered that my instructions (above) needed some extra steps for installing the NLTK data and packages so that we can use WordNet. The instructions above are now revised and ready to go. Let me know if you have any trouble, and let's see what we can do tomorrow with WordNet!

ebeshero commented 7 years ago

@mjb232 So, after you left, I showed Patrick our Python output in synset_count.xml, and it became clear that we really ought to output the synsets themselves for each word in that file. We can modify the Python script, working closely with David Birnbaum's tutorial. See the code featured at In [11] (scroll down to it in the tutorial: https://github.com/djbpitt/wordnet/blob/master/Wordnet.ipynb ), and also take a close look at how the output XML is configured in our code.
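
A rough sketch of that change, not the script we actually ran. The element and attribute names (words, word, form, count, synset) and both function names are placeholders, so match them to whatever synset_count.xml really uses. The only NLTK calls here are wn.synsets() and the .name() method on each synset:

```python
import xml.etree.ElementTree as ET


def wordnet_lookup(word):
    """Return the names of every WordNet synset for one word."""
    # Needs NLTK plus the downloaded WordNet data.
    from nltk.corpus import wordnet as wn
    return [synset.name() for synset in wn.synsets(word)]


def build_synset_xml(words, lookup=wordnet_lookup):
    """Build one <word> element per word, holding a <synset> per synset."""
    root = ET.Element("words")
    for word in words:
        names = lookup(word)
        # Keep the count we already had, and add the synsets themselves.
        word_el = ET.SubElement(
            root, "word", {"form": word, "count": str(len(names))}
        )
        for name in names:
            ET.SubElement(word_el, "synset").text = name
    return root
```

Something like ET.tostring(build_synset_xml(["dog", "cat"]), encoding="unicode") gives the XML as a string you can write out to a file. Passing the lookup in as a parameter is only there so the XML-building part can be tried without WordNet installed.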

Something else you need to know: Python code actually depends on indentation to work. Don't try to left-align everything; follow the way the indentation groups related lines of code together. See if you can output all the available synsets for each word, and post here with questions for me!
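
Here's a tiny made-up example (nothing to do with WordNet yet) of how indentation decides which lines belong together:

```python
def count_long_words(words, min_length=5):
    """Count the words that have at least min_length letters."""
    total = 0
    for word in words:
        # Indented under "for": runs once for every word.
        if len(word) >= min_length:
            # Indented under "if": runs only for the long words.
            total += 1
    # Lined up with "for": runs once, after the loop has finished.
    return total
```

If you indented the return line one more level, it would sit inside the loop and the function would stop after the very first word.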

And read in the tutorial about how the identity transformation works to braid our output from accessing WordNet with Python into the formal XML word list we're generating. You'll need to adapt David's code in the tutorial. The part you need is at In [18]: Step 3: To write the counts back into the XML, use an identity transformation, reading in the new count file with the XPath document() function.
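
To be clear about what that step accomplishes, here is a Python stand-in for the same merge. It is not the XSLT identity transformation from the tutorial, just an illustration of the idea: copy the word list unchanged and add information looked up from a second file. The names word, form, count, and synsets are assumptions, not our real markup:

```python
import copy
import xml.etree.ElementTree as ET


def merge_counts(wordlist_root, counts_root):
    """Copy the word list as-is, adding a synset count where one is known."""
    # Look-up table built from the counts file: word form -> count.
    counts = {
        el.get("form"): el.get("count") for el in counts_root.iter("word")
    }
    # The deep copy plays the part of the identity transformation:
    # everything comes through unchanged unless we touch it below.
    merged = copy.deepcopy(wordlist_root)
    for word_el in merged.iter("word"):
        form = word_el.get("form")
        if form in counts:
            word_el.set("synsets", counts[form])
    return merged
```

In the XSLT version, the document() function does the job of reading in the counts file, and a template matching the word elements does the job of the loop.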