TiesdeKok / ipystata

Enables the use of Stata together with Python via Jupyter (IPython) notebooks.
192 stars 67 forks source link
jupyter jupyter-notebook notebook python stata

Combine Python with Stata using IPyStata

:warning:This repository is in archive mode due to my new role. Issues are not monitored.

IPyStata enables the use of Stata together with Python via Jupyter (IPython) notebooks.

Author: Ties de Kok (Personal Page)
PyPi: https://pypi.python.org/pypi/ipystata
Documentation: Example notebook

Table of contents

Get IPyStata

You can install IPyStata 0.3.0 using:

pip install ipystata

Alternative, get 4.X version from Github:

pip install git+https://github.com/TiesdeKok/ipystata

You can update a previous installation using:

pip install ipystata --upgrade --force-reinstall

Dependencies

Python 2.7 or 3.x
IPython 3 or 4+ (http://ipython.org/)
Pandas 0.17.x + (http://pandas.pydata.org/) (I recommend to use a distribution like Anaconda)
Recent version of Stata (13+ preferably) (http://www.stata.com/)

Set up IPyStata

Modes of operation:

IPyStata can communicate with Stata using two different techniques:

  1. Stata Automation, works on --> Windows
  2. Stata Batch mode, works on --> Windows, Mac OS X, and Linux

The Stata Automation mode has a richer feature set compared to the Stata Batch mode.
Unfortunately, Stata only supports Stata Automation for Stata instances running on a Windows OS.

For Windows Stata Automation is set as default but it is possible to set it to use Stata Batch mode instead.
For Unix operating systems (OS X and Linux) it is only possible to use IPyStata in Stata Batch mode.

Windows setup (Stata Automation)

Register your Stata instance:

  1. Go to your Stata installation directory and either:

    • Shift + Right-Click --> click "Open command window here"
      or
    • Open command window (search for "cmd") and type:
      cd C:\Program Files (x86)\Stata14 (Obviously change it to your Stata directory)
  2. Look up the name of your Stata executable (e.g. StataMP-64.exe) and in your command window type:
    StataMP-64.exe /Register

Troubleshooting:

I get a com error when using IPyStata in Stata Automation mode?

IPyStata cannot communicate with Stata. This error indicates that the registration of Stata was unsuccessful.
A potential solution: try to register again but make sure to run the CMD window as administrator.

Do I have to register Stata everytime I want to use IPyStata?

No, you only have to register your Stata instance once unless you want to change your Stata installation.

For more detailed instructions see this page.

Linux / Mac OS setup (Stata Batch mode)

The Batch mode approach works on Windows, Mac OS, and Linux

Set installation directory for Stata:

The first step is to tell IPyStata where it can find the Stata installation, use the following commands:

In[1]: import ipystata  
In[2]: from ipystata.config import config_stata  
In[3]: config_stata('Path to your Stata executable')  

Note: you need to restart the Jupyter Notebook kernel after setting a new Stata installation!

You can find the Stata executable in the installation directory of Stata, for example:

Windows  --> 'C:\Program Files (x86)\Stata14\StataSE-64.exe'
Mac OS X --> '/Applications/Stata/StataSE.app/Contents/MacOS/stataSE'
Linux    --> '/home/user/stata14/stata-se'

Configure IPyStata to use the Stata Batch Mode on Windows:

It is possible to use config_stata to configure IPyStata to use the Stata Batch Mode on Windows instead of the default Stata Automation mode. See the example below:

In[1]: import ipystata  
In[2]: from ipystata.config import config_stata  
In[3]: config_stata('Path to your Stata executable', force_batch=True) 

Note: This is only advisable if you have a portable Stata installation that you cannot register or if you want to use IPyStata on a Windows server.
The Stata Automation method is in most other cases a better option.

Troubleshooting:

I set the installation directory but IPyStata still does not work?

The new Stata installation is only initialized after a complete kernel restart.
A potential solution: in the Jupyter Notebook click kernel --> restart

Do I have to configure my Stata installation everytime I want to use IPyStata?

No, you only have to configure your Stata executable once unless you want to change your Stata installation.

Using IPyStata

Before you get started:

If you use Stata Automation --> Make sure that you have a registered Stata instance: Windows
If you use Stata Batch Mode --> Make sure that you have configured your Stata installation: Unix (Linux, Mac OS)

Using IPyStata:

You can use IPyStata using the %%stata cell magic.
See the basic instructions below or the example notebook.
Example notebook for Mac OS X and Linux users: batch mode notebook

Note: most intermediate files are stored in the .ipython/stata directory.

If you use Stata Automation: Several options are included to manage your sessions, see the session manager section.

Basic instructions:

IPyStata is imported and loaded using import ipystata. A cell with Stata code is defined by the cell magic %%stata.

For example:

In[1]: import ipystata  
In[2]: %%stata  
       display "Hello, I am printed in Stata."  

Arguments:

Send a Pandas dataframe to be used in the Stata session (Both methods):

-d --data  
In[1]: %%stata -d dataframe  

Define the DTA version internally, by default set to 114 (Both methods):

-vr --version  
In[1]: %%stata -d dataframe -vr 118 

Return the dataset from Stata after code execution and load it into a Pandas dataframe (Both methods):

-o --output  
In[1]: %%stata -o dataframe  

Input Python lists and load them into Stata as macros (Both methods):

-i --input  
In[1]: example_list = ['var_1', 'var_2']  
In[2]: %%stata -i example_list  
       display "`example_list'"

Graph will automatically display and multiple graphs are possible (Only for Stata Automation!):
If you want to show multiple graphs, you have to make sure to use the , name(.., replace)argument in your Stata code.
the order is not guaranteed to be the same as the generation order. Recommended to use the title() argument when showing multiple graphs.

It is possible to prevent graphs from showing using the -nogr or --nograph arguments.

If you want a Stata graph as an output of a IPyStata cell you can use the following argument (Only for Stata Batch Mode!):

-gr --graph  
In[1]: %%stata -gr

Note: Graph export is only partially supported by Stata if the OS has no GUI. To work around this problem the figures are shown in PDF-format if you use Stata Batch Mode. See: Statalist

Prevents any output from being shown below the cell (Both methods):

-np --noprint  
In[1]: %%stata -np  

For inspection purposes it is possible to open the Stata window instead of running it quietly (Both methods):

-os --openstata  
In[1]: %%stata -os  

Note: this only works on Windows and Mac OS X.

Define a session to execute the code with (Only for Stata Automation!):

-s --session
In[1]: %%stata -s session_name

(Note: if no session argument is provided the main session is used.)

Set your Python working directory to the Stata session (Only for Stata Automation!) :

-cwd --changewd
In[1]: %%stata -cwd

Set code in the cell to run in Mata (Only for Stata Automation!) :

-m --mata
In[1]: %%stata -m

Retrieve user-defined macros from Stata into a Python dictionary: macro_dict (Only for Stata Automation!):

-gm --getmacro    
In[1]: %%stata -gm macro_1 -gm macro_2  
       local macro_1 item1 item2
       local macro_2 item3 item4
In[2]: macro_dict['macro_1']
In[3]: macro_dict['macro_2']

Set a working directoy to use while executing this cell (Only for Stata Batch Mode!) :

-cwd --changewd
In[1]: %%stata -cwd '~/folder'

Session manager (Stata Automation users only):

IPyStata 0.2 introduces the possibility to use many different Stata sessions that by default run in the background. In order to avoid using unnecessary system resources several tools and automatic cleanup routines are included.

Tools:

Display all active Stata sessions:

In[1]: %%stata
       sessions

Reveal all Stata sessions:

In[1]: %%stata
       reveal all        

Hide all Stata sessions:

In[1]: %%stata
       hide all      

Close all Stata sessions initiated by IPyStata:

In[1]: %%stata
       close

Close all Stata sessions (Warning! This closes all Stata windows):

In[1]: %%stata
       close all

Automatic clean-up routines:

Changelog

What is new in 0.4:

Minor improvements for line width in the log files and added support for UTF-8 encoding in Stata files (requires Pandas 1.0+ to work!).

What is new in 0.3:

The Stata Automation method introduced in IPyStata 0.2 only works on Windows, this release adds support for the Mac OS X and Linux operating systems using the Stata Batch Mode approach.

The execution methods are determined by IPyStata, non-Windows users will automatically use the Stata Batch Mode technique.
For Windows users the default method is Stata Automation, but it is possible to configure IPyStata to use the Stata Batch Mode instead.

What is new in 0.2:

After a discussion with James Fielder I decided to overhaul my initial code to have it interact with Stata using Automation instead of the batch mode. This approach is inspired by James his Stata-Kernel, check out the awesome early development version here: https://github.com/jrfiedler/stata-kernel.

Pros:

  • Extra functionality:
    • Persistent Stata sessions. (Just as-if you were using Stata directly!)
    • Multiple Stata sessions in one notebook.
    • Allows IPystata to retrieve macros directly from Stata into Python.
  • This approach is more idiomatic as it allows for direct interaction with Stata.
  • Keeps my Stata magic functionality consistent with the Stata kernel by James Fiedler.

Cons:

  • Windows only (Stata Automation is Windows only).
  • Requires the user to register their Stata client.
  • Requires recent Stata version (13 / 14).

Bug fixes and other improvements:

  • Improved the output display functionality:
    • Loops should now be displayed correctly.
    • Fixed inconsistent white spaces at the begin / end of output.
  • Internal file-handling changed to using absolute paths, working directory functionality is now explicitly included in the -cwd argument.
  • Package is compatible for both Python 2.7.x and Python 3.x.
  • Plots are now supported using the -gr or --graph arguments (added in 0.2.1)
  • Both IPython 3 and IPython 4 are now supported (added in 0.2.2)
  • Fixed error when replacing dataset in Stata + single item to macro now possible (added in 0.2.3)

Todo:

  • Add an option for non-Windows users that uses the batch mode functionality.
  • Explore the possibilities of asynchronous Stata code execution using different sessions.
  • Improve Stata syntax highlighting.

Syntax Highlighting

Experimental support for Stata syntax highlighting is included. CodeMirror does not have a Stata mode, hence the R mode is modified to accomodate Stata code. Setup instructions are below:

Find your notebook package installation folder. For example:

If you are using IPython 3 go to the folder IPython, for IPython 4 go to the folder notebook:

C:\Users\*User*\AppData\Local\Enthought\Canopy\User\Lib\site-packages\IPython
C:\Users\*User*\Anaconda\Lib\site-packages\IPython

C:\Users\*User*\AppData\Local\Enthought\Canopy\User\Lib\site-packages\notebook
C:\Users\*User*\Anaconda\Lib\site-packages\notebook

In the IPython folder (IPython 3 users) go to the following directory:

\IPython\html\static\components\codemirror\mode

In the notebook folder (IPython 4 users) go to the following directory:

\notebook\static\components\codemirror\mode\

Create a new folder in the "mode" folder called 'stata'

\IPython\html\static\components\codemirror\mode\stata

or

\notebook\static\components\codemirror\mode\stata

Copy stata.js from the ipystata folder (see Github) into the newly created 'stata' folder.

You can then enable syntax highlighting by running the following code in a Jupyter Notebook cell and restarting the kernel:

import ipystata  
from ipystata.config import config_syntax_higlight    
config_syntax_higlight(True)    

Questions?

If you have questions or experience problems please use the issues tab of this repository.
You can also e-mail me at t.c.j.dekok [at] tilburguniversity.edu .

License

MIT - Ties de Kok - 2017

Special mentions

This project is inspired by and based on the excelent work of:

Contributors:
@Pacbard
@bquistorff

Disclaimer

This project is not affiliated with or endorsed by Statacorp.