BarraQDA / nvivotools

A range of tools to help you get more out of NVivo(tm)
GNU General Public License v3.0

NVivotools

A range of tools to help you get more out of NVivo(tm).

Thanks to the wonderful Wooey, some of these tools are now available for use on our server at http://wooey.barraqda.org.

Introduction

The core of NVivotools is its ability to convert qualitative research data (sources, nodes, coding, etc.) into and out of NVivo's proprietary format. Some reasons why you might want to do this include:

  1. Free your work. Make your research data available to whomever you want (including your future self), not only those with their own current NVivo licence.

  2. Choose the tools you want to manipulate your data. NVivo's GUI isn't bad, but sometimes you'd prefer to be able to automate. Use some of the plethora of data management tools or your own coding skills to take charge of your data.

  3. Interface with the rest of your IT world. Make NVivo part of your toolkit, not your whole world.

The core of NVivotools is its ability to make sense of NVivo's proprietary file structures. These files are, in fact, relational databases. The Windows version uses Microsoft SQL Server while the Mac version uses SQL Anywhere. NVivotools is able to minimise the difficulties of working with different database engines by using SQLAlchemy.

Installation

If you want to access NVivo for Mac files (and this may be sufficient for you even if you ultimately want to use NVivo for Windows, since NVivo for Windows is able to import and export NVivo for Mac files), then installation is quite simple and straightforward. If, on the other hand, you wish to access NVivo for Windows files, then you will need an appropriately configured instance of the correct version of Microsoft SQL Server, which in turn requires a Windows machine. You only need to read whichever of the following sections has the title matching your situation:

NVivo for Mac on Mac

If you have NVivo installed, then NVivotools will automatically use the instance of SQL Anywhere that is bundled with it. Otherwise, read the following section for instructions on installing SQL Anywhere.

NVivo for Mac on other operating systems

Since NVivo for Mac (.nvpx) files are actually SQL Anywhere databases, they can be accessed on any computer on which SQL Anywhere can be installed. This includes Linux (for x86, x64 and ARM), Mac and Windows, plus Solaris SPARC and x64, HP-UX Itanium and IBM AIX. The Developer Edition is available free of charge (subject to licence conditions, which it is your responsibility to comply with). Simply download and install it, and you are ready for the next step.

NVivo for Windows on Windows

NVivo files for Windows (extension .nvp) are simply Microsoft SQL Server files. NVivotools works with them just as NVivo does, by attaching them to SQL Server. Unlike NVivo, NVivotools (or more precisely SQLAlchemy) uses something called Tabular Data Stream (TDS) to communicate with SQL Server. This approach has the advantage of abstracting the database access so that NVivotools does not need to know too much about the messy details of SQL Server. It does, however, mean that SQL Server needs to be set up to allow TDS connections.

Start by accepting that SQL Server, like pretty much everything Microsoft creates, is awful. Each version differs from its predecessors, often in subtle and undocumented ways that make it incompatible with them. Installation and configuration often require a GUI, making them impossible (or very difficult) to automate. However, SQL Server does have many enthusiastic users who write prolifically about their experiences. So if you have trouble with any of this, the first place to look for help is the web: search for the text of any error message you need to investigate, or for a concise description of the problem you encounter.

Install Microsoft SQL Server

The good news is that if you have installed NVivo 10 for Windows and only wish to access NVivo 10 for Windows files, then it will already have installed Microsoft SQL Server for you. Remember that your named instance of Microsoft SQL Server is QSRNVIVO10, and proceed straight to the next step.

In every other situation, you will need to install Microsoft SQL Server. To access NVivo 10 files without installing NVivo 10, you need to download and install Microsoft SQL Server 2008 R2 Express. To access NVivo 11 files, whether or not you have installed NVivo, you need Microsoft SQL Server 2014 Express.

As part of the Microsoft SQL Server installation, you will be asked to name the server instance. It matters little what name you give, and the default name (MSSQLSERVER or SQLEXPRESS) should be fine. However, if you are going to use NVivotools for both NVivo 10 and NVivo 11 files, and need both versions of Microsoft SQL Server, it may avoid later confusion if you include the version number (2008R2 or 2014) in the instance name. In any case, remember the name you give each instance as you will need it later.

Set up SQL Server

NVivotools accesses SQL Server using TDS, which operates over the network protocol TCP/IP. This means that you need to configure SQL Server to allow access over TCP/IP. It may be possible to do this using the command line, but I found it simpler to use the SQL Server Configuration Manager, which you'll find from the Start Menu in the folder for the relevant version of SQL Server. When you find it you need to:

1. Enable TCP/IP connections

In the left panel of the SQL Server Configuration Manager, click on Protocols for <Instance name> under SQL Server Network Configuration or SQL Server Network Configuration (32bit) to see a list of protocol names. The one you want is TCP/IP. Right-click on it, then click on Properties. Under the Protocol tab you need to change the value of Enabled to Yes. Then go to the IP Addresses tab and scroll to the bottom of the list of values until you find the heading 'IPAll'. Expand this heading by clicking on it until you see the value TCP Port underneath it. The default port number for TDS is 1433, and it is simplest to use this value. However, if you are going to use NVivotools with more than one Microsoft SQL Server instance, you will need to assign a different port number to each instance. It doesn't matter too much which port number you use, as long as it isn't already in use, so choosing consecutive numbers like 1433 and 1434 might be helpful. In any case, be sure to note and remember the port numbers you select.

Don't close the Configuration Manager just yet, as you'll need to use it to restart the server a few steps further on.

2. Configure SQL Server authentication

Microsoft SQL Server is able to use two different kinds of authentication to control access to its databases. The default setting is to allow only 'Windows authentication'. And you guessed it, we need the other kind, 'SQL Server authentication'. To configure the server to allow both kinds of authentication, you need to make a small change to the Windows registry. There are a variety of ways of doing this; I will describe only the most standard one, which uses the registry editor regedit.

Run regedit.exe from the Start Menu by typing regedit into the Search box. You will need to authorise changes to the system - don't be too alarmed, you are in total control of any changes, so as long as you are careful and/or follow these instructions closely, no harm will result. That said, no guarantees!

Using the left pane in the regedit window, navigate to HKEY_LOCAL_MACHINE -> SOFTWARE -> Microsoft -> Microsoft SQL Server -> MSSQL10_50.QSRNVIVO10 -> MSSQLServer. Once again, if you are using a different version of NVivo or Microsoft SQL Server these names (especially MSSQL10_50.QSRNVIVO10) may vary. When you get there, you will see a list of values in the right pane. Look for LoginMode; right-click on it, select Modify and change the value to 2.
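If you would rather confirm the registry change from a script than by eye, the following is a minimal sketch in Python. It only reads the value; it does not change anything. The function names are hypothetical (they are not part of NVivotools), and the key layout is simply the path quoted above, so the instance ID you pass in must match your own installation. On 64-bit Windows with a 32-bit SQL Server instance the value may live under a different registry view, which this sketch does not handle.

```python
import sys

# Assumption: same key layout as the regedit path described in the text.
# The instance ID (e.g. "MSSQL10_50.QSRNVIVO10") varies between installations.
KEY_TEMPLATE = r"SOFTWARE\Microsoft\Microsoft SQL Server\{instance_id}\MSSQLServer"


def login_mode_key(instance_id):
    """Build the registry sub-key (below HKEY_LOCAL_MACHINE) for an instance."""
    return KEY_TEMPLATE.format(instance_id=instance_id)


def read_login_mode(instance_id):
    """Return LoginMode (1 = Windows authentication only, 2 = both kinds).

    Returns None when not running on Windows or when the key cannot be read.
    """
    if sys.platform != "win32":
        return None
    import winreg  # standard library, Windows only

    try:
        with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE,
                            login_mode_key(instance_id)) as key:
            value, _value_type = winreg.QueryValueEx(key, "LoginMode")
            return value
    except OSError:
        return None
```

Calling read_login_mode("MSSQL10_50.QSRNVIVO10") on the Windows machine should return 2 once the change described above has been made.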

3. Create an account ('login' in MSSQL parlance)

Start the SQLCMD program from the command line as follows:

sqlcmd -S LOCALHOST\QSRNVIVO10

Then enter the following commands:

create login nvivotools with password='nvivotools'
go
sp_addsrvrolemember nvivotools,sysadmin
go

Some sources suggest that you may need to restart the server with ;-T7806 appended to the command line. (And some people still take Microsoft seriously?) I haven't always found this necessary but if you have trouble then it may be worth trying.
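If you prefer a login name or password other than the ones shown above, a small script can generate the commands for you. This is only a sketch: the helper names are hypothetical, and the quoting rules it applies (doubling ] inside bracketed names and doubling ' inside string literals) are the usual T-SQL conventions rather than anything specific to NVivotools.

```python
def quote_ident(name):
    """Quote a T-SQL identifier with brackets, doubling any closing bracket."""
    return "[" + name.replace("]", "]]") + "]"


def quote_literal(text):
    """Quote a T-SQL string literal, doubling any single quote."""
    return "'" + text.replace("'", "''") + "'"


def login_script(login, password):
    """Return a sqlcmd batch that creates a login and makes it a sysadmin."""
    lines = [
        "create login {} with password={}".format(
            quote_ident(login), quote_literal(password)),
        "go",
        "exec sp_addsrvrolemember {}, 'sysadmin'".format(quote_literal(login)),
        "go",
        "",
    ]
    return "\n".join(lines)
```

The resulting text can be pasted into an interactive sqlcmd session, or saved to a file and passed to sqlcmd with its -i option.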

4. Restart server

Another piece of Microsoft brilliance - you can't request that the server simply read a new network configuration - you have to restart the whole thing. Back at the SQL Server Configuration Manager window, click on 'SQL Server Services' in the left frame, then right-click on the relevant server instance name and select Restart.
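Once the server has restarted, a quick way to see whether it is really listening on the TCP port you chose is to try opening a connection to it. The sketch below uses only Python's standard library; the function name is hypothetical, and a successful connection shows only that something is listening on that port, not that the login works.

```python
import socket


def tcp_port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unresolvable host names.
        return False
```

For example, tcp_port_open("localhost", 1433) run on the Windows machine checks the default port; from another computer, replace localhost with the name or address of the machine running SQL Server.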

More Information

Here are a few links that describe other ways of configuring the SQL Server authentication.

NVivo for Windows on other operating systems

Since the only part of accessing NVivo for Windows project files that requires Windows is Microsoft SQL Server, and the server is in any case accessed using the TCP/IP network protocol, it is relatively (everything is relative) simple to run the rest of NVivotools on a different computer running a different (read: better) operating system like Linux or MacOS. To do this, you need to install and configure Microsoft SQL Server as described above, then follow a few more steps to allow remote access:

1. Punch a hole in the Windows firewall

If you want to use NVivotools from a different computer than the one running SQL Server (I do this so that I can keep as far away from Windows as possible, but you may find other reasons to do so) you'll need to tell the firewall to allow incoming network connections to SQL Server. You'll need to find the SQL Server executable (something like C:\Program Files\Microsoft SQL Server\MSSQL10_50.QSRNVIVO10\MSSQL\Binn\sqlservr.exe) and configure the Microsoft Firewall to allow connections to that program. Alternatively, you can simply open the relevant port number(s) (from the Microsoft SQL Server TCP/IP configuration above) and allow connections on those ports.

2. Allow SSH access to the host computer

Since NVivotools also needs to copy files and run certain commands on the host computer (the one running Microsoft SQL Server), it needs to be able to gain access using the Secure Shell (SSH) protocol. Moreover, NVivotools needs to be able to connect to the host without prompting for a password. This requires that the server be set up to allow key-based authentication by creating an authorized_keys file. Once again Microsoft makes this process mind-bogglingly fragile but I have found the instructions here to be useful.
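To check that key-based login is working before running NVivotools, you can ask ssh to connect without ever prompting. The following sketch assumes the standard OpenSSH client is on your path; BatchMode=yes and ConnectTimeout are regular OpenSSH options, while the function names and the example host name are hypothetical.

```python
import subprocess


def ssh_check_command(host, user=None):
    """Build an ssh command that fails instead of prompting for a password."""
    target = "{}@{}".format(user, host) if user else host
    return ["ssh", "-o", "BatchMode=yes", "-o", "ConnectTimeout=5",
            target, "exit"]


def passwordless_ssh_works(host, user=None):
    """Return True if ssh can log in to the host without any prompt."""
    try:
        result = subprocess.run(ssh_check_command(host, user),
                                capture_output=True, timeout=15)
    except (OSError, subprocess.TimeoutExpired):
        # ssh is not installed, or the connection hung.
        return False
    return result.returncode == 0
```

If passwordless_ssh_works("winhost", "me") returns False, running the same ssh command by hand with -v added usually shows which step of the key exchange failed.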

Install Python

With the deprecation of Python 2 and the migration of all the required libraries to Python 3, NVivotools now requires Python 3.

Windows

Install a recent version of Python 3 from Python Releases for Windows. During the installation process you will be asked whether to add Python to the path - say 'Yes' to keep things simple.

Linux

Use your usual package manager to install Python 3.

Mac

Install a recent version of Python 3 from Python Releases for Mac OS X.

Sidenote: Although OSX ships with a version of Python, this version seems to be unable to work correctly with the SQLAlchemy package on which NVivotools depends (more precisely - if anyone knows enough about this stuff to figure it out - it fails to load the dbcapi). There is some suspicion that this problem may be related to OSX's System Integrity Protection (SIP), which was only introduced with El Capitan. It is therefore possible that the following section may not be required on earlier versions of OSX (or if you disable SIP, which we do not recommend).

Update: Since High Sierra, SIP makes life even more complicated by dumping changes to DYLD_LIBRARY_PATH required to find SQLAlchemy's libraries. It seems that this problem can be avoided/deferred by (temporarily) shifting the working directory to the one containing the libraries. I've introduced code to do this and as of May 2019 it was still working.
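For the curious, the general idea of temporarily shifting the working directory can be sketched as a small context manager. This is an illustration of the technique only, not the code actually used in NVivotools, and the directory you would pass in (wherever the libraries live on your machine) is left for you to supply.

```python
import os
from contextlib import contextmanager


@contextmanager
def working_directory(path):
    """Change to the given directory, then always change back afterwards."""
    previous = os.getcwd()
    os.chdir(path)
    try:
        yield
    finally:
        # Restore the original directory even if the body raised an error.
        os.chdir(previous)
```

Anything that has to load the libraries would then go inside a with working_directory(...) block, and the rest of the script carries on in the original directory.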

Install Python libraries

If you don't already have pip, you'll need to download and run it from here.

You may also need to upgrade your pip:

pip install --user --upgrade pip

The easy way

The quickest, and sometimes the easiest, way to install the required Python libraries is to use the requirements.txt file.

pip install --user -r requirements.txt

or, if you want to install the libraries system-wide and have the required user privileges:

pip install -r requirements.txt

If these fail (a strong possibility) you will likely have to proceed to the following section, which describes how to install only those libraries that your particular installation will need.

The complicated way

Install the required modules:

pip install --user dateutils future pdfminer Pillow sqlalchemy python-dateutil

If you plan to access NVivo for Windows files (whether on a Windows or other machine), you will also need

pip install --user pymssql

while if you are going to access NVivo for Mac files, you will need

pip install --user git+https://github.com/BarraQDA/sqlalchemy-sqlany

Note that the above command installs BarraQDA's own fork of sqlalchemy-sqlany, pending the merging of this pull request.

User abers found a problem on Raspberry Pi (possibly other ARM systems) where the pymssql library requires other packages (freetds-common, libsybdb5) to be installed. This problem was resolved by installing those packages using the package manager (eg apt-get for Debian-based systems) before using pip to install pymssql.
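After installing, you can check which libraries Python can actually find. The sketch below uses only the standard library. Note that the import names are assumptions: a package's pip name and its import name often differ (for example Pillow is imported as PIL and python-dateutil as dateutil), and sqlalchemy_sqlany is a guess at the module name provided by sqlalchemy-sqlany.

```python
import importlib.util

# Assumed import names for the pip packages listed above.
COMMON = ["sqlalchemy", "dateutil", "PIL", "pdfminer", "future"]
WINDOWS_FILES = ["pymssql"]        # needed for NVivo for Windows (.nvp) files
MAC_FILES = ["sqlalchemy_sqlany"]  # needed for NVivo for Mac (.nvpx) files


def missing_modules(names):
    """Return the subset of top-level module names that cannot be found."""
    return [name for name in names if importlib.util.find_spec(name) is None]
```

For example, missing_modules(COMMON + WINDOWS_FILES) returns an empty list when everything needed for .nvp files is in place, and otherwise lists what still needs installing.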

Optional extras

In order to convert data among various formats, NVivotools uses a number of helper applications. If you plan to use NVivotools to load sources into your NVivo projects, you will need the following:

LibreOffice/OpenOffice and unoconv

If you are going to import textual data into NVivo, you will need either LibreOffice or OpenOffice. Although NVivotools already includes unoconv, there is some incompatibility between it and certain versions of Python (or more precisely of the pyuno library shipped with LibreOffice/OpenOffice), which may force you to install either an older version of LibreOffice (this appears to be the case on Mac) or a different version of unoconv (this seems to happen under Linux).

Linux

Use your usual package manager to install one of LibreOffice or OpenOffice, plus unoconv. The distribution should take care of Python compatibility issues by installing Python 3 if required.

Windows

Install one of LibreOffice or OpenOffice by following the links from the website. The version of unoconv shipped with NVivotools seems to work fine under Windows.

Mac

According to this report you need to install an older version of LibreOffice in order for unoconv to work.

Use

The core of NVivotools is its ability to transform data into and out of NVivo's proprietary formats. A number of scripts are provided for use in different operating system environments, offering different degrees of control over the process. All of the scripts have a basic usage guide which can be seen by calling them with no arguments.

The most generic scripts are NormaliseDB.py and DenormaliseDB.py, which take an input and output descriptor (in sqlalchemy format, eg sqlite:///filename.db or mssql+pymssql://user:password@sqlservername/database), and convert the former to the latter. However most users are likely to prefer to use dedicated scripts for converting NVivo for Windows (.nvp) or for Mac (.nvpx) files.
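Descriptors of this kind are easy to get wrong when the password contains characters such as @ or /, which need to be percent-encoded. The following sketch builds the two descriptor forms mentioned above using only the standard library; the function names and the example values are hypothetical.

```python
from urllib.parse import quote_plus


def mssql_url(user, password, host, database, port=None):
    """Build an mssql+pymssql descriptor, escaping the user and password."""
    netloc = "{}:{}@{}".format(quote_plus(user), quote_plus(password), host)
    if port is not None:
        netloc += ":{}".format(port)
    return "mssql+pymssql://{}/{}".format(netloc, database)


def sqlite_url(filename):
    """Build an sqlite descriptor for a file path relative to the current directory."""
    return "sqlite:///{}".format(filename)
```

For instance, mssql_url("nvivotools", "p@ss/word", "winhost", "mydb", 1433) encodes the password as p%40ss%2Fword, so the @ that separates the password from the host name stays unambiguous.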

NEW Convert between NVivo and RQDA

The following scripts are available for converting between NVivo and RQDA formats. As usual, calling a script with no arguments prints its usage, and scripts whose names contain NVP refer to NVivo for Windows and need to be run under Windows. The other scripts can be run on Mac, Linux or any similar machine with SQL Anywhere installed.

Windows

Two scripts are provided specifically for Windows: NormaliseNVP.py and DenormaliseNVP.py. These two scripts transform an NVivo project from an .nvp file into, and out of, a normalised project (.norm) file respectively.

Mac and Linux

The scripts NormaliseNVPX.sh and DenormaliseNVPX.sh transform an NVivo project from an .nvpx file into, and out of, a normalised project (.norm) file respectively. Note that you need to call these sh scripts rather than the equivalently named Python scripts - this is because SQL Anywhere requires certain environment variables to be set before any database work can be done.
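If you have a mixture of .nvp and .nvpx projects, the choice of script follows directly from the file extension. The sketch below only works out which normalising script applies and what the matching .norm file name would be; it does not run anything. The function name is hypothetical, and the arguments each script expects are deliberately left out: call the script with no arguments to see its usage.

```python
from pathlib import Path

# Script names as given in the text above, keyed by project file extension.
NORMALISE_SCRIPTS = {
    ".nvp": "NormaliseNVP.py",    # NVivo for Windows, run under Windows
    ".nvpx": "NormaliseNVPX.sh",  # NVivo for Mac, needs SQL Anywhere
}


def plan_normalise(project):
    """Return (script name, .norm output path) for an NVivo project file."""
    path = Path(project)
    try:
        script = NORMALISE_SCRIPTS[path.suffix.lower()]
    except KeyError:
        raise ValueError("not an NVivo project file: {}".format(project))
    return script, path.with_suffix(".norm")
```

For example, plan_normalise("study/Interviews.nvpx") pairs NormaliseNVPX.sh with the output path study/Interviews.norm.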

What can you do now?

Once your research data is freed from the clutches of NVivo, you are limited only by your imagination! Here are some ideas that come to mind:

  1. Load data into your project. Use scripts including editProject.py, editNode.py and so forth to build your project. It's much less tiresome and error-prone than NVivo's GUI, and you can repeat the process as many times as you need to get it right.

  2. Extract data from your project. Coming soon: scripts to do this for you.

  3. Automate your coding. See the script textblobExampleCode.py for a simple example of the use of Python's Natural Language Toolkit to produce nodes and code sources at those nodes. This blog entry shows the result of applying the script to NVivo's sample project.