Open gdzqzzx opened 3 years ago
Glad you like it @gdzqzzx !
In answer to your questions, attaching the NVivo file to MS SQL Server shouldn't be difficult and you don't need to convert it into another format since .nvp
files are in fact just renamed .mdf
files. Although you could access the file directly through SQL Server, you would have to deal with NVivo's opaque data structures and miss out on all the hard work I did reverse engineering them!
Based on the error message it seems the SQL Server is unable to access the NVivo file. If you look at mssqlAttach.bat
you'll see that it doesn't attach the file directly but makes a temporary copy of it into the directory %TMP%\SQL
and (in theory) enables full access for all users to that temporary copy. Are you able to locate the file there and check the relevant permissions as discussed on this page: https://stackoverflow.com/questions/18286765/sql-server-operating-system-error-5-5access-is-denied?
Let me know how it goes and I'll try to work out a fix to the scripts.
Thanks for your quick response! I will try it and get back to you soon.
After a whole day testing, I finally attached my nvp file to MS SQL Server 2017.
For those who also want to attach by themselves, here are my experience:
Note1: If you still can't attach your nvp, it might be still because of authentication problem, add a new group including all users under current account, and give this group full control permission.
Note2: If you can attach your nvp, but the data is shown not found, try to open this file in your NVivo once. Then you can find that the detected database name changes and data can be found.
Note3: I am not sure why I can't start my SQL Server Service after I changed TCP port to 1433. So I have to keep using the dynamic port 2851, which can be connected successfully to my DBeaver.
Another annoying issue is that I got is grabled columns in dbo.Item and dbo.Project table. It seems that attributes in dbo.Item are all important, it might be something relate to encoding because I am using a Chinese system, and I still have no idea to fix it.
Those issues with file and directory permissions should be handled by the helper scripts in NVivotools. It would be good to work out why they aren't.
As for the mangled Name and Description fields, this is fairly simple. NVivo just offsets each character but some fixed value. I forget what it was but if you look in the NVivotools code you'll find it.
But what you probably want to do now is to use NVivo tools to process your NVivo file using the DenormaliseNVP.py
NormaliseNVP.py
script.
I have tried to use NVivotools directly. But first failed in SQL Server setting and then not sure what parameter I should set. I will try it again and see how it goes.
@gdzqzzx Did you mean to close the issue?
Yes, because the problem has been solved and actually it's not that relevant to NVivo improvement.
I think it might be good to open another issue if I have any other problems when using NVivotools -- of course, feel free to keep it opened if you want.
Did you work out why the helper scripts didn't fix the file access problems?
Yes, but not all of it, to some extent.
What I fixed 2 days ago is attahing nvp file to SQL Server manually without using NVivotools. Before that, I tried mssqlAttach.py as python mssqlAttach.py -P 2851 -i EXPRESS "xxx.nvp" ntool_test
and got following error:
Traceback (most recent call last):
File "mssqlAttach.py", line 54, in <module>
mssqlAttach(None)
File "mssqlAttach.py", line 49, in mssqlAttach
verbosity = args.verbosity)
File "...\mssqlTools.py", line 74, in __init__
raise RuntimeError('No suitable SQL self.server self.instance found')
RuntimeError: No suitable SQL self.server self.instance found
And I just tried running mssqlAttach.bat directly as mssqlAttach.bat "xxx.nvp" ntool_test SQLEXPRESS
. It works fine this time.
But some import error occured when I ran DenormaliseDB.py, I will open a new issue for that.
I have tested mount function in NVivo.py. It seems that the instance variable somehow wasn't passed correctly into mssqlTools initialization. (Of course, the SQL Server version has been fixed as 'MSSQL14')
I got RuntimeError when I used NVivo.mount(), while mssqlAPI.attach() worked fine separately.
@jschultz Please let me know if you want to do more testing on this issue.
(NormaliseDB.py hasn't worked for me yet, but it seems that it's because I haven't set up the output DB properly. I will test it later.)
I see. The way that the code can end up producing that error message is if the break
command two lines earlier does not execute. It looks like instanceversion
and version
both contain the value MSSQL14
but maybe there is some reason why the test for equality if (not version or instanceversion == version):
is failing. Perhaps you could investigate what is happening there?
I also note that your line numbers differ from mine by one, ie the RuntimeError
is on line 74 for me. Is it possible you have inserted a line somewhere that is somehow messing things up?
Regarding NormaliseDB.py
, while I appreciate your testing all this code, it should still be simpler for you to use NormaliseNVP.py
which handles all the database mounting etc. and produces a SQLite output file.
I remember I switched to NormaliseDB.py
because I stuck in NormaliseNVP.py
, but anyway, now I made it run now and got following outpus and error (line 120 in NVivoNVP.py
was edited as version = ('MSSQL12' if args.nvivoversion == '11' else 'MSSQL14')
):
Using MSSQL instance: SQLEXPRESS
Using port: 2851
Found SQL server instance SQLEXPRESS version MSSQL14
Using MSSQL instance: SQLEXPRESS
Using port: SQLEXPRESS
Attached database nvivo18284
Normalising users
Normalising project
Normalising node categories
Normalising nodes
Normalising node attributes
Normalising source categories
Normalising sources
Traceback (most recent call last):
File "...\sqlalchemy\sql\base.py", line 1156, in __getattr__
return self._index[key]
KeyError: 'ThumbnailLocation'
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "NormaliseNVP.py", line 142, in <module>
NormaliseNVP(None)
File "NormaliseNVP.py", line 131, in NormaliseNVP
NVivo.Normalise(args)
File "...\NVivo.py", line 664, in Normalise
nvivoSource.c.ThumbnailLocation if args.nvivoversion == '12' else nvivoSource.c.Thumbnail,
File "...\sqlalchemy\sql\base.py", line 1158, in __getattr__
util.raise_(AttributeError(key), replace_context=err)
File "...\sqlalchemy\util\compat.py", line 211, in raise_
raise exception
AttributeError: ThumbnailLocation
Everything works properly before normalizing Source table. It seems that NVivo12 still uses Tumbnail and Waveform as column names. It may worth double-checking the column names in NVivo12 with this file.
Moreover, it might be good to add NVivo12 into nvivoversion choice in all scripts.
This is the kind of thing that keeps on coming up. At some point I must have found that NVivo version 12 used ThumbnailLocation
and WaveformLocation
, but clearly you are finding otherwise. Those changes were back in 2018 so maybe a more recent update of NVivo changed the column names. Or maybe it is different between Mac and Windows versions. Because NVivo is so opaque you never really know what is going on.
I'd say the best solution would be to test for the presence of the relevant columns rather than use the NVivo version to decide which one to use. If you are able to do this that would be great but otherwise you can just hack the column names to make it work for you.
And yes I agree completely that NVivo12 support should be included throughout. Including denormalisation, ie converting into NVivo format. I just need to find the time, or someone else to do it. :)
I think detecting the relevant columns is a good idea. I will try if I can make it after I make your original script work.
I will keep updating my progress in this issue.
I am happy to announce that I have successfully normalized the nvp file with NormaliseNVP.py.
Besides the conflict on ThumbnailLocation
and WaveformLocation
, I found that there is a bug in NVivo.py. @jschultz
From line 904 to line 910, there should be parentheses for IF ELSE clause as follow:
(
[nvivoAnnotation.c.StartText, nvivoAnnotation.c.LengthText]
if args.mac
else [nvivoAnnotation.c.StartX, nvivoAnnotation.c.LengthX]
)
Otherwise the ELSE condition will ignore the former columns and return StartX
and LengthX
only.
The last thing I want to confirm is whether the condition nvivoNodeReference.c.StartZ.is_(None)
of taggings extraction is still correct. I happened to found that
nvivodb.execute(
select(
[
nvivoNodeReference.c.Id,
nvivoNodeReference.c.Source_Item_Id.label("Source"),
nvivoNodeReference.c.Node_Item_Id.label("Node"),
nvivoNodeReference.c.StartX,
nvivoNodeReference.c.LengthX,
nvivoNodeReference.c.StartY,
nvivoNodeReference.c.LengthY,
nvivoNodeReference.c.StartZ,
nvivoNodeReference.c.CreatedBy,
nvivoNodeReference.c.CreatedDate,
nvivoNodeReference.c.ModifiedBy,
nvivoNodeReference.c.ModifiedDate,
]
+ (
[nvivoNodeReference.c.StartText, nvivoNodeReference.c.LengthText]
if args.mac
else []
)
).where(
and_(
# nvivoNodeReference.c.ReferenceTypeId == literal_column('0'),
nvivoItem.c.Id == nvivoNodeReference.c.Node_Item_Id,
nvivoItem.c.TypeId == literal_column(NVivo.ItemType.Node),
nvivoNodeReference.c.StartZ.is_(None),
)
)
)
returned nothing, but if I remove nvivoNodeReference.c.StartZ.is_(None)
the query have some returns and all of them have StartZ = 8
. Maybe there's no taggings data in my nvp file, but I am still curious what is this NodeReference.StartZ
for?
Fantastic! I'm rather busy at the moment but will endeavour to look more closely at these issues at some point. Are you able to make a pull request for the bug fix? And perhaps check that the same mistake (ie missing parentheses) doesn't appear at other points in the code? As for what StartZ means I really have no idea. Presumably some kind of three dimensional data. Are you able in some way to work out what those NodeReference records with StartZ equal to 8 correspond to in your data? Or if you have a data file that doesn't contain confidential information could you share it?
Yes, I am happy to check the potential bug in other files later. As for the sample data file, it's a pity that I haven't have any desensitization data yet, I will try to figure it out when I have time.
My initial thought is extracting all the codes with corresponding source texts into a plain file like json or csv. For exmple,
{"Source": "text A", "Codes": ["code A", "Code B"]}
I presumed that I need Source
table to implement this idea, but it seems that the number of row in Source
table doesn't have any relationship with the original coding files. Also, I found that the decompressed results of Thumbnail
and Object
columns in Source
table is still not readable.
Does those two columns need further processing for decoding? Or maybe I don't need these columns but need some other tables to implement my data extraction?
The Thumbnail
column is just copied straight from the NVivo column of the same name. I would presume it's some kind of image format (.png
, .jpg
?) but I don't really know much about it.
The Object
column is in a format determined by the column ObjectType
. The possible values are DOC
,ODT
, MP3
, JPEG
and PDF
. But if what you are after is a textual representation of the source content then look in the Content
column.
Producing output as you describe should be pretty simple using Python scripts, or even just SQL.
Yes, that's right. I was confused by the values in Nodes and thought maybe I miss something, but now I find that the norm file should contain all the things I need.
Thanks for your explanation :)
I am happy to announce that I have successfully normalized the nvp file with NormaliseNVP.py.
Besides the conflict on
ThumbnailLocation
andWaveformLocation
, I found that there is a bug in NVivo.py. @jschultzFrom line 904 to line 910, there should be parentheses for IF ELSE clause as follow:
( [nvivoAnnotation.c.StartText, nvivoAnnotation.c.LengthText] if args.mac else [nvivoAnnotation.c.StartX, nvivoAnnotation.c.LengthX] )
Otherwise the ELSE condition will ignore the former columns and return
StartX
andLengthX
only.The last thing I want to confirm is whether the condition
nvivoNodeReference.c.StartZ.is_(None)
of taggings extraction is still correct. I happened to found thatnvivodb.execute( select( [ nvivoNodeReference.c.Id, nvivoNodeReference.c.Source_Item_Id.label("Source"), nvivoNodeReference.c.Node_Item_Id.label("Node"), nvivoNodeReference.c.StartX, nvivoNodeReference.c.LengthX, nvivoNodeReference.c.StartY, nvivoNodeReference.c.LengthY, nvivoNodeReference.c.StartZ, nvivoNodeReference.c.CreatedBy, nvivoNodeReference.c.CreatedDate, nvivoNodeReference.c.ModifiedBy, nvivoNodeReference.c.ModifiedDate, ] + ( [nvivoNodeReference.c.StartText, nvivoNodeReference.c.LengthText] if args.mac else [] ) ).where( and_( # nvivoNodeReference.c.ReferenceTypeId == literal_column('0'), nvivoItem.c.Id == nvivoNodeReference.c.Node_Item_Id, nvivoItem.c.TypeId == literal_column(NVivo.ItemType.Node), nvivoNodeReference.c.StartZ.is_(None), ) ) )
returned nothing, but if I remove
nvivoNodeReference.c.StartZ.is_(None)
the query have some returns and all of them haveStartZ = 8
. Maybe there's no taggings data in my nvp file, but I am still curious what is thisNodeReference.StartZ
for?
Update on NodeReference.StartZ
issue:
In my nvp file, I found that the StartZ
in Annotation
and NodeReference
(Tagging) are all equal to 8. Regarding there is no StartZ
filtering in Annotation
querying, I presume that nvivoNodeReference.c.StartZ.is_(None)
in NodeReference
(Tagging) should be removed, otherwise the most important coding information would be filtered (in my version).
I also found that for Tagging.Fragment
, StartY
is 1-based in NVivo file, while StartX
is 0-based, which means when normalizing my version of nvp file, item['Fragment'] += ',' + str(item['StartY']+1)
should be item['Fragment'] += ',' + str(item['StartY'])
in def build_tagging_or_annotation(item)
(NVivo.py)
I am using NVivo Windows 12.6.0.959.
Thanks @gdzqzzx and sorry I don't have much time right now but I would like to incorporate your work when I get a chance. Is there any chance you could make one or more pull requests reflecting the issues you have found. If you're unsure how to do that there are instructions here. If you don't have the time could you at least commit the changes you have made to your fork of the repo so I can see them all? Thanks!
Yes, I am happy I am able to improve this fabulous tool! I will try to create my pull requests later this week (sorry I can't do it instantly because I don't have enough time currently and it would take me some time to figure out how to do the pulling process).
Hi @jschultz sorry for the delay, I have made the pull request ready for review (hope I did it correctly). Please let me know if there's any other things I can help.
Hi Johnathan @jschultz ,
First of all, thank you so much for developing this fantastic tool to free data from NVivo.
I am currently trying to extract data from NVivo12 (Windows) for my team. Though nvivotools hasn't supported NVivo12 yet, it seems that current mssqlAttach.bat still works in NVivo12 based on #9 . It's not bothering for me to query MS SQL Server directly by python or any other db management tools.
So here is my dummy question: How to attach npv file to SQL Server manually?
My feeble programming skills can only make me identify the key points are in mssqkTools.py and mssqlAttach.bat, but I'm not sure how exactly nvp file is attached to MS SQL Server. I have tried to attach nvp file directly to MS SQL Server, and got following error:
Msg 5120, Level 16, State 101, Server XXXXXXX\SQLEXPRESS, Line 1 Unable to open the physical file "XXXX.nvp". Operating system error 5: "5(Access is denied.)". Msg 1802, Level 16, State 7, Server XXXXXXX\SQLEXPRESS, Line 1 CREATE DATABASE failed. Some file names listed could not be created. Check related errors.
Do I need to convert the nvp file into some other format before attaching it to MS SQL Server?