Closed rorydavidson closed 6 years ago
Hi Marcelo,
The Neo4J Snomed database loader scripts were contributed by a third party, see attribution here - https://github.com/rorydavidson/SNOMED-CT-Database/tree/master/NEO4J#for-attribution
I am not familiar with this load script or the Python language, but it appears from your stack trace that the script failed on line 198 of snomed_g_graphdb_build_tools.py, which is attempting to decode the --neopw64 parameter.
I can see that the examples in the README do not show how to quote the Neo4J password. As a workaround you could edit your copy of the script and insert your password directly into the script.
Replace line:
if opts.mode=='build': neo4j = snomed_g_lib_neo4j.Neo4j_Access(base64.decodestring(opts.neopw64))
with:
if opts.mode=='build': neo4j = snomed_g_lib_neo4j.Neo4j_Access("YOUR_PASSWORD")
If you have to make this sort of workaround it's best practice to not commit your password into any public git repository.
I will leave this issue open in the hope that someone else can provide a more elegant solution. Best of luck getting the script working.
Kind regards, Kai
The specific error that the command is reporting is a failure to decode the base64 password string specified in the command.
base64.decodestring(opts.neopw64) <== generating exception for this user
It is odd in that I looked at the base64 string that was specified and it looked valid to me. Specifically:
--neopw64 c21xcw== <== looks okay at first blush, yet apparently causing problem for the user
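One way to sanity-check the value is to decode it by hand. This is only a minimal sketch, not the loader's own code: the helper name decode_neopw64 is hypothetical, and it uses base64.b64decode rather than the decodestring call in the script because b64decode accepts plain text on both Python 2.7 and Python 3.

```python
import base64


def decode_neopw64(encoded):
    """Decode a base64-encoded password to text (hypothetical helper)."""
    # b64decode accepts ASCII text or bytes and returns the raw bytes.
    raw = base64.b64decode(encoded)
    # Convert the raw bytes to text; assumes the password is UTF-8.
    return raw.decode("utf-8")


if __name__ == "__main__":
    # The value from the command line in this thread.
    print(decode_neopw64("c21xcw=="))
```

If the string decodes cleanly here, the base64 value itself is fine and the failure lies in how the script calls the decoder.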
Can we obtain information about the environment that the user is trying to use?
The windows environment I used in the past to build NEO4J graphs with SNOMED CT:
Windows 10
Anaconda Python, python version 2.7.13
py2neo library, version 3.1.1
DOS command line
NEO4J 3.2.2 running on port 7474, an empty database prior to executing the python command
How to tell the version of py2neo?
From a python interpreter, execute the following python statements:
import py2neo
print(py2neo.__version__)
It may or may not be possible for the user to try this from a Linux machine, but I have had far fewer problems building the graph on Linux. Just fyi.
I will help get this to work on Windows for this user, but it tends to be generally less painful on Linux.
I was able to build a NEO4J graph on Windows, using the configuration specified in the previous email. I include the command I used and the output it generates.
The command varies slightly from what the user had used: it specifies the /Full/ subfolder in the --rf2 parameter and uses c:\temp\Users\Marcelo instead of c:\Users\Marcelo, as I don't have a Marcelo user on my machine.
COMMAND:
python snomed_g_graphdb_build_tools.py db_build --action create --rf2 C:/temp/Users/Marcelo/ReleaseSnomed/SnomedCT_RF2Release_INT_20150731/Full/ --release_type full --neopw64 c21xcw== --output_dir C:/temp/Users/marcelo/Documents/smqs
OUTPUT:
SNOMED_G bin directory [C:/temp/Users/Marcelo/github--rorydavidson/SNOMED-CT-Database-master/NEO4J/]
sequence did not exist, primed
JOB_START
FIND_ROLENAMES
FIND_ROLEGROUPS
MAKE_CONCEPT_CSVS
MAKE_DESCRIPTION_CSVS
MAKE_ISA_REL_CSVS
MAKE_DEFINING_REL_CSVS
TEMPLATE_PROCESSING
CYPHER_EXECUTION
CHECK_RESULT
JOB_END
RESULT: SUCCESS
CHECKING THE RESULT A BIT:
Then investigating the graph:
match (a:ObjectConcept) return count(a);
==> 421,657
So, that is the total number of SNOMED CT concepts in the graph, not all of which are active.
To look for active concepts (from 2015-07-31)
match (a:ObjectConcept) where a.active='1' return count(a)
==> 317,057
So, apparently finding 317,057 active SNOMED CT concepts in the international 2015-07-31 release.
You can find the same information this way:
match (a:ObjectConcept {active:'1'}) return count(a)
I wonder if the issue is the version of python that is being used.
This software has been created and tested with python 2.7. It has not yet been upgraded to support python 3.x. It appears that Anaconda python was used, which allows for the installation of multiple versions of python. I suggest retrying with python 2.7 if possible.
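A quick way to confirm which interpreter is actually running the script is to check sys.version_info. The following is a minimal sketch, assuming a guard like this were run before the loader; the function name is_supported_python is hypothetical and not part of the loader.

```python
import sys


def is_supported_python(version_info=sys.version_info):
    """Return True only for the Python 2.7 series (hypothetical helper)."""
    # Compare only the major and minor version numbers.
    return tuple(version_info[:2]) == (2, 7)


if __name__ == "__main__":
    if is_supported_python():
        print("Running Python 2.7")
    else:
        # Report the interpreter that was found instead.
        print("Warning: running Python %d.%d, but the loader was written for 2.7"
              % tuple(sys.version_info[:2]))
```

Running the same check from the command prompt used for the build shows which of the installed Anaconda environments is active.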
Hello.
This worked. I replaced:
if opts.mode=='build': neo4j = snomed_g_lib_neo4j.Neo4j_Access(base64.decodestring(opts.neopw64))
with:
if opts.mode=='build': neo4j = snomed_g_lib_neo4j.Neo4j_Access("YOUR_PASSWORD")
But I have another problem.
C:\Users\Marcelo\Downloads\SNOMED-CT-Database-master\SNOMED-CT-Database-master\NEO4J>python snomed_g_graphdb_build_tools.py db_build --action create --rf2 C:/nuevo/ReleaseSnomed/ --release_type full --neopw64 c21xcw== --output_dir C:/Users/marcelo/Documents/smqs/
SNOMED_G bin directory [C:/Users/Marcelo/Downloads/SNOMED-CT-Database-master/SNOMED-CT-Database-master/NEO4J/]
sequence did not exist, primed
JOB_START
FIND_ROLENAMES
Traceback (most recent call last):
File "snomed_g_graphdb_build_tools.py", line 330, in <module>
It appears to me that you are using python 3.x to execute this code, but this python code is written for python 2.7. It will be updated to work with python 3.x, but that has not yet happened.
In python 3.x, strings do not have a decode method, but they do in python 2.7.
Is it possible for you to try python 2.7?
If you are using Anaconda python, I believe you could do the following.
conda create -n py27 python=2.7
activate py27
you can switch back to your normal python version by
activate root
It would require installing the necessary libraries like py2neo and sqlitedict into your python 2.7.
That was exactly the problem, thanks @jayped007. But now, Houston, I have a new problem:
SNOMED_G bin directory [C:/Users/Marcelo/Downloads/SNOMED-CT-Database-master/SNOMED-CT-Database-master/NEO4J/]
sequence did not exist, primed
JOB_START
FIND_ROLENAMES
FIND_ROLEGROUPS
MAKE_CONCEPT_CSVS
MAKE_DESCRIPTION_CSVS
MAKE_ISA_REL_CSVS
MAKE_DEFINING_REL_CSVS
TEMPLATE_PROCESSING
CYPHER_EXECUTION
FAILED (steps: ['CYPHER_EXECUTION'])
Build.log
step:[CYPHER_EXECUTION],result:[FAILED (STATUS 83)],command:[python C:/Users/Marcelo/Downloads/SNOMED-CT-Database-master/SNOMED-CT-Database-master/NEO4J//snomed_g_neo4j_tools.py run_cypher build.cypher --verbose --neopw64 c21xcw==],status/expected:83/0,duration:0:00:01.665000,output:[],error:[],cmd_start:[2017-07-28 09:16:47.357000],cmd_end:[2017-07-28 09:16:49.022000]
What this means is that the procedure has processed the SNOMED CT RF2 files and created a CYPHER script (called build.cypher) to load nodes and edges that represent the SNOMED CT information in NEO4J. It has also built a significant number of CSV files that the build.cypher script depends on (in the directory specified by --output_dir).
The processing of the CYPHER code is what is apparently failing at the moment.
The python code assumes that NEO4J is running on the same machine, on port 7474. It uses the py2neo library to communicate with the NEO4J REST API (at URL localhost:7474). It also assumes that the NEO4J database is basically empty, or at least does not contain any of the types of nodes and edges that will be created by it (ObjectConcept nodes, Description nodes, RoleGroup nodes, ISA relationships, etc).
Is this the case? Do you have NEO4J running on that machine on port 7474? Can you tell me the NEO4J version?
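A quick way to answer the first question is to test whether anything is listening on that port. This is a minimal sketch using only the standard library; the helper name is_port_open is hypothetical, and a successful connection only shows that some service is listening on the port, not that it is NEO4J.

```python
import socket


def is_port_open(host="localhost", port=7474, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds (hypothetical helper)."""
    try:
        conn = socket.create_connection((host, port), timeout=timeout)
    except (socket.error, socket.timeout):
        # Connection refused, host unreachable, or timed out.
        return False
    conn.close()
    return True


if __name__ == "__main__":
    if is_port_open():
        print("Something is listening on localhost:7474")
    else:
        print("Nothing is listening on localhost:7474")
```

If this reports that nothing is listening, the CYPHER_EXECUTION step cannot succeed until the server is started.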
The build.log is meant to have error information, when errors occur. Could you possibly post the whole build.log or examine it for further error information?
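When examining build.log, the bracketed fields of each entry can be pulled apart with a regular expression. This is a rough sketch based only on the single log line posted in this thread: the function name parse_build_log_line is hypothetical, the sample below is an abbreviated copy of that line, and a value that itself contains a closing bracket would be cut short.

```python
import re

# Matches fields written as name:[value]; fields without brackets are ignored.
FIELD_PATTERN = re.compile(r"(\w+):\[(.*?)\]")


def parse_build_log_line(line):
    """Return a dict of the bracketed fields in one build.log entry."""
    return dict(FIELD_PATTERN.findall(line))


if __name__ == "__main__":
    # Abbreviated copy of the entry posted above.
    sample = ("step:[CYPHER_EXECUTION],result:[FAILED (STATUS 83)],"
              "output:[],error:[],cmd_start:[2017-07-28 09:16:47.357000]")
    fields = parse_build_log_line(sample)
    print("%s -> %s" % (fields["step"], fields["result"]))
```

In the entry posted above, both the output and error fields come back empty, which is why the log alone does not explain the failure.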
I don't know if this is the case for you or not, but I have had issues when any of the directories involved contain spaces in their names, like 'Program Files'. For example, trying to use C:\My Files would probably cause issues (versus C:\MyFiles). I would suggest using directories that don't contain spaces when trying to use this software.
Hello @jayped007. I don't have spaces in the names: C:/Users/Marcelo/Downloads/SNOMED-CT-Database-master/SNOMED-CT-Database-master/NEO4J/ CSV: C:\Users\Marcelo\Documents\smqs
Something to consider:
NEO4J, I believe starting in version 3, has a configuration option in neo4j.conf that controls where the LOAD CSV command can load files from.
This is what you see by default in the configuration
dbms.directories.import=import
This disallows things like importing CSV files from C:/Users/Marcelo
I am guessing that perhaps this is the issue you are running into. And at least temporarily, I would comment out that configuration line ... and retry executing the procedure (which is trying to use LOAD CSV commands to load the CSV files it created).
To comment out the configuration item, put a # at the start of the line:
#dbms.directories.import=import
Then restart NEO4J and retry.
So, if the issue is that the LOAD CSV statement is failing because it cannot find the CSV files, then this is very likely the reason.
Here is how you get to the configuration files on Windows:
Select the "Options" button at the bottom of the form that is displayed when you click on NEO4J (the one that allows you to select a NEO4J database).
There is a set of 3 files, in NEO4J 3, that you can configure. The first one is "neo4j.conf"; click on the "Edit" button to bring it up in a text editor. That will allow you to modify the
dbms.directories.import=import
item, changing it to be commented out.
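For anyone who prefers to script the change rather than use the editor, the edit amounts to prefixing the active line with #. This is a minimal sketch that works on the file's text only; the function name comment_out_setting is hypothetical, and reading and writing the actual neo4j.conf file is left out because its location depends on the installation.

```python
def comment_out_setting(conf_text, key):
    """Return conf_text with every active 'key=...' line prefixed by '#'."""
    result = []
    for line in conf_text.splitlines():
        # Lines that are already commented start with '#' and are left alone.
        if line.strip().startswith(key + "="):
            line = "#" + line
        result.append(line)
    return "\n".join(result)


if __name__ == "__main__":
    original = "dbms.directories.import=import"
    print(comment_out_setting(original, "dbms.directories.import"))
```

NEO4J still needs to be restarted afterwards for the change to take effect.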
I commented this out but the problem continues. Maybe this is the problem?
n4jpw = base64.decodestring(opts.neopw64)
graph_db = py2neo.Graph(password=n4jpw)
Hi Marcelo,
I believe I can help get you across the finish line on this.
Take a look at the "build.cypher" file in a text editor; that file was created by the software you have already run (along with many CSV files).
What you will find in there is a significant number of NEO4J CYPHER statements.
The job of these statements is to create indexes and constraints and then load the SNOMED CT content, which has been placed in CSV files, into nodes and edges in NEO4J.
You could run these, one by one, in your NEO4J browser.
The initial ones which create indexes and constraints should execute with no problem at all.
For example:
CREATE CONSTRAINT ON (c:ObjectConcept) ASSERT c.id IS UNIQUE;
CREATE CONSTRAINT ON (c:ObjectConcept) ASSERT c.sctid IS UNIQUE;
I suspect strongly that the LOAD CSV CYPHER commands are the ones that are failing.
For example, when I tried to recreate your situation, I had the following LOAD CSV command as the first one in my build.cypher:
USING PERIODIC COMMIT 200 LOAD csv with headers from "file:///C:/temp/Users/marcelo/Documents/smqs/concept_new.csv" as line with line CREATE (:ObjectConcept { nodetype: 'concept', id: line.id, sctid: line.id, active: line.active, effectiveTime: line.effectiveTime, moduleId: line.moduleId, definitionStatusId: line.definitionStatusId, FSN: line.FSN, history: line.history} );
This command worked for me.
If you find the corresponding first LOAD CSV command in your build.cypher, and try to execute it in your NEO4J browser, presumably at:
localhost:7474
It should presumably fail, and the error that it generates will be the same error that is occurring when the software tries to perform this same operation now.
If you could let me know what error you are seeing, then I think I can help you move forward.
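To make it easier to paste the statements into the browser one at a time, the script can be split on its terminating semicolons. This is a naive sketch, assuming no statement contains a semicolon inside a quoted string; the function name split_cypher_statements is hypothetical and not part of the loader.

```python
def split_cypher_statements(script_text):
    """Naively split a Cypher script on ';' and drop empty pieces."""
    pieces = (piece.strip() for piece in script_text.split(";"))
    # Put the terminating semicolon back on each non-empty statement.
    return [piece + ";" for piece in pieces if piece]


if __name__ == "__main__":
    script = (
        "CREATE CONSTRAINT ON (c:ObjectConcept) ASSERT c.id IS UNIQUE;\n"
        "CREATE CONSTRAINT ON (c:ObjectConcept) ASSERT c.sctid IS UNIQUE;\n"
    )
    for number, statement in enumerate(split_cypher_statements(script), 1):
        print("%d: %s" % (number, statement))
```

Running the statements in order and stopping at the first one that fails identifies the statement to report.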
Thanks!
Jay Pedersen
Hello @jayped007, I ran the scripts one by one in Neo4j and I found this problem:
There is not enough memory to perform the current task. Please try increasing 'dbms.memory.heap.max_size' in the neo4j configuration (normally in 'conf/neo4j.conf' or, if you you are using Neo4j Desktop, found through the user interface) or if you are running an embedded installation increase the heap by using '-Xmx' command line flag, and then restart the database.
What this tells me is that your NEO4J server does not have enough configured memory to perform this operation. The LOAD CSV commands from build.cypher are failing because the NEO4J server itself is failing when trying to execute them due to memory issues. So, now we are moving into the arena of Java issues. The following notes give some direction on trying to fix the Java memory issues.
NOTES
There is a NEO4J 2.x/3.x configuration file on Windows, known as neo4j-community.vmoptions, which modifies the Java virtual machine settings. Prominent among those settings is the -Xmx<size> setting, which controls the maximum heap size for Java. The same dialog box for NEO4J on Windows that allowed modifying neo4j.conf has a button for modifying the vmoptions configuration, which I believe is labeled "Java VM Tuning".
My neo4j-community.vmoptions file currently only has comments:
# Enter one VM parameter per line, note that some parameters can only be set once.
# For example, to adjust the maximum memory usage to 512 MB, uncomment the following line
# -Xmx512m
In your case, you may want to try changing the line from
# -Xmx512m
To
-Xmx2G
That is, if you have 2 GB of memory available for NEO4J.
Make a similar change, restart NEO4J, and retry -- see if that allows you to move forward.
In the past, on a machine with 16GB of memory, I used the following configuration:
-Xmx5G
-Xms3G
-Xss2G
Articles that might be of use:
https://stackoverflow.com/questions/43078285/error-importing-my-csv-data-to-neo4j-java-heap-space
http://neo4j.com/docs/operations-manual/current/performance/#heap-sizing
Finally, I tried with the server version and everything worked fine. I'll try the Spanish version.
Thanks @kaicode, @rorydavidson, @wcampbel, @jayped007, @aflinton.
I just love a happy ending. 👍 Let us know how the Spanish version loading works.
I'm having problems with the Spanish version. @mbonda, did you get it to work?
Hi @adrianalonsoba, no, it did not work for me. I will shortly put one together based on the Spanish version. Where are you from, Adrian?
I'm Spanish, and you? I have no problem speaking English if you prefer... It seems there is some problem with parsing the Spanish files... So you did not manage to get it working? I am particularly interested in doing this in neo4j.
This is my email: mbondarenco@gmail.com. I am starting a project with SNOMED and neo4j, mainly as a visualization tool. I will keep you posted on my progress. I am from Uruguay.
Mine is adrian.alonsoba@gmail.com. Thank you; if I make progress I will share it with you too.
If you are having problems with Neo4j and importing SNOMED CT, we can help. You need the correct versions of python and a few other programs to use our algorithm.
W. Scott Campbell, PhD, MBA Assistant Professor Director of Public Health Laboratory Informatics and Pathology Laboratory Informatics Department of Pathology/Microbiology University of Nebraska Medical Center 985900 Nebraska Medical Center Omaha NE 68198-5900 402-559-9593 (office) 402-350-7851 (mobile)
Thanks for your answer @wcampbel. I have loaded the international version with no errors, but when I try to add the Spanish extension it raises several parsing errors... maybe due to Spanish character conflicts.
Solved, simply by changing the file names to match the international files. It seems that the Spanish extension now loads successfully.
Fantastic! Let me know if you have any questions or comments.
I am currently exploring the relations model, @wcampbel thank you so much for sharing your fantastic work.
(Provided by another user and copied here.) I am working with SNOMED CT. I have seen your code and tried to load data into neo4j, but I ran into problems that maybe you can help me with.
This is the problem.
I ran this:
python snomed_g_graphdb_build_tools.py db_build --action create --rf2 C:/Users/Marcelo/Documents/ReleaseSnomed/SnomedCT_RF2Release_INT_20150731 --release_type full --neopw64 c21xcw== --output_dir C:/Users/marcelo/Documents/smqs
I got this:
SNOMED_G bin directory [C:/Users/Marcelo/Downloads/SNOMED-CT-Database-master/SNOMED-CT-Database-master/NEO4J/]
Traceback (most recent call last):
File "C:\ProgramData\Anaconda3\lib\base64.py", line 517, in _input_type_check
m = memoryview(s)
TypeError: memoryview: a bytes-like object is required, not 'str'
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "snomed_g_graphdb_build_tools.py", line 328, in <module>
parse_and_interpret(sys.argv[1:]) # causes sub-command processing to occur as well
File "snomed_g_graphdb_build_tools.py", line 325, in parse_and_interpret
command_interpreters[command_index]1 # call appropriate interpreter
File "snomed_g_graphdb_build_tools.py", line 198, in db_build
if opts.mode=='build': neo4j = snomed_g_lib_neo4j.Neo4j_Access(base64.decodestring(opts.neopw64))
File "C:\ProgramData\Anaconda3\lib\base64.py", line 559, in decodestring
return decodebytes(s)
File "C:\ProgramData\Anaconda3\lib\base64.py", line 551, in decodebytes
_input_type_check(s)
File "C:\ProgramData\Anaconda3\lib\base64.py", line 520, in _input_type_check
raise TypeError(msg) from err
TypeError: expected bytes-like object, not str