RTXteam / RTX-KG2

Build system for the RTX-KG2 biomedical knowledge graph, part of the ARAX reasoning system (https://github.com/RTXTeam/RTX)
MIT License

implement long-term fix for neo4j-admin import buffer overflow issue (work around the 2019 hack) #45

Open saramsey opened 5 years ago

saramsey commented 5 years ago

Apparently the max size of an edge object is 4194304 characters, due to a buffer limit imposed somewhere in neo4j-admin. @BluePiggy49 has implemented a workaround in code, but we will eventually want to address the underlying buffer limit issue (if possible) and see if we can take measures upstream to prevent winding up with edges with huge TSV character sizes.

ecwood commented 5 years ago

In some of the edges, the length of the publications info field is an 8-digit number of characters (at least 10000000). To combat this, I created the function "limit_publication_info_size" in json_to_tsv.py. When an edge's publications info dictionary is longer than 300000 characters, the function limits the dictionary to 6 entries. (I now see that the threshold should probably be changed to 3000000, but it currently works.) With this fix, all of the edges can go into Neo4j. Neo4j suggests that the issue is an "unterminated quote", but I know from my testing that this is not the case: the edge is simply too large to go into Neo4j. It appears that the edges with this problem are from SemMedDB.
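For readers following along, here is a minimal sketch of the kind of truncation described in the comment above. It is not the actual code in json_to_tsv.py: the signature, the constant names, and the choice to keep the first entries in insertion order are all assumptions made for illustration. Only the function name, the 300000-character threshold, and the limit of 6 come from the comment.

```python
# Hypothetical sketch; the real limit_publication_info_size in
# json_to_tsv.py may differ in signature and behavior.
MAX_PUBLICATIONS_INFO_CHARS = 300000  # threshold quoted in the comment
MAX_PUBLICATIONS_INFO_ENTRIES = 6     # entries kept when over the threshold


def limit_publication_info_size(publications_info: dict,
                                max_chars: int = MAX_PUBLICATIONS_INFO_CHARS,
                                max_entries: int = MAX_PUBLICATIONS_INFO_ENTRIES) -> dict:
    """Return the dictionary unchanged if its string form is at most
    max_chars characters; otherwise keep only the first max_entries
    entries (insertion order, which is an assumption of this sketch)."""
    if len(str(publications_info)) <= max_chars:
        return publications_info
    kept_keys = list(publications_info)[:max_entries]
    return {key: publications_info[key] for key in kept_keys}
```

Truncating by entry count is lossy: publications beyond the kept entries are dropped from the edge, which is why the thread treats it as a workaround rather than a long-term fix.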

ecwood commented 5 years ago

Potentially, we could add newlines rather than limiting the number of entries in the publications info dictionary. See: https://stackoverflow.com/questions/56511859/weird-input-data-no-newline-character-in-the-whole-buffer-4194304-not-supporte

ecwood commented 5 years ago

The original error message was: original error: Tried to read a field larger than buffer size 4194304. A common cause of this is that a field has an unterminated quote and so will try to seek until the next quote, which ever line it may be on. This should not happen if multi-line fields are disabled, given that the fields contains no new-line characters. This field started at /var/lib/neo4j/import/edges.csv:31318326

amykglen commented 4 years ago

@saramsey - hmm, I just got a similar error when trying to push the latest KG2 build to neo4j:

original error: Tried to read a field larger than buffer size 4194304. A common cause of this is that a field has an unterminated quote and so will try to seek until the next quote, which ever line it may be on. This should not happen if multi-line fields are disabled, given that the fields contains no new-line characters. This field started at /home/ubuntu/kg2-build/TSV/nodes.tsv:3251863

but it looks like in this case, the problem field is the synonym field on a node...
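Since the oversized field can turn up in either the edges or the nodes file, one way to catch it before running the import might be to scan the TSV for any field at or above the buffer size. The sketch below is only an illustration under several assumptions: the function name `find_oversized_fields` is hypothetical, the column names are passed in as a list rather than read from the file, fields are assumed to be tab-delimited with standard double-quote escaping, and length is measured in Python characters, which may not match exactly how neo4j-admin counts.

```python
import csv

NEO4J_ADMIN_BUFFER_SIZE = 4194304  # limit quoted in the error message


def find_oversized_fields(tsv_path, header, limit=NEO4J_ADMIN_BUFFER_SIZE):
    """Yield (line_number, column_name, length) for every field whose
    length in characters is at or above limit.

    header is a list of column names supplied by the caller (assumption:
    the data file itself carries no header row)."""
    # Python's csv module has its own per-field cap; raise it so the
    # scan itself does not fail on the fields we are looking for.
    csv.field_size_limit(2**31 - 1)
    with open(tsv_path, newline="", encoding="utf-8") as tsv_file:
        reader = csv.reader(tsv_file, delimiter="\t")
        for row in reader:
            for column_name, value in zip(header, row):
                if len(value) >= limit:
                    yield reader.line_num, column_name, len(value)
```

A call such as `list(find_oversized_fields("nodes.tsv", ["category", "id", "synonym"]))` would then list the offending rows and columns; the file name and column list here are placeholders.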

full output of running ./tsv-to-neo4j.sh on kg2endpoint2:

ubuntu@ip-172-31-38-53:~/kg2-code$ bash -x ./tsv-to-neo4j.sh

Available resources:
  Total machine memory: 62.30 GB
  Free machine memory: 29.86 GB
  Max heap memory : 13.85 GB
  Processors: 8
  Configured max memory: 43.61 GB
  High-IO: false

Import starting 2019-11-07 17:07:22.300+0000
  Estimated number of nodes: 10.55 M
  Estimated number of node properties: 112.74 M
  Estimated number of relationships: 118.04 M
  Estimated number of relationship properties: 1.53 G
  Estimated disk space usage: 80.30 GB
  Estimated required memory usage: 1.13 GB

InteractiveReporterInteractions command list (end with ENTER):
  c: Print more detailed information about current stage
  i: Print more detailed information

(1/4) Node import 2019-11-07 17:07:22.330+0000
  Estimated number of nodes: 10.55 M
  Estimated disk space usage: 7.31 GB
  Estimated required memory usage: 1.13 GB
.......... .......... .......... .......... .......... 5% ∆8s 641ms
.......... .......... .......... .......... .......... 10% ∆7s 8ms
.......... .......... .......... .......... .......... 15% ∆544ms
.......... .......... .......... .......... .......... 20% ∆1ms
.......... .......... .......... .......... .......... 25% ∆1ms
.......... .......... .......... .......... .......... 30% ∆1ms
.......... .......... .......... .......... .......... 35% ∆1ms
.......... .......... .......... .......... .......... 40% ∆0ms
.......... .......... .......... .......... .......... 45% ∆1ms
.......... .......... .......... .......... .......... 50% ∆1ms
.......... .......... .......... .......... .......... 55% ∆1ms
.......... .......... .......... .......... .......... 60% ∆0ms
.......... .......... .......... .......... .......... 65% ∆1ms
.......... .......... .......... .......... .......... 70% ∆0ms
.......... .......... .......... .......... .......... 75% ∆1ms
.......... .......... .......... .......... .......... 80% ∆0ms
.......... .......... .......... .......... .......... 85% ∆1ms
.......... .......... .......... .......... .......... 90% ∆0ms
.......... .......... .......... .......... .......... 95% ∆1ms
.......... .......... .......... .......... .......... 100% ∆0ms

IMPORT FAILED in 16s 836ms.
Data statistics is not available.
Peak memory usage: 0.00 B
Error in input data
Caused by:ERROR in input data
  source: BufferedCharSeeker[source:/home/ubuntu/kg2-build/TSV/nodes.tsv, position:985661441, line:3251863]
  in field: synonym:string:13
  for header: [category:string, :LABEL, creation_date:string, deprecated:string, description:string, full_name:string, id:ID, iri:string, name:string, provided_by:string, publications:string, replaced_by:string, synonym:string, update_date:string, category_label:string]
  raw field value: ??
  original error: Tried to read a field larger than buffer size 4194304. A common cause of this is that a field has an unterminated quote and so will try to seek until the next quote, which ever line it may be on. This should not happen if multi-line fields are disabled, given that the fields contains no new-line characters. This field started at /home/ubuntu/kg2-build/TSV/nodes.tsv:3251863

WARNING Import failed. The store files in /var/lib/neo4j/data/databases/graph.db are left as they are, although they are likely in an unusable state. Starting a database on these store files will likely fail or observe inconsistent records so start at your own risk or delete the store manually
unexpected error: ERROR in input data
  source: BufferedCharSeeker[source:/home/ubuntu/kg2-build/TSV/nodes.tsv, position:985661441, line:3251863]
  in field: synonym:string:13
  for header: [category:string, :LABEL, creation_date:string, deprecated:string, description:string, full_name:string, id:ID, iri:string, name:string, provided_by:string, publications:string, replaced_by:string, synonym:string, update_date:string, category_label:string]
  raw field value: ??
  original error: Tried to read a field larger than buffer size 4194304. A common cause of this is that a field has an unterminated quote and so will try to seek until the next quote, which ever line it may be on. This should not happen if multi-line fields are disabled, given that the fields contains no new-line characters. This field started at /home/ubuntu/kg2-build/TSV/nodes.tsv:3251863
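The failure output above names the file, byte position, line, and field of the problem record. As a small convenience, that information could be pulled out of a captured log with a regular expression. The sketch below is an assumption-laden illustration: `parse_import_error` and `ERROR_PATTERN` are hypothetical names, and the pattern is written only against the message text shown in this thread, so other Neo4j versions may format the error differently.

```python
import re

# Pattern based solely on the error text quoted in this thread.
ERROR_PATTERN = re.compile(
    r"BufferedCharSeeker\[source:(?P<source>[^,\]]+),\s*"
    r"position:(?P<position>\d+),\s*line:(?P<line>\d+)\]\s+"
    r"in field:\s*(?P<field>[^:\s]+)"
)


def parse_import_error(message):
    """Return a dict with the source file, byte position, line number,
    and field name from an import error message, or None if the message
    does not match the expected shape."""
    match = ERROR_PATTERN.search(message)
    if match is None:
        return None
    return {
        "source": match["source"],
        "position": int(match["position"]),
        "line": int(match["line"]),
        "field": match["field"],
    }
```

Applied to the error above, this would identify line 3251863 of nodes.tsv and the synonym field, which is the record examined in the next comment.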

amykglen commented 4 years ago

this is the TSV for what seems to be the problem node (the synonyms field is way too long, so I've truncated it here - the full version is in this file: problem_node_synonym_field.txt)... it looks like the node ID is 'MTH:NOCODE', which doesn't seem great... (perhaps this node appeared as a result of our adding back the umls file umls-mth.ttl?)

http://w3id.org/biolink/vocab/Protein protein False UMLS Semantic Type: TUI:T171; UMLS Semantic Type: TUI:T167; UMLS Semantic Type: TUI:T169; UMLS Semantic Type: TUI:T168; UMLS Semantic Type: TUI:T170; UMLS Semantic Type: TUI:T196; UMLS Semantic Type: TUI:T195; UMLS Semantic Type: TUI:T194; UMLS Semantic Type: TUI:T197; UMLS Semantic Type: TUI:T185; UMLS Semantic Type: TUI:T184; UMLS Semantic Type: TUI:T192; UMLS Semantic Type: TUI:T191; UMLS Semantic Type: TUI:T190; UMLS Semantic Type: TUI:T130; UMLS Semantic Type: TUI:T131; UMLS Semantic Type: TUI:T123; UMLS Semantic Type: TUI:T122; UMLS Semantic Type: TUI:T121; UMLS Semantic Type: TUI:T120; UMLS Semantic Type: TUI:T127; UMLS Semantic Type: TUI:T126; UMLS Semantic Type: TUI:T125; UMLS Semantic Type: TUI:T129; UMLS Semantic Type: TUI:T116; UMLS Semantic Type: TUI:T114; UMLS Semantic Type: TUI:T101; UMLS Semantic Type: TUI:T100; UMLS Semantic Type: TUI:T104; UMLS Semantic Type: TUI:T103; UMLS Semantic Type: TUI:T102; UMLS Semantic Type: TUI:T109; UMLS Semantic Type: TUI:T097; UMLS Semantic Type: TUI:T096; UMLS Semantic Type: TUI:T095; UMLS Semantic Type: TUI:T094; UMLS Semantic Type: TUI:T099; UMLS Semantic Type: TUI:T098; UMLS Semantic Type: TUI:T086; UMLS Semantic Type: TUI:T085; UMLS Semantic Type: TUI:T083; UMLS Semantic Type: TUI:T089; UMLS Semantic Type: TUI:T087; UMLS Semantic Type: TUI:T093; UMLS Semantic Type: TUI:T092; UMLS Semantic Type: TUI:T091; UMLS Semantic Type: TUI:T090; UMLS Semantic Type: TUI:T053; UMLS Semantic Type: TUI:T052; UMLS Semantic Type: TUI:T051; UMLS Semantic Type: TUI:T050; UMLS Semantic Type: TUI:T057; UMLS Semantic Type: TUI:T056; UMLS Semantic Type: TUI:T055; UMLS Semantic Type: TUI:T054; UMLS Semantic Type: TUI:T059; UMLS Semantic Type: TUI:T058; UMLS Semantic Type: TUI:T060; UMLS Semantic Type: TUI:T042; UMLS Semantic Type: TUI:T041; UMLS Semantic Type: TUI:T040; UMLS Semantic Type: TUI:T046; UMLS Semantic Type: TUI:T045; UMLS Semantic Type: TUI:T044; UMLS Semantic Type: 
TUI:T043; UMLS Semantic Type: TUI:T049; UMLS Semantic Type: TUI:T048; UMLS Semantic Type: TUI:T047; UMLS Semantic Type: TUI:T075; UMLS Semantic Type: TUI:T074; UMLS Semantic Type: TUI:T073; UMLS Semantic Type: TUI:T072; UMLS Semantic Type: TUI:T079; UMLS Semantic Type: TUI:T078; UMLS Semantic Type: TUI:T077; UMLS Semantic Type: TUI:T082; UMLS Semantic Type: TUI:T081; UMLS Semantic Type: TUI:T080; UMLS Semantic Type: TUI:T064; UMLS Semantic Type: TUI:T063; UMLS Semantic Type: TUI:T062; UMLS Semantic Type: TUI:T061; UMLS Semantic Type: TUI:T068; UMLS Semantic Type: TUI:T067; UMLS Semantic Type: TUI:T066; UMLS Semantic Type: TUI:T065; UMLS Semantic Type: TUI:T069; UMLS Semantic Type: TUI:T071; UMLS Semantic Type: TUI:T070; UMLS Semantic Type: TUI:T013; UMLS Semantic Type: TUI:T012; UMLS Semantic Type: TUI:T011; UMLS Semantic Type: TUI:T010; UMLS Semantic Type: TUI:T017; UMLS Semantic Type: TUI:T016; UMLS Semantic Type: TUI:T015; UMLS Semantic Type: TUI:T014; UMLS Semantic Type: TUI:T019; UMLS Semantic Type: TUI:T018; UMLS Semantic Type: TUI:T002; UMLS Semantic Type: TUI:T001; UMLS Semantic Type: TUI:T005; UMLS Semantic Type: TUI:T004; UMLS Semantic Type: TUI:T008; UMLS Semantic Type: TUI:T007; UMLS Semantic Type: TUI:T031; UMLS Semantic Type: TUI:T030; UMLS Semantic Type: TUI:T034; UMLS Semantic Type: TUI:T033; UMLS Semantic Type: TUI:T032; UMLS Semantic Type: TUI:T039; UMLS Semantic Type: TUI:T038; UMLS Semantic Type: TUI:T037; UMLS Semantic Type: TUI:T020; UMLS Semantic Type: TUI:T024; UMLS Semantic Type: TUI:T023; UMLS Semantic Type: TUI:T022; UMLS Semantic Type: TUI:T021; UMLS Semantic Type: TUI:T028; UMLS Semantic Type: TUI:T026; UMLS Semantic Type: TUI:T025; UMLS Semantic Type: TUI:T029; UMLS Semantic Type: TUI:T200; UMLS Semantic Type: TUI:T204; UMLS Semantic Type: TUI:T203; UMLS Semantic Type: TUI:T201 MTH:NOCODE https://identifiers.org/umls/MTH/NOCODE Delusional disorder https://identifiers.org/umls/MTH [] "['Pathological fracture in neoplastic disease, 
unspecified shoulder, subsequent encounter for fracture with nonunion', 'R788 Free Acid (indacaterol maleate)', 'MIR4288 gene', 'Prosthetic replacement of temporomandibular joint (procedure)', 'Abrasion, right lesser toe(s), subsequent encounter', 'Tylototriton <amphibian>', 'ropinirole 6 MG Extended Release Oral Tablet [Requip]', 'RPS12P3 gene', 'Scott County, AR', 'ultrasound trans-vaginal fetal cardiac activity (___ bpm)', 'Isoniazid 1 ug/mL [Susceptibility] by Method for Slow-growing mycobacteria', 'Laceration of other blood vessels of thorax, unspecified side, sequela', 'Geroquinol', 'Medial surface of second toe', 'Anoscopy, high resolution (hra) (with magnification and chemical agent enhancement); with biopsy(ies)', 'Carrier of disorder', 'Marshall Eskimos', 'Nicotine 0.292 MG/HR Transdermal System [Habitrol]', 'LINC02003 gene', 'Resolution Property', 'ZNF33B gene', 'SDHAP2 gene', 'NCKIPSD wt Allele', 'Rheumatoid bursitis, right hip', 'MIR6752 gene', 'Puncture wound without foreign body of left middle finger without damage to nail, sequela', 'Scott County, TN', 'Trophamine 6 %', 'Specimen Source Codes - Pacemaker', 'murmur left lower sternal border systolic grade V', 'Rheumatoid myopathy with rheumatoid arthritis of unspecified ankle and foot', 'NACAP5 gene', 'Displaced comminuted fracture of shaft of ulna, unspecified arm, subsequent encounter for open fracture type I or II with routine healing', 'Peripheral Myelin Protein 22', 'Epidural hemorrhage with loss of consciousness greater than 24 hours without return to pre-existing conscious level with patient surviving, initial encounter', 'Bovine adenovirus 5 Antigen', 'Neck Disability Index', 'Nondisplaced comminuted fracture of shaft of humerus, right arm, initial encounter for open fracture', 'Documentation that order was given to discontinue prophylactic antibiotics within 24 hours of surgical end time, non-cardiac procedure (PERI 2)', 'Ritonavir', 'problem related to lifestyle (history)', 'titanium 
dioxide', 'aloe vera homeopathic preparation', 'Trembling', 'Pedestrian on skateboard injured in collision with two- or three-wheeled motor vehicle in traffic accident, initial encounter', 'Uterine Polyp', 'Nondisplaced fracture of coracoid process, unspecified shoulder, subsequent encounter for fracture with nonunion', 'Nondisplaced fracture of neck of right talus, subsequent encounter for fracture with delayed healing', 'RN7SL352P gene', 'TRA-AGC18-1 gene', 'Smudge Cell Count', 'RHOA wt Allele', 'Twenty nail dystrophy', 'Phylum Fibrobacteres', 'Multiple endocrine neoplasia Type 2', 'GAGE12C gene', 'Stenosis Morphology', 'Chief complaint:Finding:Point in time:^Patient:Narrative:Reported', 'Gallbladder Lymphoma', 'Unspecified atherosclerosis of autologous vein bypass graft(s) of the extremities, unspecified extremity', 'Alpha 1 antitrypsin:Mass Concentration:Point in time:Serum/Plasma:Quantitative', 'TAGLN gene', 'Renal Colic, CTCAE', 'Underdosing of other nonsteroidal anti-inflammatory drugs [NSAID], subsequent encounter', 'Victoria Genus', 'Laceration with foreign body of unspecified shoulder, initial encounter', 'Macrocystis pyrifera extract', 'Entire superficial flexor tendon of finger', 'FCGR2A gene', 'Bovine ehrlichiosis', 'HEMANGIOMA CONGENITAL', 'Structure of fetal chondrification center', 'CEROID LIPOFUSCINOSIS, NEURONAL, 13', 'Total number of staff completing the job satisfaction survey:Num:Pt:{Nursing unit}:Qn', 'RN7SL431P gene', 'DAAM1 gene', 'Congenital atransferrinemia', 'Calcarea renalis, Homeopathic preparations', 'Ophiocordyceps sinensis', 'Structure of sphenoid angle of parietal bone', 'Entire facial vein', 'PER1 wt Allele', 'Nondisplaced fracture of epiphysis (separation) (upper) of right femur, initial encounter for open fracture type I or II', 'Physostigmine sulfate', 'Framboesiform syphilid', 'Displaced avulsion fracture of right ischium, subsequent encounter for fracture with delayed healing', 'HEMOGLOBIN DIEPPE PHENOTYPE', 'WAPAL protein, 
human', 'Metencephalon', 'Potassium gluconate 2.13 MEQ Oral Tablet', 'Forced landing of other private fixed-wing aircraft injuring occupant, subsequent encounter', 'HUMANIN gene', 'Percutaneous needle biopsy liver', 'KNL1 gene', 'Crushing injury of unspecified foot, subsequent encounter', 'BEHAVIOR HYPERACTIVE', 'disaster - ActInformationManagementReason', 'MXL brand of morphine sulfate', 'DEVELOPMENTAL DYSPLASIA OF THE HIP 2', 'Hypertriglyceridemia', 'Laceration of muscle, fascia and tendon of right hip, sequela', 'Laceration without foreign body of right ear, initial encounter', 'Other in-line roller-skate accident, subsequent encounter', 'aberrant right subclavian artery', 'Salter-Harris Type I physeal fracture of unspecified metatarsal, sequela', 'C4orf50 gene', '1 ML heparin sodium, porcine 5000 UNT/ML Prefilled Syringe', 'Inadequate social skills, not elsewhere classified in ICD10CM', 'SP4 gene', 'Primary carcinoma ex pleomorphic adenoma of oropharynx', 'Sprain of metatarsophalangeal joint of unspecified lesser toe(s), sequela', 'IGLVI-68 gene', 'Structure of cisterna chyli', 'Entire coronoid process of mandible', 'Identification information:-:Point in time:^Patient:-', 'ARL2BPP8 gene', 'Encounter due to contact with and (suspected) exposure to polycyclic aromatic hydrocarbons', 'BMPER gene', 'MIR517C gene', 'Unspecified injury of extensor muscle, fascia and tendon of left middle finger at wrist and hand level, sequela', 'G-T mismatch-binding protein', 'Ventilator - respiratory equipment', 'Coding Specialist, Physician Office Based - Health Information', 'Strain of muscle, fascia and tendon of other parts of biceps, right arm, sequela', 'Common wart', 'Fecal occult blood: positive', 'Origanum (plant)', 'Pyronine B stain method', 'PhenX - arthritis - osteoarthritis protocol', 'Complement component C8', 'ZNF485 gene', 'Other yatapoxvirus infections', 'H3F3AP1 gene', 'Diphtheria Toxoid/Tetanus Toxoid/Inactivated Pertussis Vaccine', 'Cucumis melo spp 
Antibody.immunoglobulin G.RAST class:Arbitrary Concentration:Point in time:Serum:Ordinal', 'EUCALYPTUS OIL/MENTHOL 15%/LANOLIN OINT', 'Wedge compression fracture of T7-T8 vertebra, initial encounter for open fracture', 'Esophageal Stenosis', 'Adolescent Obesity', 'Pedestrian on other rolling-type pedestrian conveyance colliding with stationary object, initial encounter', 'COX5BP4 gene', 'Progestins', 'Stress fracture, left tibia, subsequent encounter for fracture with nonunion', 'Poisoning by other estrogens and progestogens, undetermined, sequela', 'Astragalus excapus, huang qi, Homeopathic preparation', 'Displaced fracture of medial malleolus of right tibia, subsequent encounter for open fracture type I or II with malunion', 'KIAA1324L gene', 'Contusion of unspecified lesser toe(s) without damage to nail, subsequent encounter', 'Political Revolutions', 'Xenopus <genus>', 'Athysanus <angiosperm>', 'Dislocation of internal left hip prosthesis, sequela', 'Volume (publication)', 'Underdosing of other antiprotozoal drugs, sequela', 'Intrapelvic', 'Hamilton County, IA', 'Maternal care for (suspected) damage to fetus from alcohol, fetus 4', 'Epogen 10000 UNT/ML includes Injectable Solution & Injection', 'palmitoyl CoA ligase', 'VPREB3 gene', 'Trapa natans', 'Dura Mater', 'Drug or chemical induced diabetes mellitus with proliferative diabetic retinopathy with traction retinal detachment not involving the macula, right eye', ""5-methyl-2'-fluoroarauracil"", 'Vanilla planifolia specific immunoglobulin E', 'Displaced spiral fracture of shaft of humerus, left arm, initial encounter for closed fracture', 'Displaced fracture of anterior wall of unspecified acetabulum, subsequent encounter for fracture with routine healing', 'Partially Hearing Impaired', 'Keratin-14', 'Paecilomyces variotii', 'Agenesis of nasal cartilages', 'RN7SL625P gene', 'Methylprednisolone 62.5 MG/ML Injectable Solution [Solu-Medrol]', 'WASF3 gene', 'History of recent death of grandmother', 'Infective 
myositis, unspecified upper arm', 'TOMM7 gene', 'KCNMA1-AS3 gene', 'Does ride a bicycle', 'Entire first cuneiform articular facet of second metatarsal bone', 'Deprecated Streptococcus pneumoniae group B Antibody:Arbitrary Concentration:Point in time:Serum:Ordinal:ENZYME IMMUNOASSAY', 'local anesthetic throat preparations', 'methanol oxidase', 'Structure of lymph node of greater curvature of stomach', 'Microbispora amethystogenes', 'Initial observation care, per day, for the evaluation and management of a patient, which requires these 3 key components: A comprehensive history; A comprehensive examination; and Medical decision making of high complexity. Counseling and/or coordination of care with other providers or agencies are provided consistent with the nature of the problem(s) and the patient\'s and/or family\'s needs. Usually, the problem(s) requiring admission to ""observation status"" are of high severity.', 'insufficient social insurance and welfare support (history)', 'Actual Effective Infant Feeding Behavior', 'Other specified injury of unspecified muscles, fascia and tendons at thigh level, right thigh, initial encounter', '10 ML Propofol 10 MG/ML Injection [Diprivan]', 'SYT5 gene', 'Assault by other bodily force, initial encounter', 'Other specified joint disorders, unspecified shoulder', 'Balloon dilation of blood vessel spasm in head, accessed through the skin', 'TMC7 gene', 'Corrosion of second degree of unspecified hand, unspecified site, initial encounter', 'IL25 gene', 'nucleus X', 'PhenX - tobacco - 30D quantity and frequency - adult protocol:-:Point in time:^Patient:-:PhenX', 'RAB37 gene', 'Brain stem disorder', 'Other specified disorders of bone density and structure, left ankle and foot', 'acacia decurrens whole extract', 'PhenX - sugar protocol', 'Magnesia sulphurica, epsom salt, Homeopathic preparation', 'Medial canthal ligament', 'RAMP1 gene', 'Structure of capsule of joint of pisiform bone', ""Writer's cramp neurosis"", 'Formin-Binding 
Protein 1', 'N-formylmethionine aminopeptidase', 'Toxic effect of glycols, assault, sequela', 'Papilledema', 'Social Influences', 'Open wound of trachea, uncomplicated', 'Chitina Indians', 'Lincoln County, KY', 'Rh Immune Globulin Consent Type', 'Deprecated House dust IgE Ab [Units/volume] in Serum', 'Cranial Suture Separation', 'MAGI2 protein, human', 'tissue transglutaminase activity', 'Ruscus', 'Atherosclerosis of nonbiological bypass graft(s) of other extremity with ulceration', 'Buprenorphine 0.02 MG/HR Transdermal System', 'Systemic venous structure', 'Cytochrome P450 17A1 Inhibitor [EPC]', 'hemoglobin G-Taipei', 'Maxillary right second premolar prosthesis', 'Repair of perineal hernia using surgical mesh', 'Carbonates', 'LINC02395 gene', 'DAB1 gene', 'Nondisplaced fracture of medial malleolus of right tibia, subsequent encounter for open fracture type IIIA, IIIB, or IIIC with routine healing', 'Feeling Anesthesia', 'Displaced fracture of neck of first metacarpal bone, left hand, subsequent encounter for fracture with malunion', 'Complete traumatic amputation at elbow level, left arm, subsequent encounter', 'Preterm labor with preterm delivery, unspecified trimester, fetus 1', 'Minor laceration of right carotid artery, initial encounter', 'PhenX measure - verbal memory:-:Point in time:^Patient:-:PhenX', 'Oocyte Retrieval', 'Raw tongue', 'CMS - cardiovascular exam panel:-:Point in time:^Patient:Narrative', 'Organum Vasculosum Laminae Terminalis', 'Minor laceration of unspecified external jugular vein, initial encounter', 'third-degree perineal laceration during delivery involving anal sphincter', 'secretase', 'Acetaminophen 325mg, Diphenhydramine Hydrochloride 25mg, Phenylephrine Hydrochloride 5mg Oral tablet, Acetaminophen 325mg, Guaifenesin 200mg, Phenylephrine Hydrochloride 5mg Oral tablet', 'MIR4418 gene', 'Laceration of unspecified blood vessel at wrist and hand level of unspecified arm, subsequent encounter', 'Other dislocation of unspecified foot, 
sequela', 'Off-Label Use (medical device)', 'Cytarabine liposome', 'Pharynx Pulmonary Fistula Adverse Event', 'Premature Ovarian Failure 1', 'Displaced fracture of coracoid process, left shoulder, initial encounter for closed fracture', 'FRIEDREICH ATAXIA WITH RETAINED REFLEXES', 'Complaining of ""tired all the time""', 'Breakdown (mechanical) of indwelling urethral catheter, subsequent encounter', 'Acetaminophen 325 MG / Chlorpheniramine Maleate 2 MG / Dextromethorphan Hydrobromide 15 MG / Pseudoephedrine Hydrochloride 30 MG Oral Tablet', 'Herpes simplex virus 1 Antigen', 'TINAGL1 gene', 'Serum amylase pancreatic result', 'Superficial frostbite of unspecified finger(s), sequela', 'Non-pressure chronic ulcer of other part of unspecified foot with necrosis of muscle', 'hemoglobin Hafnia', 'What subject filter - Status', 'Lymphotactin Measurement', 'Minocycline 112.5 MG Extended Release Oral Capsule', 'Chad', 'Complete traumatic amputation of unspecified great toe, initial encounter', 'Other injuries of lung, bilateral, subsequent encounter', 'Finding of palmar crease', 'Genetic analysis discrete result panel', 'General appearance of specimen - finding', 'Dropouts', 'PhenX - disinhibiting behaviors - impulsivity - child protocol', 'Drowning and submersion due to fall off passenger ship, subsequent encounter', 'Nondisplaced transverse fracture of shaft of left femur, initial encounter for open fracture type I or II', 'HEDIS 2010 Tests used in early prenatal care - ABO & Rh (PPC-C):-:Point in time:^Patient:-', 'Refractive amblyopia, unspecified eye', 'Broken internal joint prosthesis, other site, subsequent encounter', 'multicatalytic endopeptidase complex activity', 'Structure of round subcutaneous fat tissue of forehead (body structure)', 'Displaced midcervical fracture of right femur, initial encounter for open fracture type I or II', 'CV2 (body structure)', 'ACAD8 gene', 'Natural Killer Cell Activity Measurement', 'Fatigue fracture of vertebra, site unspecified, 
subsequent encounter for fracture with delayed healing', 'SERPINA1 wt Allele', 'Gastric contents in pharynx causing asphyxiation, sequela', 'Sudan III', 'Lamina of fifth thoracic vertebra', 'Removal of lens material; extracapsular (other than 66840, 66850, 66852)', 'Anemia due to blood loss', 'Stage III Anal Canal Cancer AJCC v6 and v7', 'MLL5 protein, human', 'Burning feeling vagina', 'Urokinase 50000 UNT/ML Injectable Solution [Abbokinase]', 'Adverse effect of oxytocic drugs, initial encounter', 'Glutathione Disulfide', 'Erythromycin 250 MG Oral Tablet', 'SLCO4A1 gene', 'Entire mesogastrium', 'Powellia <moss>', 'MIR515-2 gene', 'Homovanillate & Creatinine:Impression/interpretation of study:Point in time:Urine:Narrative', 'Nucleotide Excision Repair', 'Auditory area, function (observable entity)', 'PPIAP63 gene', 'Injury of left internal carotid artery, intracranial portion, not elsewhere classified with loss of consciousness of any duration with death due to brain injury prior to regaining consciousness, subsequent encounter', 'Buprenorphine 0.035 MG/HR Transdermal System', 'Diagnostic Service Section ID - Mycology', 'Displaced fracture of proximal phalanx of unspecified thumb, sequela', 'LINC00431 gene', 'APAF1 wt Allele', 'Crushing injury of unspecified ankle, sequela', 'MRI Adult - Consent Type', 'Foot joint synovial fluid (specimen)', 'unidirectional conjugation', 'Other fracture of fourth metacarpal bone, left hand, subsequent encounter for fracture with nonunion', 'MSBP1 gene', 'Ulnohumeral (joint) sprain of right elbow, subsequent encounter', 'Age related macular degeneration', 'Corrosion of third degree of multiple sites of unspecified lower limb, except ankle and foot, initial encounter', 'Displaced midcervical fracture of right femur, subsequent encounter for open fracture type IIIA, IIIB, or IIIC with delayed healing', 'RPS26P29 gene', 'Enterohemorrhagic Escherichia coli', 'Sheehan Disability Scale Questionnaire', 'Esophagogastric fundoplasty', 
'Traumatic blister of vagina', 'Actual Lack Of Knowledge Of Treatment Regime', 'Carpal Tunnel Syndrome', 'NMTRQ-TTG8-1 gene', 'FCGR2C wt Allele', 'Sullivan County, NY', 'ORAI1 gene', 'Colestipol Hydrochloride 5000 MG Oral Granules', 'Mecysteine hydrochloride', 'Unspecified injury of blood vessel of left middle finger, initial encounter', 'Person boarding or alighting a pick-up truck or van injured in collision with car, pick-up truck or van, initial encounter', 'Government Publications as Topic', 'Minichromosome Maintenance Complex Component 4', 'Displaced fracture of base of fourth metacarpal bone, left hand, subsequent encounter for fracture with malunion', 'Mahonia aquifolium root bark extract', 'Other heelies accident, sequela', 'Squamous Hyperplasia', 'HEMOGLOBIN NOUAKCHOTT PHENOTYPE', 'Partial thyroid lobectomy, unilateral; with or without isthmusectomy', 'TMEM126B gene', 'Structure of inferior articular process of eighth thoracic vertebra', 'Date of last evaluation:Date:Pt:^Patient:Qn', 'Nephroblastomatosis, fetal ascites, macrosomia and Wilms tumor', 'Structure of anterior rectus capitis muscle', 'Consumer Health Vocabulary', 'ARC Stent Thrombosis Timing Very Late', 'Phenylephrine Hydrochloride 1 MG/ML Prefilled Syringe', 'Columba guinea', 'Urobactam', 'DDAVP', 'LCLAT1 gene', 'CXCL14 protein, human', 'Vestibular ganglion', 'Peptamen', 'total mastectomy careful hemostasis ensured', 'Malnutrition universal screening tool score (observable entity)', 'Enteroviral lymphonodular pharyngitis', 'Opioid use, unspecified with intoxication, uncomplicated', 'SLC46A3 gene', 'Unspecified fracture of head of left femur, subsequent encounter for closed fracture with routine healing', 'SIN hearing assessment list', 'ZNF785 gene', 'Licence Serial Number for Read code drug administration', 'HEMOGLOBIN MANHATTAN PHENOTYPE', 'Cosa (Invertebrate)', 'WDSUB1 gene', 'tea leaf extract', 'Culex quinquefasciatus', 'GAPDHP68 gene', 'DPT gene', 'Displaced fracture of greater trochanter 
of left femur, subsequent encounter for open fracture type I or II with delayed healing', 'Observable entity', 'Hemoglobin S/Hemoglobin.total:Mass Fraction:Point in time:Whole blood:Quantitative', 'Toxic effect of petroleum products, accidental (unintentional), sequela', 'interleukin-1 epsilon', 'Mouse Pancreatic Disorder', 'CD95 Antigens', 'C7orf50 gene', 'Injection, belimumab, 10 mg administered', 'fibrinogen alpha chain location', 'Sublingual Route of Drug Administration', 'FIBP protein, human', 'Other fracture of upper and lower end of unspecified fibula, subsequent encounter for open fracture type I or II with routine healing', 'GRSF1 gene', 'Petroica traversi', 'Discharge note:Finding:Point in time:{Setting}:Document:{Provider}', 'SARM1 gene', 'O18 - message structure', 'HLA-DRB6 gene', 'Dwarfism NEC in SNOMEDCT', 'STT3A-AS1 gene', 'Amaranthus hybridus', 'Disorders of right acoustic nerve', 'LINC01937 gene', 'Measurement of serum lipid level', 'Structure of mesovarium', 'Granulation of postmastoidectomy cavity, left ear', 'Other specified injury of anterior tibial artery, left leg, subsequent encounter', 'PEX16 gene', 'Breakdown (mechanical) of other prosthetic devices, implants and grafts of genital tract, subsequent encounter', 'Recombinant Beta Chemokine', 'Subcutaneous tissue structure of dorsal surface of fourth toe', 'Stage I Prostate Cancer AJCC v7', 'Open fracture of vault of skull with other and unspecified intracranial hemorrhage, with moderate [1-24 hours] loss of consciousness', 'Amalgam (silver) dental filling material', 'Million per Microliter', 'MLPH gene', 'Lindane:Mass Concentration:Point in time:Whole blood:Quantitative', 'SPINK1 GENE MUTATION ANALYSIS:PRID:PT:BLD/TISS:NAR:MOLGEN', 'Encounter due to problem related to primary support group, unspecified', 'ZNF527 gene', ""Structure of Hertwig's sheath"", 'Closed treatment of proximal humeral (surgical or anatomical neck) fracture; with manipulation, with or without skeletal traction', 
'Dayquil Cold & Flu', 'HLA-DRB4 gene', 'isopenicillin N epimerase', 'ZNF652 wt Allele', 'Entire third levator costae', 'Emptying colostomy bag (procedure)', 'Structure of suprachoroidal space', 'Dawson County, NE', 'TTF2 gene', 'NECAB3 gene', 'Other specified injury of superficial palmar arch of right hand, initial encounter', '(all-E) phytoene', 'Thulinia P.J.Cribb, 1985', 'Burn of second degree of left ear [any part, except ear drum], sequela', 'Posterior scrotal branches of internal pudendal artery', 'NAC Substance', 'Entire collateral carpal ulnar ligament', 'Entire retroorbital region', 'Ataxia following other cerebrovascular disease', 'Other subluxation of unspecified radial head, subsequent encounter', 'Fracture of alveolus of left mandible, subsequent encounter for fracture with routine healing', 'Entire lymphatics of larynx', 'Simple electrodesiccation of lesion of penis', 'Other specified fracture of unspecified pubis, subsequent encounter for fracture with nonunion', 'sebaceous gland cell differentiation', 'proton-transporting ATP synthase complex location', 'Anterior subcapsular polar infantile and juvenile cataract, unspecified eye', 'MYO1F wt Allele', 'Preferred Provider Organizations', 'Phrenic nerve paralysis', 'Acetaminophen 160 MG / Dextromethorphan 5 MG Chewable Tablet [Triaminic Softchews Cough & Sore Throat Reformulated Jul 2007]', 'Laceration without foreign body of abdominal wall, right upper quadrant with penetration into peritoneal cavity, subsequent encounter', 'Cryptodiaporthe <Gnomoniaceae>', 'Perforation due to foreign body accidentally left in body following infusion or transfusion, initial encounter', 'CDC7 gene', 'Unspecified physeal fracture of lower end of left femur, initial encounter for closed fracture', 'Potassium Chloride 1.33 MEQ/ML Oral Solution', 'RERG-AS1 gene', 'ELAV-like protein 1', 'Date of onset of chest discomfort:Time Stamp -- Date and Time:Point in time:^Patient:Quantitative', 'MIR1269B gene', 'Benzoic acid 
allergy', 'FAM90A23P gene', 'Idiopathic retroperitoneal fibrosis', 'Chromosome analysis summary panel:-:Point in time:Whole blood/Tissue, unspecified:-:Molecular Genetics', 'Potassium arsenite', 'Driver of heavy transport vehicle injured in collision with other motor vehicles in nontraffic accident, sequela', 'Unspecified dislocation of unspecified toe(s), sequela', 'OR6L2P gene', 'Small cell B-cell lymphoma, lymph nodes of inguinal region and lower limb', 'Entire apex of nose', 'Pedal cyclist (driver) (passenger) injured in unspecified nontraffic accident, sequela', 'Eustoma grandiflorum', 'Biliary Tract Hemorrhage', 'Ster-zac (with hexachlorophane)', 'Nondisplaced fracture of proximal phalanx of right lesser toe(s), subsequent encounter for fracture with routine healing', 'Viscum album preparation', 'RPL17P7 gene', 'Neural Network Simulation', 'Progeroid Syndrome, Congenital, Petty Type', 'Great cardiac vein structure', 'Ability to Drive', 'GPIHBP1 gene', 'Partial traumatic amputation of right lower leg, level unspecified, sequela', 'Hordeum vulgare Pollen', 'ACADVL gene', 'MIR6785 gene', 'MIR4768 gene', 'Salter-Harris Type II physeal fracture of lower end of unspecified fibula, subsequent encounter for fracture with delayed healing', 'ultrasound of abdomen: multiple spleens', 'Articular cartilage of lateral cuneiform facet of right cuboid bone', 'Open bite of unspecified lesser toe(s) with damage to nail, sequela', 'IL19 gene', 'ferric subsulfate solution', 'RecQ2 Helicase', 'HEMOGLOBIN AURORA PHENOTYPE', 'Continuous invasive mechanical ventilation for less than 96 consecutive hours', 'Bent bone of left radius, subsequent encounter for open fracture type IIIA, IIIB, or IIIC with malunion', 'FMC1 gene', 'Nutrition management', 'Cerebral abscess', 'Manual conversion of position', 'Entire vagus nerve nucleus', 'EFNB1 gene', 'Contusion of oral cavity, initial encounter', 'Structure of interchondral joint', 'Fetal anemia and thrombocytopenia, third trimester, fetus 
2', 'Contusion of right ring finger without damage to nail, sequela', 'Ovalocyte Count', 'FANCE wt Allele', '24 HR galantamine hydrobromide 8 MG Extended Release Oral Capsule', 'DPY30 gene', 'Supprelin', 'Transcobalamin', 'Deprecated Streptococcus pneumoniae group B Antigen:Arbitrary Concentration:Point in time:Urine:Ordinal', 'Entire lateral condyle of humerus', 'RPL27P9 gene', 'Other injury of unspecified muscle, fascia and tendon at wrist and hand level, unspecified hand, initial encounter', 'STOMATOCYTOSIS I', 'Busulfex', 'SMCP gene', 'Motorcycle passenger injured in collision with railway train or railway vehicle in nontraffic accident, sequela', 'hemoglobin Maputo', 'Neuhauser syndrome', 'Intramedullary abscess', 'Lipoxygenase Inhibitors', 'Unspecified injury of muscle and tendon of back wall of thorax, subsequent encounter', 'breast mass on palpation', 'ITGB2 wt Allele', 'Derealization', 'Fall from moving wheelchair (powered), initial encounter', 'Problem quality or description:Finding:Point in time:^Patient:Nominal', 'TAAR7P gene', 'Vitamin A Drug Class', 'Primary blast injury of ear, bilateral, subsequent encounter', 'External constriction, left knee, initial encounter', 'Nondisplaced fracture of proximal phalanx of unspecified great toe, subsequent encounter for fracture with nonunion', 'Surgical operation note prep time duration:Time:Duration of procedure:Surgical procedure:Quantitative', 'Erythrokeratodermia variabilis', 'Burn of first degree of unspecified site of left lower limb, except ankle and foot, subsequent encounter', 'Entire superior articular process of fourth cervical vertebra', 'Pathological fracture in neoplastic disease, right radius, initial encounter for fracture', 'PIRC53 gene', 'APOE gene (procedure)', 'Marked operative site:Finding:Point in time:^Patient:Ordinal', ""Albright's hereditary osteodystrophy"", 'proteasome complex location (sensu Eukarya)', 'Genus Defluvibacter', 'Asphyxiation due to being trapped in a car trunk, 
intentional self-harm, subsequent encounter', 'IGHD5-12 gene', 'Unspecified fracture of right forearm, subsequent encounter for open fracture type IIIA, IIIB, or IIIC with nonunion', 'TTLL8 gene', 'GTSCR1 gene', 'NPM1P34 gene', 'Heart Auricles', 'CONFIRMATORY CONSULTATION FOR A NEW OR ESTABLISHED PATIENT, WHICH REQUIRES THESE THREE KEY COMPONENTS: A PROBLEM FOCUSED HISTORY; A PROBLEM FOCUSED EXAMINATION; AND STRAIGHTFORWARD MEDICAL DECISION MAKING', 'CGB1 gene', 'Transvesical ureterolithotomy (procedure)', 'Water soluble aniline blue stain', 'Superficial foreign body of right eyelid and periocular area, initial encounter', 'Complete traumatic amputation of left forearm, level unspecified, subsequent encounter', 'Elafin', 'FOCAL SEGMENTAL GLOMERULOSCLEROSIS 8', 'Soleichthys heterorhinos', 'Contusion of left lower leg, subsequent encounter', 'Hospital stay duration:Time:Pt:^Patient:Qn', 'LGALS9 gene', 'Other fracture of shaft of left fibula, initial encounter for open fracture type I or II', 'Nondisplaced subtrochanteric fracture of unspecified femur, subsequent encounter for open fracture type IIIA, IIIB, or IIIC with nonunion', 'NLRP3P1 gene', 'SCN2B gene', 'Floor Location', 'Unspecified fracture of shaft of unspecified radius, subsequent encounter for open fracture type IIIA, IIIB, or IIIC with delayed healing', 'Laceration of extensor muscle, fascia and tendon of left index finger at wrist and hand level, sequela', 'Lead-induced gout, unspecified knee', 'Displaced comminuted fracture of shaft of left tibia, subsequent encounter for closed fracture with routine healing', 'Entire cardiac notch of left lung', 'Fourth lumbar vertebra', 'Structure of plantar interosseous muscle of foot', 'Burn of unspecified eyelid and periocular area, sequela', 'Volume of Distribution for Dosing Interval Normalized by Surface Area', 'HYPERCHLOREMIA', 'Guarantor - Disabled Person Code', 'Unspecified fracture of upper end of right radius, subsequent encounter for closed fracture with 
routine healing', 'Aryl Hydrocarbon Receptor Nuclear Translocator-like 1 Protein', 'Northern Blotting', 'Indicator Device', 'Superficial thrombophlebitis in pregnancy, second trimester', 'murmur left lower sternal border diastolic musical', 'GHRLOS gene', 'Alveolar ridge mucous membrane', 'PAQR4 gene', 'Injection of neurolytic substance, subarachnoid', 'Collapsed vertebra, not elsewhere classified, occipito-atlanto-axial region, initial encounter for fracture', 'Astragali, homeopathic preparation', 'SYNDIG1L gene', 'Fall into other water striking water surface causing other injury, subsequent encounter', 'Displaced fracture of olecranon process with intraarticular extension of right ulna, subsequent encounter for open fracture type I or II with delayed healing', 'earache of right ear', 'Hereditary Opalescent Dentin (disorder)', 'CLSTN2-AS1 gene', 'Endocrinology department', 'Displaced transverse fracture of shaft of left radius, subsequent encounter for closed fracture with delayed healing', 'SPIRE1 gene', 'caspase-10b', 'Nondisplaced osteochondral fracture of unspecified patella, subsequent encounter for open fracture type IIIA, IIIB, or IIIC with nonunion', 'Unspecified injury of unspecified elbow, sequela', 'Nondisplaced transverse fracture of shaft of humerus, right arm, subsequent encounter for fracture with nonunion', 'Destruction, malignant lesion (eg, laser surgery, electrosurgery, cryosurgery, chemosurgery, surgical curettement), trunk, arms or legs; lesion diameter 2.1 to 3.0

saramsey commented 4 years ago

Hi Amy, I think we should filter this node out of the KG. We can do that by modifying the script multi_owl_to_json_kg.py (at line 200) to do something like this:

if 'MTH:NOCODE' in nodes_dict:
    del nodes_dict['MTH:NOCODE']

We may also need to silence the warnings on lines 685 and 692 of the script, for the cases where subject_curie_id or object_curie_id is 'MTH:NOCODE'.
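A minimal sketch of that filtering step, to make the idea concrete. It is not the code in multi_owl_to_json_kg.py: the helper name `drop_node` and the edge keys `'subject'` and `'object'` are assumptions made for illustration, and the real script may store edges differently.

```python
def drop_node(nodes_dict, edges, curie_id):
    """Remove one node and every edge that touches it.

    Assumes nodes_dict maps CURIE IDs to node dicts and that each edge is a
    dict with 'subject' and 'object' keys (hypothetical key names).
    """
    # pop() with a default does nothing if the node is already absent.
    nodes_dict.pop(curie_id, None)
    # Keep only the edges that do not reference the removed node.
    return [edge for edge in edges
            if edge['subject'] != curie_id and edge['object'] != curie_id]


nodes = {'MTH:NOCODE': {'name': 'placeholder'}, 'HGNC:1': {'name': 'gene'}}
edges = [{'subject': 'HGNC:1', 'object': 'MTH:NOCODE'},
         {'subject': 'HGNC:1', 'object': 'HGNC:1'}]
edges = drop_node(nodes, edges, 'MTH:NOCODE')
```

Dropping the edges in the same step means nothing downstream looks up the missing node, which may make silencing the warnings unnecessary.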

amykglen commented 4 years ago

Cool, sounds good - will do. (May create a separate issue to document that.)

dkoslicki commented 4 years ago

@amykglen close due to RTXteam/RTX#788?

amykglen commented 4 years ago

I'm pretty sure this is still something we need to address (in a long-term way). (@saramsey would know for sure.)

ecwood commented 4 years ago

Attached is a log file from 2019 containing the error along with the edge it failed on: neo4j_buffer_issue_2019.log

To keep a record of where I'm looking for information: https://github.com/neo4j/neo4j/issues/11687 https://github.com/neo4j/neo4j/wiki/Neo4j-3.3-changelog (see version 3.3.6)

ecwood commented 4 years ago

I posted an issue report on Neo4j here: https://community.neo4j.com/t/mismatching-store-id-on-neo4j-admin-import/20504

This is relevant here because I found a way to avoid the buffer overflow with the --read-buffer-size option, but it only works in Neo4j version 4.0, which is currently not working for us.

saramsey commented 4 years ago

@amykglen @ericawood OK, is the buffer limit still an issue now that we have fixed RTXteam/RTX#558? Is there a simple test we can run to see if it is still an issue, in the latest KG2 build?

If the buffer limit can be confirmed to no longer be an issue, perhaps the code work-around that @ericawood implemented 11 months ago can be relaxed or removed?
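One possible form for such a test is sketched below: scan a generated TSV file and list every field larger than the 4194304 limit quoted in the neo4j-admin error. The function name `find_oversized_fields` is hypothetical, the sketch assumes one record per line (no multi-line quoted fields), and it measures size in bytes, which may or may not match how Neo4j counts.

```python
NEO4J_BUFFER_LIMIT = 4194304  # value quoted in the neo4j-admin error message


def find_oversized_fields(tsv_path, limit=NEO4J_BUFFER_LIMIT):
    """Yield (line_number, column_index, size) for each field over the limit.

    Assumes one record per line. Sizes are in bytes; whether Neo4j counts
    bytes or characters is an assumption here.
    """
    with open(tsv_path, 'rb') as tsv_file:
        for line_number, line in enumerate(tsv_file, start=1):
            fields = line.rstrip(b'\n').split(b'\t')
            for column_index, field in enumerate(fields):
                if len(field) > limit:
                    yield line_number, column_index, len(field)
```

If the generator yields nothing for both the nodes and edges files built without the workaround, that would suggest the limit is no longer being hit.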

ecwood commented 4 years ago

We can't test on the latest KG2 build without rebuilding the TSV files. Would that be helpful? @saramsey, what are your thoughts, since the data is on your instance?

saramsey commented 4 years ago

I'm doing a build now (on kg2steve2), in order to get HMDB into KG2. Maybe on Monday we can test the TSV rebuild vis-a-vis this issue?

Slating this for Monday at 1 PM PDT, since I have childcare/homeschooling duty Monday morning.

ecwood commented 4 years ago

The new build did not fix this (I rebuilt the TSV files without the workaround). Please see the attached log file for more information, since it was too long to include in the comment: buffer_overflow_issue7-28-2020.log

ecwood commented 4 years ago

RTX KG2.2.2 is giving the following error: image

ecwood commented 4 years ago

The node in question is this (for reference): image

The node after this node is VERY long because it has a long description. I am 90% sure it is "PathWhiz.Compound:1134" after querying kg2endpoint, but I can't tell for certain because the description is so long.

ecwood commented 4 years ago

I regenerated the TSV files with a change that prints the offending node ID and the offending field for each node above the limit. Here's the output: image
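A check of that kind could look roughly like the sketch below. This is not the code that produced the output above: the function name `report_oversized_fields` and the `'id'` key are assumptions, and non-string values are measured by their `str()` form as a stand-in for however they are serialized to TSV.

```python
def report_oversized_fields(node, limit=4194304):
    """Return (node_id, field_name, size) for each field over the limit.

    Assumes the node is a dict with an 'id' key (hypothetical layout).
    """
    return [(node['id'], field_name, len(str(value)))
            for field_name, value in node.items()
            if len(str(value)) > limit]


# Small limit used only to demonstrate the check on a toy node.
node = {'id': 'EXAMPLE:1', 'name': 'short', 'description': 'x' * 50}
offenders = report_oversized_fields(node, limit=20)
```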

saramsey commented 3 years ago

@ericawood you indicated that you have some code related to this issue that we will want to preserve. Can you please put that code somewhere, like in a branch or something? Thank you

ecwood commented 3 years ago

@ericawood you indicated that you have some code related to this issue that we will want to preserve. Can you please put that code somewhere, like in a branch or something? Thank you

Hi @saramsey, That is addressed as best as possible in 0ac4465ebe2b441bc745f2226c6f2214059c5cbe and 4ccb0ee. I have the original v4.0 files from July if these do not work (the three files have changed since then, so I "merged" them to the best of my ability). If you want to go from version 3.0 to 4.0 on the same instance (not recommended), run the following steps:

sudo apt purge neo4j
sudo apt autoremove
./install-neo4j.sh

Please keep in mind that I was also unable to make the database read-only on version 4.0. So, while that line is still there, it will fail.

Please let me know if you have any questions!

saramsey commented 3 years ago

Thank you @ericawood. I think long-term, we should investigate alternatives to Neo4j. In the short-term, what are your recommendations?

I note that in Neo4j 4.2, apparently there is a way to give a user read-only access: https://neo4j.com/docs/operations-manual/current/authentication-authorization/access-control/

ecwood commented 3 years ago

I note that in Neo4j 4.2, apparently there is a way to give a user read-only access: https://neo4j.com/docs/operations-manual/current/authentication-authorization/access-control/

At the top of the page you linked, it says "Enterprise Edition":

image

Based on my knowledge of Neo4j, this means that read-only user access is only available to paid customers of Neo4j.

saramsey commented 1 month ago

Now that we are only dependent on Neo4j at build time and during development/debugging (not at TRAPI query time), would it be easier for us to revisit whether we should upgrade Neo4j to a more modern community edition release?