Parsing vocabulary document #3

Closed by jdimyadi 5 years ago

jdimyadi commented 5 years ago

1) Executed "legalruleml-thinger-exe.exe db G4AS1_vocabulary.xml DbName UserName Password Port".
2) There was no error, but nothing was inserted into the Term table.

flaviusb commented 5 years ago

Is it a well-formed document? Can you upload the document somewhere so that I can look at it?

Also, have you tried running legalruleml-thinger-exe.exe db -s G4AS1_vocabulary.xml DbName UserName Password Port?

legalruleml-thinger-exe is permissive in that it allows valid XML that it doesn't understand to be in a document; it just doesn't do anything with it. Running it with the -s option relaxes some constraints on how it selects which XML nodes to extract information from, which can be useful if the document is a 'tag soup'.
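The well-formedness question above can be checked independently of the tool. The following is a minimal sketch using Python's standard library, not anything legalruleml-thinger-exe does internally; the element names in the two sample snippets are made up for illustration.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text: str) -> bool:
    """Return True if xml_text parses as well-formed XML, False otherwise."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        # Raised for mismatched tags, unclosed elements, bad characters, etc.
        return False


# Hypothetical snippets; the element and attribute names are assumptions.
good = "<Vocabulary><Term iri='0001'>Building</Term></Vocabulary>"
bad = "<Vocabulary><Term iri='0001'>Building</Vocabulary>"

print(is_well_formed(good))
print(is_well_formed(bad))
```

For a file on disk, `ET.parse("G4AS1_vocabulary.xml")` raises the same `ParseError` when the document is not well-formed. Note that this only checks XML syntax; it says nothing about whether the document is valid LegalRuleML.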

jdimyadi commented 5 years ago

Thanks. No, I haven't tried the -s option. Will try it. It's the same vocabulary document that you have seen.

Are there any more options that I should know about?

flaviusb commented 5 years ago

Not at the moment. All options show up with --help on the relevant subcommand (e.g. legalruleml-thinger-exe.exe db --help), so if any are added in future they will be listed there.

jdimyadi commented 5 years ago

Using the -s option I could insert the extracted IRI into the Term table. However, only the IRIs were inserted. The atom text was either not extracted from the document or failed to insert into the table.

By the way, --help does not show the -s option.

flaviusb commented 5 years ago

legalruleml-thinger-exe.exe db --help should print out the following:

Usage: legalruleml-thinger-exe db FILE DBNAME USER PASSWORD PORT [-s|--soup]
  Populate a database with information extracted from a LegalRuleML file

Available options:
  -h,--help                Show this help text
  FILE                     File to parse
  DBNAME                   Database to connect to
  USER                     User name (db role)
  PASSWORD                 Password
  PORT                     Port to connect on
  -s,--soup                Try to parse tag soup

Does this show up for you?

jdimyadi commented 5 years ago

Got it. legalruleml-thinger-exe.exe --help shows a higher-level list, which only lists the subcommands print, db, and init.

--help on each subcommand does show the options for that command.

flaviusb commented 5 years ago

Cool. When I run

createdb lkm
legalruleml-thinger-exe init lkm USER PASSWORD PORT
legalruleml-thinger-exe db -s G4AS1_vocabulary.xml lkm USER PASSWORD PORT
pg_dump --data-only -O -x lkm

I get back the following output, which shows that atoms have been inserted:

-- PostgreSQL database dump

-- Dumped from database version 10.3
-- Dumped by pg_dump version 10.3

SET statement_timeout = 0;
SET lock_timeout = 0;
SET idle_in_transaction_session_timeout = 0;
SET client_encoding = 'UTF8';
SET standard_conforming_strings = on;
SELECT pg_catalog.set_config('search_path', '', false);
SET check_function_bodies = false;
SET client_min_messages = warning;
SET row_security = off;

-- Data for Name: Formula; Type: TABLE DATA; Schema: public; Owner: -

COPY public."Formula" (id, name, text, children, iri) FROM stdin;

-- Data for Name: Metadata; Type: TABLE DATA; Schema: public; Owner: -

COPY public."Metadata" (id, text) FROM stdin;

-- Data for Name: Statement; Type: TABLE DATA; Schema: public; Owner: -

COPY public."Statement" (id, category, strength, key, formula) FROM stdin;

-- Data for Name: Term; Type: TABLE DATA; Schema: public; Owner: -

COPY public."Term" (id, iri, atom, description) FROM stdin;
1   0175    \n        Adequate\n        \n        Adequate to achieve the objectives of the Building Code.\n      
2   \N  \n        Atmospheric burner\n          \n        A burner system where all the air for combustion is induced by the inspirating effect of a gas injector and/or by natural draught in the combustion chamber without mechanical assistance.\n      
3   0001    \n        Building\n        \n        has the meaning ascribed to it by sections 8 and 9 of the Building Act 2004.\n      
4   \N  \n        Building element\n        \n        Any structural and non-structural component or assembly incorporated into or associated with a building. Included are fixtures, services, drains, permanent mechanical installations for access, glazing, partitions, ceilings and temporary supports.\n      
5   \N  \n        Chimney\n         \n        A non-combustible structure which encloses one or more flues, fireplaces or other heating appliances.\n      
6   \N  \n        Common extract duct\n         \n        A mechanical ventilation duct that extracts from different household units, and may contain air, moisture and contaminant.\n      
7   \N      \N
8   \N  \n        Construct\n       \n        In relation to a building, includes to design, build, erect, prefabricate and relocate the building.\n      
9   \N  \n        Draught diverter\n        \n        A device, without moving parts, fitted in the flue of an appliance for isolating the combustion system from the effects of pressure changes in the secondary flue.\n      
10  0197    \n        Equivalent aerodynamic area\n         \n        The area of an equivalent aerodynamically perfect orifice, and equals the penetration area required by the natural ventilation device multiplied by the discharge coefficient determined under test.\n      
11  \N  \n        Fire separation\n         \n        Any building element which separates firecells or firecells and safe paths, and provides a specific fire resistance rating.\n      
12  \N  \n        Fixture\n         \n        An article intended to remain permanently attached to and form part of a building.\n      
13  0109    \n        Flue\n        \n        The passage through which the products of combustion are conveyed to the outside.\n      
14  \N  \n        Forced or induced draught appliance\n         \n        An appliance where all or part of the air for combustion is provided by a fan or other mechanical device which is an integral part of the combustion system.\n      
15  0028    \n        Habitable space\n         \n        A space used for activities normally associated with domestic living, but excludes any bathroom, laundry, water closet, pantry, walk-in wardrobe, corridor, hallway, lobby, clothes-drying room, or other space of a specialised nature occupied neither frequently nor for extended periods.\n      
16  0019    \n        Household unit\n          \n        a) means any building or group of buildings, or part of a building or group of buildings, that is:\n          i) used, or intended to be used, only or mainly for residential purposes; and\n          ii) occupied, or intended to be occupied, exclusively as the home or residence of not more than one household; but\n        b) does not include a hostel, boarding house or other specialised accommodation.\n      
17  0004    \n        Intended use\n        \n        in relation to a building:\n          a) includes any or all of the following:\n            i) any reasonably foreseeable occasional other use that is not incompatible with the intended use; and\n            ii) normal maintenance; and\n            iii) activities taken in response to fire or any other reasonably foreseeable emergency\n          b) but does not include any other maintenance and repairs or rebuilding.\n      
18  \N  \n        Natural draught\n         \n        The flow produced by the tendency of warmed gases to rise.\n      
19  0017    \n        Net openable area\n       \n        is the area of windows or doors or other opening measured on the face dimensions of the openable building element concerned.\n      
20  0179    \n        Occupied space\n          \n        Any space within a building in which a person will be present from time to time during the intended use of the building.\n      
21  \N  \n        Outdoor air\n         \n        Air as typically comprising by volume:\n          i) oxygen 20.94%\n          ii) carbon dioxide 0.03%\n          iii) nitrogen and other inert gases 79.03%.\n      
22  0025    \n        Passive stack ventilator\n        \n        A system including a ventilation shaft which uses natural draught to ventilate spaces.\n      
23  0196    \n        Permanent opening\n       \n        An opening which cannot be closed, this implies that doors, windows etc are NOT permanent openings, although door undercuts are.\n      
24  \N  \n        Room-sealed appliance\n       \n        An appliance designed so that air for combustion neither enters from, nor combustion products enter into, the room in which the appliance is located.\n      
25  0024    \n        Trickle ventilator\n          \n        A controllable ventilation opening through the external envelope to the outside to provide background ventilation.\n      
26  0174    \n        Spaces\n          \N
27  0002    \n        ventilation\n         \N
28  0003    \n        consistent with their maximum occupancy\n         \N
29  0165    \n        domestic living\n         \N
30  0149    \n      bathroom,\n     \N
31  0155    \n      laundry,\n      \N
32  0154    \n      toilets\n       \n      water closet and other synonyms\n    
33  0166    \n      pantry,\n       \N
34  0167    \n      walk-in wardrobe,\n     \N
35  0168    \n      corridor,\n     \N
36  0169    \n      hallway,\n      \N
37  0170    \n      lobby,\n        \N
38  0171    \n      clothes-drying room, or\n       \N
39  0172    \n      other space of a specialised nature occupied neither frequently\n       \N
40  0173    \n      nor for extended periods.\n     \n      other space of a specialised nature not occupied for extended periods.\n    
41  0005    \n      the air-flow rate (and consequently number of air changes)\n        \N
42  0011    \n      In ducted mechanical ventilation systems\n      \n      mechanical ventilation systems\n    
43  0006    \n      verified using the methods of measurement given in the CIBSE Code Series A, Appendix A3.1.\n        \N
44  0007    \n        For determining the volume of outdoor air,\n          \N
45  0008    \n        measurements shall be taken\n         \n        air quality measurments\n      
46  0176    \n        close to the outdoor air inlet.\n         \n        outdoor air inlet.\n      
47  0147    \n        by demonstrating that contaminant levels\n        \n        air contaminant levels\n      
170 0143    \n        i) flue products discharged to the atmosphere only at the flue terminal,\n        \N
48  0177    \n        do not exceed the limits recommended in “Workplace Exposure Standards and Biological Exposure Indices 7th Edition”.\n         \n        limits recommended in “Workplace Exposure Standards and Biological Exposure Indices 7th Edition”.\n      
49  0148    \n        The acceptability of indoor air purity for workplaces\n       \N
50  0009    \n        by a flow of outdoor air through the building envelope\n          \N
51  0010    \n        natural ventilation (refer to Paragraphs 1.2 and 1.3)\n       \n        natural ventilation\n      
52  0012    \n        a combination of mechanical and natural ventilation (refer to Paragraph 1.4).\n       \n        combination of mechanical and natural ventilation\n      
53  0013    \n        containing Type 5 fire alarm systems\n        \n        Type 5 fire alarm systems\n      
54  0014    \n        installed in kitchens.\n          \n        kitchens\n      
55  0016    \n        small spaces such as hallways and lobbies in household units.\n       \n        small spaces\n      
56  0018    \n        in Commercial and Industrial buildings where products listed in NZBC Clause G4.3.3 are generated\n        \N
57  0178    \n        where there is only one external wall with opening windows (refer to Paragraph 1.3 for additional requirements if natural ventilation is used).\n         \N
58  0015    \n        accommodation units\n         \N
59  0020    \n        constructed in a way that allows them to remain fixed in the open position as a means of ventilation during normal occupancy of the building.\n       \N
60  0021    \n        car parks\n       \N
61  0022    \n        comply with the natural ventilation part of AS 1668.2 Section 7.\n        \N
62  0023    \n        located on the external wall,\n       \N
63  0151    \n        PLACEHOLDER A\n       \n        ventilation requirements for some habitable and not habitable spaces located near an external wall\n      
64  0156    \n        located through the external wall or building elements within the external wall (see Paragraph 1.3.9 for trickle ventilators),\n          \N
65  0150    \n        where the distance between the external wall and opposing wall is less than 6 metres.\n       \N
66  0180    \n        designed to extract a continuous airflow through the surrounding habitable spaces (see Paragraph 1.3.7 for passive stack ventilators),\n          \N
67  0181    \n        located within the external wall or in building elements that are integrated within the external wall (see Paragraph 1.3.9 for trickle ventilators),\n        \N
68  0182    \n        located in building elements that are integrated within the external wall (see Paragraph 1.3.9 for trickle ventilators),\n        \N
69  0183    \n        and not compromising the privacy of the toilet or\n       \N
70  0184    \n        and not compromising the privacy of bathroom,\n       \N
71  0026    \n        permanent openings for airflow between the surrounding habitable spaces of no less than 5% of the combined floor area of the spaces,\n        \N
72  0027    \n        a combined distance between the external wall and furthest opposing wall of less than 10 metres.\n        \N
73  0152    \n        a permanent opening to a kitchen,\n       \N
74  0185    \n        a permanent opening to a bathroom,\n          \N
75  0186    \n        a permanent opening to a toilet or\n          \N
76  0187    \n        a permanent opening to a laundry,\n       \N
77  0030    \n        a distance between the external wall and opposing wall of the habitable spaces of less than 6 metres.\n       \N
78  0031    \n        without openings to the exterior\n        \N
79  0188    \n        via another habitable space\n         \N
80  0189    \n        high level\n          \N
81  0190    \n        low level\n       \N
82  0035    \n        be designed in accordance with AS/NZS 4740 Section 3,\n       \N
83  0036    \n        be designed to achieve extract airflow rates specified in AS 1668.2 Table B1, using the following parameters:\n       \N
84  0191    \n          Air Density r =\n       \N
85  0192    \n        Gravitational Constant g =\n          \N
86  0193    \n        Temperature Differential DT =\n       \N
87  0194    \n        Outside Ambient T =\n         \N
88  0037    \n        without decreasing the performance of the building envelope and\n         \N
89  0195    \n        decreasing the performance the partition walls of the building for external moisture, fire and acoustics,\n       \N
90  0038    \n        be capable of drawing air\n       \N
91  0042    \n        have a condensation trap fitted to the part of the duct above the roof level.\n       \N
92  0041    \n        have ventilation ducts and stacks that are insulated in any unheated areas with a minimum thickness of 25 mm\n        \N
93  0198    \n        have ventilation ducts and stacks that are insulated in any unheated areas of a material having a thermal conductivity of no less than 0.04W/m2K\n        \N
94  0033    \n        have no connections from spaces\n         \N
95  0121    \n        i) maintain the fire separation of the fire separated shaft\n         \N
96  0127    \n        ii) have ducting, downstream of the fire collar, made of non-combustible material, and\n          \N
97  0128    \n        iii) have connections that contain no more than two bends and do not have any duct that is more than 45° to the vertical, and\n       \N
98  0129    \n        iv) have the branch connection to the common duct via a fire shunt of 1800 mm in height (see Figure 1), and\n         \N
99  0130    \n        v) have the fire shunt and the stack located in a fire separated shaft.\n         \N
100 0126    \n        with a pressure-forming intumescent fire collar around a collapsible duct, and\n          \N
101 0124    \n        iv) be ducting made of non-combustible material,\n        \N
102 0034    \n        unless the common extract duct is the only duct in the fire separated shaft.\n        \N
103 0044    \n        greater than the cross-sectional area of the stack,\n         \N
104 0047    \n        have an opening of no less than 2000 mm2 equivalent aerodynamic area,\n       \N
105 0057    \n        door undercut by 20 mm,\n         \N
106 0056    \n        continuous mechanical extract system is installed\n       \N
107 0058    \n        intermittent mechanical extract system is installed\n         \N
108 0059    \n        outdoor air supply shall be designed and equipment installed to comply with NZS 4303, or\n        \N
109 0052    \n        outdoor air supply shall be designed and equipment installed to comply with AS 1668.2 (excluding Table A1 and Sections 3 and 7),\n        \N
110 0120    \n        and to provide outdoor air to occupied spaces at the flow rates given in NZS 4303 Table 2,\n          \N
111 0060    \n        air-handling systems shall be installed and maintained to the requirements of AS/NZS 3666.1 and AS/NZS 3666.2,\n          \N
112 0062    \n        outdoor air intakes shall be located to avoid contamination from any local source in accordance with AS 1668.2 Clause 4.3.1 and NZS 4303 Clause 5.5,\n        \N
113 0063    \n        recirculated air systems shall comply with AS 1668.2 Clause 4.5,\n        \N
114 0064    \n        contaminated air discharge systems shall discharge contaminated air in a way that complies with AS 1668.2 Clause 5.10,\n          \N
115 0065    \n        filtration shall comply with AS 1668.2 Clause 4.4,\n          \N
116 0066    \n        commissioning shall comply with CIBSE Code Series A.\n        \N
117 0162    \n        extract ventilation shall:\n          i) be constructed so that any products listed in Clause G4.3.3 are removed, collected or diluted by ventilation rates and methods set out in AS 1668.2 Section 5\n            COMMENT:\n              Commercial kitchen extract ventilation is included in AS 1668.2 Section 5.\n          \N
118 0122    \n        extract ventilation shall:\n          \N
119 0164    \n        iii) where provided for extract\n         \N
120 0125    \n        refer to Paragraphs 1.5.2 and 1.5.3.\n        \N
121 0069    \n        maintain the fire separation of the fire separated shaft with a pressure-forming intumescent fire collar around a collapsible duct,\n         \N
122 0070    \n        have ducting, downstream of the fire collar, made of non-combustible material,\n          \N
123 0071    \n        have the branch connection to the common extract duct located in a fire separated shaft,\n        \N
124 0072    \n        have the fire shunt and common extract duct located in a separated shaft.\n       \N
125 0074    \n        be installed in a fire separated shaft,\n         \N
126 0135    \n        unless the common extract duct is the only duct in the fire separated shaft.\n        \N
127 0076    \n        be ducting made of non-combustible material,\n        \N
128 0077    \n        comply with the mechanical ventilation part of AS 1668.2 Section 7.\n         \N
129 0078    \n        incorporating filtration\n        \N
130 0079    \n        maintained at a positive pressure.\n          \N
131 0080    \n        remove or collect contaminants\n          \N
132 0081    \n        maintained at negative pressure relative to other spaces in the building.\n       \N
133 0061    \n        Supply air under equal pressure conditions to the burners and\n       \N
134 0084    \n        Supply air under equal pressure conditions to the draught diverter\n          \N
135 0082    \n        for appliances burning gas fuel\n         \N
136 0083    \n        designed to operate under natural draught conditions\n        \N
137 0068    \n        For non room-sealed appliances having a combined gas input exceeding 1 kW for each m3 of the space in which they are installed,\n         \N
138 0085    \n        be provided with vents, in addition to the ventilation required by Paragraphs 1.1 and 1.2. The vents shall be sized and located according to Paragraphs 2.1.3 to 2.1.8.\n         \N
139 0086    \n        Domestic gas cookers in non room-sealed spaces\n          \N
140 0087    \n        also used for sleeping\n          \N
141 0088    \n        permanent venting to the outside.\n       \n        Vents\n      
142 0089    \n        appropriate to the gas input to the cooker\n          \N
143 0090    \n        be subject to specific design\n       \N
144 0073    \n        for spaces vented directly to the outside,\n          \N
145 0093    \n        each with a free ventilation area per kW of gas input (of all appliances in the space) of no less than:\n          a) 1200 mm2\n          \N
146 0136    \n        for spaces vented via adjacent spaces.\n          \N
147 0094    \n        plant rooms and boiler rooms infrequently occupied by people.\n       \N
148 0095    \n        The vent opening areas given in Paragraph 2.1.3 may be halved\n       \N
149 0096    \n        vertical dimensions of no less than 50 mm\n       \N
150 0097    \n        no dimension of less than 6.0 mm in any other direction.\n        \N
151 0098    \n        have their lower edge no more than 100 mm above floor level\n         \N
152 0099    \n        have their lower edge no less than 75 mm above the top of the draught diverter relief opening.\n          \N
153 0100    \n        A louvred door\n          \N
154 0101    \n        the bottom of the free area extends to not less than 100 mm above the floor,\n        \N
155 0102    \n        the requisite high-level free area is available from the level of 75 mm above the draught diverter relief opening.\n          \N
156 0103    \n        provided it reaches from floor to ceiling\n       \N
157 0104    \n        has a total free area equivalent to that required for the two separate vents.\n       \N
158 0105    \n        Mechanical supply with mechanical extraction,\n       \N
159 0106    \n        Mechanical supply with natural exhaust.\n         \N
160 0138    \n        for forced or induced draught appliances, and\n       \N
161 0107    \n        For each kW of gas consumption (of all appliances in the plant room) provide outdoor air at the rate of:\n          i) 3.6 m3/h\n         \N
162 0139    \n        for appliances with atmospheric burners,\n        \N
163 0108    \n        Remove exhaust air from the room either:\n          i) mechanically at one third the inlet rate, or\n         \N
164 0141    \n        Remove exhaust air from the room either:\n          ii) naturally via high-level openings having a free ventilation area of no less than 600 mm2 per kW of total gas consumption for all appliances in the room.\n        \N
165 0110    \n        The cross-sectional area of a natural draught flue system external to the appliances,\n       \N
166 0075    \n        no less than the cross-sectional area of the appliance outlet,\n          \N
167 0111    \n        The flue designed to comply with AS/NZS 5601.1, section 6.7 and Appendix H, and (Amend 3 Feb 2014)\n          \N
168 0112    \n        If a draught diverter is not fitted:\n        \N
169 0142    \n        unless the discharge at other locations can be achieved without hazard to persons, property or appliance operation,\n         \N
171 0144    \n        ii) a method of automatically shutting down the main burners of forced or induced draught appliances, should the normal free discharge of the flue be interrupted.\n          \N
172 0113    \n        Draught diverter installations shall discharge the total\n        \N
173 0114    \n        including excess air and draught diverter dilution air, at the flue terminal without spillage from the skirt of the draught diverter.\n       \N
174 0137    \n        on a dwelling\n       \N
175 0115    \n        Outlets from natural draught flues or chimneys, positioned relative to surrounding construction to avoid wind causing down draughts in the flue,\n        \N
176 0116    \n        Flue pipes which extend through the roof, terminated no closer than:\n          i) 500 mm to the nearest part of any roof,\n          \N
177 0145    \n        Flue pipes which extend through the roof, terminated no closer than:\n          ii) 2.0 m to the roof level of a flat roof intended for personal or public use, and\n         \N
178 0146    \n        Flue pipes which extend through the roof, terminated no closer than:\n          iii) 500 mm above any parapet,\n          \N
179 0117    \n        Flues which terminate on the wall of a building located clear of inlets for outside air in accordance with the minimum clearances specified in AS/NZS 5601.1, section 6.9 and Figure 6.2.\n       \N
180 0119    \n        AS/NZS 5601.1 Sections 1, 3, 4, 5 and 6 and Appendices A – M and O - R is an Acceptable Solution, but may exceed the performance criteria of NZBC G4.\n       \N

-- Name: Formula_id_seq; Type: SEQUENCE SET; Schema: public; Owner: -

SELECT pg_catalog.setval('public."Formula_id_seq"', 1, false);

-- Name: Metadata_id_seq; Type: SEQUENCE SET; Schema: public; Owner: -

SELECT pg_catalog.setval('public."Metadata_id_seq"', 1, false);

-- Name: Statement_id_seq; Type: SEQUENCE SET; Schema: public; Owner: -

SELECT pg_catalog.setval('public."Statement_id_seq"', 1, false);

-- Name: Term_id_seq; Type: SEQUENCE SET; Schema: public; Owner: -

SELECT pg_catalog.setval('public."Term_id_seq"', 180, true);

-- PostgreSQL database dump complete

Could you include the output that you get when you run those commands?

jdimyadi commented 5 years ago

I can confirm that both atom and description texts are displayed in the pg_dump output. I could also query them using psql, so the content is definitely there. It is obviously a bug in the pgAdmin UI, where only the id and IRI are displayed and nothing else.
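For anyone reading the COPY rows in the dump by hand: in PostgreSQL's COPY text format, columns are separated by tabs, `\N` on its own marks NULL, and `\n` is an escaped newline. Below is a rough sketch of a decoder, assuming only the common escapes matter here (octal and hex escapes are not handled). The sample row is shaped like row 3 of the Term table; the exact whitespace inside its fields is an approximation.

```python
def decode_copy_field(field):
    """Decode one field of PostgreSQL COPY text format (common escapes only)."""
    if field == r"\N":
        return None  # \N on its own marks SQL NULL
    escapes = {"n": "\n", "t": "\t", "r": "\r", "\\": "\\"}
    out = []
    i = 0
    while i < len(field):
        ch = field[i]
        if ch == "\\" and i + 1 < len(field):
            nxt = field[i + 1]
            # Backslashed characters not in the table stand for themselves.
            out.append(escapes.get(nxt, nxt))
            i += 2
        else:
            out.append(ch)
            i += 1
    return "".join(out)


def decode_copy_row(line):
    """Split a COPY text line on tabs and decode each field."""
    return [decode_copy_field(f) for f in line.rstrip("\n").split("\t")]


# A row shaped like row 3 of the Term table (id, iri, atom, description).
fields = [
    "3",
    "0001",
    r"\n        Building\n        ",
    r"\n        has the meaning ascribed to it by sections 8 and 9 of the Building Act 2004.\n      ",
]
row = "\t".join(fields)

id_, iri, atom, description = decode_copy_row(row)
print(repr(atom))
print(repr(description))
```

Decoded this way, the atom and description columns both hold text; they simply begin with a newline followed by indentation.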

flaviusb commented 5 years ago

It could be a problem with the newlines in the atoms: a lot of the atoms start with a newline and a run of whitespace, and pgAdmin might only display the first line of a cell by default. I am deliberately inserting the verbatim atom as I get it from the XML, in case we ever get a situation where the whitespace matters. If the whitespace does not matter, it can be trivially trimmed in the application layer or with a stored procedure.
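As an illustration of trimming in the application layer, here is a small sketch. The function names are hypothetical and not part of legalruleml-thinger-exe. It shows two options: trimming only the ends, or also collapsing internal runs of whitespace, which is a stronger normalisation and only appropriate if the line breaks inside an atom carry no meaning.

```python
from typing import Optional


def trim_atom(atom: Optional[str]) -> Optional[str]:
    """Strip leading and trailing whitespace (including newlines)."""
    if atom is None:
        return None  # keep SQL NULLs as NULL
    return atom.strip()


def collapse_whitespace(atom: Optional[str]) -> Optional[str]:
    """Trim the ends and squeeze every internal whitespace run to one space."""
    if atom is None:
        return None
    return " ".join(atom.split())


# Values shaped like the atoms in the Term table (whitespace approximated).
raw_simple = "\n        Household unit\n          "
raw_multiline = "\n        extract ventilation shall:\n          i) be constructed\n          "

print(repr(trim_atom(raw_simple)))
print(repr(collapse_whitespace(raw_multiline)))
```

On the database side, PostgreSQL's `btrim(atom, E' \n\t')` performs a comparable trim of the ends, so a view or stored procedure could expose cleaned values while the table keeps the verbatim text.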