Dan2552 / snofyre

Automatically exported from code.google.com/p/snofyre
0 stars 0 forks source link

Data generation throws data truncation error #16

Closed GoogleCodeExporter closed 8 years ago

GoogleCodeExporter commented 8 years ago
What steps will reproduce the problem?
1. Data generation using refinements on expressions.
2. Throws data too long for column error.

C:\Program Files\Snofyre\bin\queries\allergicDisorder.xml
[DEBUG] 07 Jan 2011 14:55:04 
(uk.nhs.cfh.dsp.srth.desktop.modules.querycreationtreepanel.QueryAuthoringTreePa
nel:459) - Notified model of query change
[ INFO] 07 Jan 2011 14:55:35 
(uk.nhs.cfh.dsp.srth.simulator.engine.impl.QueryBasedDataGenerationEngineImpl:12
6) - Starting data generation from data map
[ WARN] 07 Jan 2011 14:55:35 
(uk.nhs.cfh.dsp.srth.expression.repository.impl.ExpressionMappingObjectDAOImpl:2
87) - Found more than one match. Returning the first match found
[ INFO] 07 Jan 2011 14:55:35 
(uk.nhs.cfh.dsp.srth.simulator.engine.impl.QueryBasedDataGenerationEngineImpl:24
4) - Saved patient with ID : 475
[ WARN] 07 Jan 2011 14:55:36 
(uk.nhs.cfh.dsp.srth.expression.repository.impl.ExpressionMappingObjectDAOImpl:2
87) - Found more than one match. Returning the first match found
[ INFO] 07 Jan 2011 14:55:36 
(uk.nhs.cfh.dsp.srth.simulator.engine.impl.QueryBasedDataGenerationEngineImpl:24
4) - Saved patient with ID : 638
[ WARN] 07 Jan 2011 14:55:37 
(uk.nhs.cfh.dsp.srth.expression.repository.impl.ExpressionMappingObjectDAOImpl:2
87) - Found more than one match. Returning the first match found
[ INFO] 07 Jan 2011 14:55:37 
(uk.nhs.cfh.dsp.srth.simulator.engine.impl.QueryBasedDataGenerationEngineImpl:24
4) - Saved patient with ID : 277
[ WARN] 07 Jan 2011 14:55:38 (org.hibernate.util.JDBCExceptionReporter:357) - 
SQL Error: 0, SQLState: 22001
[ERROR] 07 Jan 2011 14:55:38 (org.hibernate.util.JDBCExceptionReporter:454) - 
Data truncation: Data too long for column 'nfe_cgf' at row 1
[ WARN] 07 Jan 2011 14:55:52 
(uk.nhs.cfh.dsp.srth.desktop.modules.simulator.viewcomponent.actions.DataGenerat
orTask:105) - Error creating data. Nested message is could not insert: 
[uk.nhs.cfh.dsp.srth.expression.repository.om.impl.ExpressionMappingObjectImpl];
 nested exception is org.hibernate.exception.DataException: could not insert: 
[uk.nhs.cfh.dsp.srth.expression.repository.om.impl.ExpressionMappingObjectImpl]N
ested exception is : could not insert: 
[uk.nhs.cfh.dsp.srth.expression.repository.om.impl.ExpressionMappingObjectImpl];
 nested exception is org.hibernate.exception.DataException: could not insert: 
[uk.nhs.cfh.dsp.srth.expression.repository.om.impl.ExpressionMappingObjectImpl]

Original issue reported on code.google.com by snofyre....@gmail.com on 16 Jan 2011 at 3:29

GoogleCodeExporter commented 8 years ago
Example: generating all patients who have 234347009|Anemia of chronic disease|

The long normal form for this concept (before any further random 
subspecialisation by adding e.g. optional qualifiers) is 212 characters long:

234347009|Anemia of chronic disease|
Equals 
38292009:47429007=(64572001:263502005=90734009),363705008=(62574001:{363713009=2
81300000,363714003=(14089001:246093002=41898006,116686009=(123038009:{118169006=
41898006,370133003=33463005}),260686004=129266000)})

Original comment by jeremy.r...@googlemail.com on 17 Jan 2011 at 2:29

GoogleCodeExporter commented 8 years ago
[deleted comment]
GoogleCodeExporter commented 8 years ago
This is a known issue because the field storing the Normal Form was set to 
VARCHAR(256) to start with. This is the max length of VARCHAR datatype 
supported in MySQL upto version 5.0.3. It can be increased now, because it has 
been addressed in versions 5.0.3 and over. See below from MySQL documentation: 
http://dev.mysql.com/doc/refman/5.0/en/string-type-overview.html

 [NATIONAL] VARCHAR(M) [CHARACTER SET charset_name] [COLLATE collation_name]

A variable-length string. M represents the maximum column length in characters. 
In MySQL 5.0, the range of M is 0 to 255 before MySQL 5.0.3, and 0 to 65,535 in 
MySQL 5.0.3 and later. The effective maximum length of a VARCHAR in MySQL 5.0.3 
and later is subject to the maximum row size (65,535 bytes, which is shared 
among all columns) and the character set used. For example, utf8 characters can 
require up to three bytes per character, so a VARCHAR column that uses the utf8 
character set can be declared to be a maximum of 21,844 characters.

Original comment by snofyre....@gmail.com on 17 Jan 2011 at 4:08

GoogleCodeExporter commented 8 years ago
[deleted comment]
GoogleCodeExporter commented 8 years ago
[deleted comment]
GoogleCodeExporter commented 8 years ago
[deleted comment]
GoogleCodeExporter commented 8 years ago
I've now increased the column length for any expression or derivative store in 
compositional grammar form to the max value of 65535. This might be overkill 
for most cases, but at at we shouldn't have any hidden data truncation error 
surprises!

To install the patch:
1. Drop the attached jars into your Snofyre/bin/lib folder (select yes when 
warned about overwrite)
2. You now have two options
  # If you are happy to lose your existing data, then change open the Snofyre/bin/settings/fake-data.properties file and change line 16 from hibernate.hbm2ddl.auto=validate to hibernate.hbm2ddl.auto=create   Now when you start Snofyre, it'll recreate the new schema for you (give it a few mins..for all UI to load). Remember to revert line 16 to hibernate.hbm2ddl.auto=validate after the new schema has been created. 
  # If you want to retain your existing data, then from MySQL Query Browser you can edit the schemas for the following tables to unique_expressions_table, clinical_expressions, CLINICAL_ENTRY and CLINICAL_ENTITY to replace all VARCHAR(255) with VARCHAR(65535)

Let me know how it goes... :)

Original comment by jay.kola on 18 Jan 2011 at 4:09

GoogleCodeExporter commented 8 years ago
Doesn't work: I dropped the original fakedata2 and then re-created it again as 
an empty. After starting SnoFyre with the 'create' option, it populated the 
schema with a load of tables EXCEPT the four mentoned above. I then loaded the  
MI sample query and tried to generate against it, and got the same error 
'nesting problem' message as previously. See attached log.

Original comment by jeremy.r...@googlemail.com on 18 Jan 2011 at 4:50

Attachments:

GoogleCodeExporter commented 8 years ago
Previous fix does not work because the it exceeds the max limit on row size... 
now all appropriate datatypes mapped to VARCHAR(2560)...

Same instructions as before, but if u drop the fakedata2 schema, please 
recreate it yourself

Original comment by jay.kola on 18 Jan 2011 at 5:50

Attachments: