Closed erwanlc closed 7 years ago
Which setting did you used in following?: 1)Number of batches 2)MaxDegreeOfParallelism 3)BatchSize
I am using the default settings to test the tool: 1)Number of batches = 10 2)MaxDegreeOfParallelism = 1 3)BatchSize = 1000
Did you try to stop the builder then close and start again?
Yes, I did it several times. I also tried to restart from scratch by deleting the builder database without any success. I have the feeling that when I click on running, it doesn't do anything because even for the CDM database creation I need to do it by right click > Create tables. If just I launch the building process, the tables are not created. Is there a way to launch each step manually ?
Okay, I have found the reason. The script for creation builder database is outdated. You need to recreate [Building] table and add [SubChunk] table, scripts below:
DROP TABLE [dbo].[Building];
CREATE TABLE [dbo].[Building]( [Id] [int] NOT NULL, [CreateDestinationDbStart] [datetime] NULL, [CreateDestinationDbEnd] [datetime] NULL, [CreateChunksStart] [datetime] NULL, [CreateChunksEnd] [datetime] NULL, [CreateLookupStart] [datetime] NULL, [CreateLookupEnd] [datetime] NULL, [BuildingStart] [datetime] NULL, [BuildingEnd] [datetime] NULL, [CopyVocabularyStart] [datetime] NULL, [CopyVocabularyEnd] [datetime] NULL, [CreateIndexesStart] [datetime] NULL, [CreateIndexesEnd] [datetime] NULL, [AchillesStart] [datetime] NULL, [AchillesEnd] [datetime] NULL, CONSTRAINT [PK_Building] PRIMARY KEY CLUSTERED ( [Id] ASC )WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY] ) ON [PRIMARY]
CREATE TABLE [dbo].[SubChunk]( [ChunkId] [int] NOT NULL, [Index] [int] NOT NULL, [MinPersonId] [bigint] NOT NULL, [MaxPersonId] [bigint] NOT NULL, [PersonCount] [int] NOT NULL, [Saved] [tinyint] NOT NULL CONSTRAINT [DF_SubChunk_Saved] DEFAULT ((0)), CONSTRAINT [PK_SubChunk] PRIMARY KEY CLUSTERED ( [ChunkId] ASC, [Index] ASC )WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY] ) ON [PRIMARY]
GO
ALTER TABLE [dbo].[SubChunk] WITH CHECK ADD CONSTRAINT [FK_SubChunk_Chunk] FOREIGN KEY([ChunkId]) REFERENCES [dbo].[Chunk] ([Id]) ON DELETE CASCADE GO
ALTER TABLE [dbo].[SubChunk] CHECK CONSTRAINT [FK_SubChunk_Chunk] GO
Thanks for your help. I am now able to execute all the steps but nothing is stored in the destination database except the vocabulary. All the CDM tables are empty (there is nothing in the table PERSON for example). Do you think it could be because of the way I stored my data ? Does the DLLs to create the tables for Truven datasets are availables ?
Most likely the the issue with your data and not with the schema. You can easily check, all queries are located here: {Builder folder}\TruvenV5\
1) You need to check {Builder folder}\TruvenV5\Batch.sql - this query used on Chunk creation step inside query your just need to replace {sc} to your schema name dbo or etc. and replace {0} with Number of batches * BatchSize
if query returned no data, so problem is here, take a look to query WHERE clause
2) All queries that are used for getting data for generating CDM tables are here: {Builder folder}\TruvenV5\Definitions\
The query doesn't return any data. Thanks for providing me the path. I will take a look on these files.
No problem. Feel free to ask. With Batch.sql first of all you need to check RX field (must be '1') and DOBYR field (must be < 1900 or > the current year) the both fields are in ENROLLMENT_DETAIL table.
I am testing the first year (2011) of mdcd for now.
I put the last clause where of the BatchMDCD.sql in comments as I had no data because of it:
WITH CTE_SUMMARY
AS (
SELECT ENROLID, MIN(DOBYR) MIN_DOB, MAX(DOBYR) AS MAX_DOB, MIN(ed.SEX) AS MIN_SEX, MAX(ed.SEX) AS MAX_SEX
FROM dbo.ENROLLMENT_DETAIL ed
GROUP BY ENROLID
)
SELECT DISTINCT {0} z.ENROLID, z.ENROLID
FROM dbo.ENROLLMENT_DETAIL z
LEFT JOIN (
SELECT ENROLID, DATEPART(year, MIN(DTSTART)) AS MIN_ENROLL_YEAR
FROM dbo.ENROLLMENT_DETAIL
GROUP BY ENROLID ) m ON m.ENROLID = z.ENROLID
WHERE z.ENROLID IS NOT NULL AND (DRUGCOVG = '1' OR MEDICARE = '1')
AND z.DOBYR BETWEEN 1900 AND DATEPART(year, GETDATE())
AND z.DOBYR <= (m.MIN_ENROLL_YEAR+1)
-- and z.ENROLID NOT IN (
-- SELECT ENROLID
-- FROM CTE_SUMMARY
-- WHERE MIN_DOB != 0 AND MAX_DOB > DATEPART(year, GETDATE()) - 90 AND ABS(MIN_DOB-MAX_DOB) > 2
-- )
ORDER BY z.ENROLID
But now I have this error during the chunks creation:
Builder: GDCPC900208
Time: Sep 12 2017 5:04PM
Error: Exception Found:
Type: System.FormatException
Message: Input string was not in a correct format.
Source: mscorlib
Stacktrace: at System.Number.StringToNumber(String str, NumberStyles options, NumberBuffer& number, NumberFormatInfo info, Boolean parseDecimal)
at System.Number.ParseInt64(String value, NumberStyles options, NumberFormatInfo numfmt)
at System.Convert.ToInt64(String value)
at org.ohdsi.cdm.framework.core.Controllers.ChunkController.<>c.<AddSubChunks>b__8_0(KeyValuePair`2 c) in C:\Users\E613015a\Documents\ETL-CDMBuilder-master\ETL-CDMBuilder-master\source\Framework\org.ohdsi.cdm.framework.core\Controllers\ChunkController.cs:line 51
at System.Linq.EnumerableSorter`2.ComputeKeys(TElement[] elements, Int32 count)
at System.Linq.EnumerableSorter`1.Sort(TElement[] elements, Int32 count)
at System.Linq.OrderedEnumerable`1.<GetEnumerator>d__1.MoveNext()
at System.Linq.Enumerable.WhereSelectEnumerableIterator`2.MoveNext()
at org.ohdsi.cdm.framework.core.Controllers.ChunkController.AddSubChunks(Int32 chunkId, IEnumerable`1 chunk) in C:\Users\E613015a\Documents\ETL-CDMBuilder-master\ETL-CDMBuilder-master\source\Framework\org.ohdsi.cdm.framework.core\Controllers\ChunkController.cs:line 51
at org.ohdsi.cdm.framework.core.Controllers.ChunkController.CreateChunks() in C:\Users\E613015a\Documents\ETL-CDMBuilder-master\ETL-CDMBuilder-master\source\Framework\org.ohdsi.cdm.framework.core\Controllers\ChunkController.cs:line 97
at org.ohdsi.cdm.framework.core.Controllers.BuilderController.PerformAction(Action act) in C:\Users\E613015a\Documents\ETL-CDMBuilder-master\ETL-CDMBuilder-master\source\Framework\org.ohdsi.cdm.framework.core\Controllers\BuilderController.cs:line 80
Which data type you uses for ENROLID? (ENROLLMENT_DETAIL table)
It is REAL, I didn't change what SQL Server suggests. I guess I should have.
I also tried the convertion on the year 2011 of MDCR but I have an error saying it doesn't find the GEOLOC table. Unfortunately I have no clue about what this table about.
EGEOLOC - field from MDCR and not exists in MDCD, MDCD and CCAE\MDCR have bit different schema. You need to use appropriate Vendor for each.
Yes, I noticed, but when using the right vendor for the MDCR I still have this error:
Message: ERROR [42S02] [Microsoft][SQL Server Native Client 11.0][SQL Server]Invalid object name 'dbo.GEOLOC'.
I saw the variable EGEOLOC in the tables of MDCR but here, it seems the tool is looking for a table named GEOLOC and I don't know what it is.
I believe Truven provided the GEOLOC table that defines EGEOLOC (Geographic Location of Employee) that is used to map state. It basically relates a EGEOLOC to a state.
Thanks for this answer. I will ask internally if we have it.
Do you have any idea for the trouble above trouble ? I get it when I try to convert the mdcd dataset:
Builder: GDCPC900208
Time: Sep 12 2017 5:04PM
Error: Exception Found:
Type: System.FormatException
Message: Input string was not in a correct format.
Source: mscorlib
Stacktrace: at System.Number.StringToNumber(String str, NumberStyles options, NumberBuffer& number, NumberFormatInfo info, Boolean parseDecimal)
at System.Number.ParseInt64(String value, NumberStyles options, NumberFormatInfo numfmt)
at System.Convert.ToInt64(String value)
at org.ohdsi.cdm.framework.core.Controllers.ChunkController.<>c.<AddSubChunks>b__8_0(KeyValuePair`2 c) in C:\Users\E613015a\Documents\ETL-CDMBuilder-master\ETL-CDMBuilder-master\source\Framework\org.ohdsi.cdm.framework.core\Controllers\ChunkController.cs:line 51
at System.Linq.EnumerableSorter`2.ComputeKeys(TElement[] elements, Int32 count)
at System.Linq.EnumerableSorter`1.Sort(TElement[] elements, Int32 count)
at System.Linq.OrderedEnumerable`1.<GetEnumerator>d__1.MoveNext()
at System.Linq.Enumerable.WhereSelectEnumerableIterator`2.MoveNext()
at org.ohdsi.cdm.framework.core.Controllers.ChunkController.AddSubChunks(Int32 chunkId, IEnumerable`1 chunk) in C:\Users\E613015a\Documents\ETL-CDMBuilder-master\ETL-CDMBuilder-master\source\Framework\org.ohdsi.cdm.framework.core\Controllers\ChunkController.cs:line 51
at org.ohdsi.cdm.framework.core.Controllers.ChunkController.CreateChunks() in C:\Users\E613015a\Documents\ETL-CDMBuilder-master\ETL-CDMBuilder-master\source\Framework\org.ohdsi.cdm.framework.core\Controllers\ChunkController.cs:line 97
at org.ohdsi.cdm.framework.core.Controllers.BuilderController.PerformAction(Action act) in C:\Users\E613015a\Documents\ETL-CDMBuilder-master\ETL-CDMBuilder-master\source\Framework\org.ohdsi.cdm.framework.core\Controllers\BuilderController.cs:line 80
This issue related with ENROLID, check ENROLLMENT_DETAIL, ENROLID must be INT or BIGINT
Ok, I think I am getting closer. I checked the queries in the folder: {Builder folder}\TruvenV5\Definitions\ and they return some data. The tool seems also to work better as I don't get errors and the process time is longer than before.
However, I still don't have anything in the destination tables after the step 'Conversion to CDM'. For example, my destination table PERSON is empty but the query in ENROLLMENT_DETAIL_MDCD.xml returns some data.
Any idea ?
Did you try the query from ENROLLMENT_DETAIL_MDCD.xml with your real chunkId? I mean did you replaced {0} with existing chunkId? JOIN {sc}._chunks ch ON ch.ChunkId = {0} AND ENROLID = ch.PERSON_ID
Yes I did, I took a chunckId from the table _chunks. And it returns me what I believe should be inserted in the table PERSON. When I compare to what the Truven_MDCD_ETL_CDM_V5.0.1.docx describe, it seems good.
Okay then, person records can be filtered out by three reason:
I just checked and these conditions are not filled. I really don't understand why no data are inserted when the 'Conversion to CDM' step takes more than 30min to execute for 10 batch of 1000.
Can you share 10 latest records from Builder Log table, query below:
SELECT TOP 10 [Type] ,[Time] ,[Message] FROM [dbo].[Log] order by Id desc
Thanks a lot for your time and patience. Here the 10 latest records of the Log table:
The corresponding csv file for complete messages: log.zip
hmm.. I see only OUTPATIENT_SERVICES query in log, can you also check that ENROLLMENT_DETAIL queries were executed, query below.
SELECT count(*) FROM [dbo].[Log] where [Message] like '%ENROLLMENT_DETAIL%'
As you can see, queries were executed:
The last 10 ENROLLMENT_DETAIL queries: log_enrollment_detail.zip
I tested several queries from the log and they all return me data.
Can you try another test, use the same source database and new destination with following setting:
1)Number of batches = 10 2)MaxDegreeOfParallelism = 1 3)BatchSize = 10
Still no results, the logs are here if you want to take a look:
okay, really odd, the same version of cdm builder works well on my side.
lets try following: 1)Number of batches = 1 2)MaxDegreeOfParallelism = 1 3)BatchSize = 1
With these settings, the tool get stucks at 'Chunks creation', it seems to load the step indefinitely.
Are you sure I may not be caused by the data type I choose for my source tables ? I was think that a filter on wrong data format could return an empty result and thus not inserting the converted data in the database.
@bradanton would it make sense for us to release an updated builder on GitHub for @erwanlc to try. We haven't released an update in 4 month.
@erwanlc we will share new version of the cdm builder v 1.9.0.32 later today
Builder db was changed so you need to start from scratch, create new builder db using cdm builder and be careful with setting, new cdm builder contains default setting for Redshift environment. I recommend to use previous settings for tests:
1)Number of batches = 10 2)MaxDegreeOfParallelism = 1 3)BatchSize = 1000
Ok, thanks a lot for your help and time. I will keep your informed after trying the new version.
@erwanlc - @bradanton and I are posted a new copy of Builder to GitHub. That might be another thing that will help. We are asking Truven if we could share the DDL but have not gotten the OK yet.
Thanks a lot for your help, by realoading all my source data in the database and using your last build, it seems to work now ! Anyway, I think sharing the DDL could be a great help for people using this tool. I hope Truven will agree. Thanks again for your time.
@erwanlc, I'm trying to get the DDL to you however Truven is trying to figure our your license. Could you email me directly to try to get this resolved? evoss3 AT its.jnj.com?
I am running ETL-CDMBuilder on Windows Server 2016 to convert Truven MDCR data to CDM format. I managed to create the builder database and the first step 'CDM database creation' but I am stuck at the Chunks creation. I don't have any timelines indication and Errors is also empty so I can't figure out what's going on.
When I look in the table 'Building' from the builder database I only have this:
Does anyone have an idea to solve this issue ?