OpenDataAlex / PDI-DC-Auto-Profiling

Tool built using Pentaho Data Integrator and the DataCleaner plugin to automate data profiling
4 stars 1 forks source link

Issue with profile_unknown_record #10

Closed marvinlb closed 11 years ago

marvinlb commented 11 years ago

INSERT INTO profile_source_table_number_analysis (profile_source_table_pk, table_column_name, w_create_dt) VALUES (0, 'unknown', NOW())

Error Code: 1054. Unknown column 'profile_source_table_pk' in 'field list'

OpenDataAlex commented 11 years ago

Hi Marvin,

I've modified the script - thanks for adding the ticket.

marvinlb commented 11 years ago

new error:

INSERT INTO profile_source_table_number_analysis (profile_source_table_column_pk, table_column_name, w_create_dt) VALUES (0, 'unknown', NOW())

Error Code: 1054. Unknown column 'table_column_name' in 'field list'

marvinlb commented 11 years ago

Alex- new error:

09:43:54 INSERT INTO profile_source_table_number_analysis (profile_source_table_column_pk, table_column_name, w_create_dt) VALUES (0, 'unknown', NOW()) Error Code: 1054. Unknown column 'table_column_name' in 'field list' 0.140 sec

(Don't know whether to open a new ticket, or if/how I can reopen the closed ticket. Instead, I added this in github as a comment on the closed ticket)

Thank you- Marvin

On Tue, Feb 12, 2013 at 10:39 PM, Alex Meadows notifications@github.comwrote:

Hi Marvin,

I've modified the script - thanks for adding the ticket.

— Reply to this email directly or view it on GitHubhttps://github.com/dbaAlex/PDI-DC-Auto-Profiling/issues/10#issuecomment-13473133.

OpenDataAlex commented 11 years ago

Hi Marvin,

Apologies - I should have caught that last night! This script was out
of sync with the latest version of the database. I went through and I
think I have resolved the issue. Thank you for reporting the bugs so
they can get resolved.

Talk to you soon,

Alex

Quoting marvinlb notifications@github.com:

Alex- new error:

09:43:54 INSERT INTO profile_source_table_number_analysis (profile_source_table_column_pk, table_column_name, w_create_dt) VALUES (0, 'unknown', NOW()) Error Code: 1054. Unknown column 'table_column_name' in 'field list' 0.140 sec

(Don't know whether to open a new ticket, or if/how I can reopen the closed ticket. Instead, I added this in github as a comment on the closed ticket)

Thank you- Marvin

On Tue, Feb 12, 2013 at 10:39 PM, Alex Meadows
notifications@github.comwrote:

Hi Marvin,

I've modified the script - thanks for adding the ticket.

— Reply to this email directly or view it on
GitHubhttps://github.com/dbaAlex/PDI-DC-Auto-Profiling/issues/10#issuecomment-13473133.


Reply to this email directly or view it on GitHub: https://github.com/dbaAlex/PDI-DC-Auto-Profiling/issues/10#issuecomment-13496791

marvinlb commented 11 years ago

Now:

14:33:58 INSERT INTO profile_source_table_number_analysis (profile_source_table_column_pk, w_create_dt) VALUES (0, NOW()) Error Code: 1364. Field 'geometric_mean' doesn't have a default value 0.281 sec

OpenDataAlex commented 11 years ago

Hi again,

The tables that end with _analysis will need to be rebuilt. I have updated the schema file with the correct DDL. Please let me know if that resolves the issue.

marvinlb commented 11 years ago

Am I doing something wrong?

On Wed, Feb 13, 2013 at 2:46 PM, Alex Meadows notifications@github.comwrote:

Hi again,

The tables that end with _analysis will need to be rebuilt. I have updated the schema file with the correct DDL. Please let me know if that resolves the issue.

— Reply to this email directly or view it on GitHubhttps://github.com/dbaAlex/PDI-DC-Auto-Profiling/issues/10#issuecomment-13514083.

OpenDataAlex commented 11 years ago

Let's start from scratch :) Run the first and second scripts again after clearing out the schema and let's see what happens.

marvinlb commented 11 years ago
  1. Dropped schema
  2. profile_schema
  3. Ran script
    • result: OK
    • profile_unknown_record
  4. Ran script
  5. result is: 15:38:38 INSERT INTO profile_source_table_number_analysis (profile_source_table_column_pk, w_create_dt) VALUES (0, NOW()) Error Code: 1364. Field 'geometric_mean' doesn't have a default value 0.000 sec
  6. Figured I'd start over again, in case something else had changed
  7. Dropped schema
  8. profile_schema
  9. Ran script
    • Script error
    • found and removed "L" near friday, towards end of script (in my copy, not on github)
    • ran revised script
    • result: OK
  10. profile_unknown_record
    • Ran script
  11. result:

Cannot add or update a child row: a foreign key constraint fails (profile_app.profile_source_table_number_analysis, CONSTRAINT pst_column_pst_number_analysis_fk FOREIGN KEY (profile_source_table_column_pk) REFERENCES profile_source_table_column (`p) 0.016 sec

Marvin

On Wed, Feb 13, 2013 at 3:32 PM, Alex Meadows notifications@github.comwrote:

Let's start from scratch :) Run the first and second scripts again after clearing out the schema and let's see what happens.

— Reply to this email directly or view it on GitHubhttps://github.com/dbaAlex/PDI-DC-Auto-Profiling/issues/10#issuecomment-13516521.

OpenDataAlex commented 11 years ago

That's good news - with the next script fix, the unknown record script should work...

marvinlb commented 11 years ago

Alex- I will try in the AM. If still an issue, will let you know.

If second script executes successfully, what is next? We'd only discussed my "to-do's" through the 2 scripts. thx- Marvin

On Wed, Feb 13, 2013 at 8:52 PM, Alex Meadows notifications@github.comwrote:

That's good news - with the next script fix, the unknown record script should work...

— Reply to this email directly or view it on GitHubhttps://github.com/dbaAlex/PDI-DC-Auto-Profiling/issues/10#issuecomment-13529326.

OpenDataAlex commented 11 years ago

Hi Marvin,

I modified the script. The next steps will be to install Pentaho Data Integrator and start configuring it.

marvinlb commented 11 years ago

Pdi 4.4 is already installed. MB

Marvin Bleiberg On Feb 13, 2013 9:13 PM, "Alex Meadows" notifications@github.com wrote:

Hi Marvin,

I modified the script. The next steps will be to install Pentaho Data Integrator and start configuring it.

— Reply to this email directly or view it on GitHubhttps://github.com/dbaAlex/PDI-DC-Auto-Profiling/issues/10#issuecomment-13529851.

OpenDataAlex commented 11 years ago

https://github.com/dbaAlex/PDI-DC-Auto-Profiling/wiki/Quick-Start-Guide ;) Working on this right now.

marvinlb commented 11 years ago

Alex- Script ran successfully Marvin

On Wed, Feb 13, 2013 at 9:13 PM, Alex Meadows notifications@github.comwrote:

Hi Marvin,

I modified the script. The next steps will be to install Pentaho Data Integrator and start configuring it.

— Reply to this email directly or view it on GitHubhttps://github.com/dbaAlex/PDI-DC-Auto-Profiling/issues/10#issuecomment-13529851.

marvinlb commented 11 years ago

OK! So are these my next steps?

  1. In addition to the scripts, xmlstarlet must be installed. This is to work around an issue with DataCleaner 2.5.1. The tool will be upgraded for DataCleaner for 3.x, which resolves several issues.
  2. The conf_template.xml needs to be stored in the DataCleaner install folder.
  3. PDI kettle.properties properties
    • There are some added kettle properties to add to your system:

MB

On Thu, Feb 14, 2013 at 9:18 AM, Alex Meadows notifications@github.comwrote:

https://github.com/dbaAlex/PDI-DC-Auto-Profiling/wiki/Quick-Start-Guide;) Working on this right now.

— Reply to this email directly or view it on GitHubhttps://github.com/dbaAlex/PDI-DC-Auto-Profiling/issues/10#issuecomment-13551061.

OpenDataAlex commented 11 years ago

Yep yep :)

marvinlb commented 11 years ago

Alex- in-line questions, in red:

  1. In addition to the scripts, xmlstarlet must be installed. This is to work around an issue with DataCleaner 2.5.1. The tool will be upgraded for DataCleaner for 3.x, which resolves several issues. I have DC 3.x - what do I do?
  2. The conf_template.xml needs to be stored in the DataCleaner install folder. OK, I can do that. Is there one (and only one) folder named "install" in DC? And I just drop the conf_template.xml file in that subdir?
  3. PDI kettle.properties properties

There are some added kettle properties to add to your system: I have PDI 4.4 installed. There are 5 rows in your table. Not sure what to do with them - where do these go, what do I do, etc.

thx- MB

On Thu, Feb 14, 2013 at 9:59 AM, Alex Meadows notifications@github.comwrote:

Yep yep :)

— Reply to this email directly or view it on GitHubhttps://github.com/dbaAlex/PDI-DC-Auto-Profiling/issues/10#issuecomment-13552993.

OpenDataAlex commented 11 years ago

Marvin,

  1. Are you running on Windows or Linux?
  2. The conf_template.xml belongs in the folder where Data Cleaner was placed. So for instance, if Data Cleaner lives in DataCleaner, you'll want to put the file there (DataCleaner/conf_template.xml).
  3. Those five lines need to be copied into your local kettle.properties file. They can go at the bottom of the list.
marvinlb commented 11 years ago

in-line- MB

On Thu, Feb 14, 2013 at 11:20 AM, Alex Meadows notifications@github.comwrote:

Marvin,

  1. Are you running on Windows or Linux? Windows 7
  2. The conf_template.xml belongs in the folder where Data Cleaner was placed. So for instance, if Data Cleaner lives in DataCleaner, you'll want to put the file there (DataCleaner/conf_template.xml). OK
  3. Those five lines need to be copied into your local kettle.properties file. They can go at the bottom of the list. This is the table (below). Am I adding the contents of the "Property" column, the "Value" column, or both together? Afraid I don't have the coding experience such that it's second nature.
  4. Property

    Description

    Value

    data_cleaner_filename_par

    Default filename used for DataCleaner files

    ${etl_file_extract_location_par}/quality/${profile_source_name_par}/${profile_source_table_name_par}/${profile_source_table_namepar}${profile_sampling_query_pk_par}.csv

    etl_file_extract_location_par

    Location where extracts are to be stored.

    data_profile_settings_location_par

    Place where the profile_customization folder is stored

    data_cleaner_location_par

    Place where DataCleaner is installed.

    etl_source_location_par

    Place where etl code is stored.

    1.

    — Reply to this email directly or view it on GitHubhttps://github.com/dbaAlex/PDI-DC-Auto-Profiling/issues/10#issuecomment-13561178.

OpenDataAlex commented 11 years ago

Hi Marvin,

Since you're using Windows, you will need to install cygwin (http://www.cygwin.com/) and then install xmlstarlet on top of that. The entire contents of the ap_kettle.properties file needs to be copied over.

marvinlb commented 11 years ago

Alex-

  1. There seems to be a Windows version of xmlstarlet; that is not acceptable (instead of cygwin + Linux xmlstarlet)?
  2. Do I need all of cygwin, or only certain portions?
  3. Sorry, don't understand "_The entire contents of the apkettle.properties file needs to be copied over."

thank you- Marvin

On Thu, Feb 14, 2013 at 2:18 PM, Alex Meadows notifications@github.comwrote:

Hi Marvin,

Since you're using Windows, you will need to install cygwin ( http://www.cygwin.com/) and then install xmlstarlet on top of that. The entire contents of the ap_kettle.properties file needs to be copied over.

— Reply to this email directly or view it on GitHubhttps://github.com/dbaAlex/PDI-DC-Auto-Profiling/issues/10#issuecomment-13573039.

OpenDataAlex commented 11 years ago

That should work fine, though I've only tested the Linux version. I'm working on removing that requirement for the next release since the underlying Data Cleaner bugs have been fixed in the 3.x line.

The defaults for cygwin should suffice :)

Just copy the five lines in ap_kettle.properties into your kettle.properties file. As has been requested by others, I'll be working on an installer application to make all of this much more streamlined and user friendly.

marvinlb commented 11 years ago

Alex- I now "have" cygwin and xmlstarlet (I see a .bat file in cygwin; when I run it, a windowopens up). I've downloaded and unzipped xmlstarlet. I'm pretty sure I'm supposed to "do something" with xmlstarlet, while being "in" the other. Beyond that I am presently clueless. Can you direct me? thx- MB

On Sat, Feb 16, 2013 at 11:15 PM, Alex Meadows notifications@github.comwrote:

That should work fine, though I've only tested the Linux version. I'm working on removing that requirement for the next release since the underlying Data Cleaner bugs have been fixed in the 3.x line.

The defaults for cygwin should suffice :)

Just copy the five lines in ap_kettle.properties into your kettle.properties file. As has been requested by others, I'll be working on an installer application to make all of this much more streamlined and user friendly.

— Reply to this email directly or view it on GitHubhttps://github.com/dbaAlex/PDI-DC-Auto-Profiling/issues/10#issuecomment-13680528.

OpenDataAlex commented 11 years ago

Having xmlstarlet installed should be enough. You don't have to do
anything directly with it as the tool knows how to handle it. Please
note I have not tested it with Windows, so there might be an issue.
Also, I'm working on removing the need for xmlstarlet in the next big
release.

Take care,

Alex

Quoting marvinlb notifications@github.com:

Alex- I now "have" cygwin and xmlstarlet (I see a .bat file in cygwin; when I run it, a windowopens up). I've downloaded and unzipped xmlstarlet. I'm pretty sure I'm supposed to "do something" with xmlstarlet, while being "in" the other. Beyond that I am presently clueless. Can you direct me? thx- MB

On Sat, Feb 16, 2013 at 11:15 PM, Alex Meadows
notifications@github.comwrote:

That should work fine, though I've only tested the Linux version. I'm working on removing that requirement for the next release since the underlying Data Cleaner bugs have been fixed in the 3.x line.

The defaults for cygwin should suffice :)

Just copy the five lines in ap_kettle.properties into your kettle.properties file. As has been requested by others, I'll be working on an installer application to make all of this much more streamlined and user friendly.

— Reply to this email directly or view it on
GitHubhttps://github.com/dbaAlex/PDI-DC-Auto-Profiling/issues/10#issuecomment-13680528.


Reply to this email directly or view it on GitHub: https://github.com/dbaAlex/PDI-DC-Auto-Profiling/issues/10#issuecomment-13790270

marvinlb commented 11 years ago

How do I install xmlstarlet? MB

Marvin Bleiberg On Feb 20, 2013 8:11 PM, "Alex Meadows" notifications@github.com wrote:

Having xmlstarlet installed should be enough. You don't have to do anything directly with it as the tool knows how to handle it. Please note I have not tested it with Windows, so there might be an issue. Also, I'm working on removing the need for xmlstarlet in the next big release.

Take care,

Alex

Quoting marvinlb notifications@github.com:

Alex- I now "have" cygwin and xmlstarlet (I see a .bat file in cygwin; when I run it, a windowopens up). I've downloaded and unzipped xmlstarlet. I'm pretty sure I'm supposed to "do something" with xmlstarlet, while being "in" the other. Beyond that I am presently clueless. Can you direct me? thx- MB

On Sat, Feb 16, 2013 at 11:15 PM, Alex Meadows notifications@github.comwrote:

That should work fine, though I've only tested the Linux version. I'm working on removing that requirement for the next release since the underlying Data Cleaner bugs have been fixed in the 3.x line.

The defaults for cygwin should suffice :)

Just copy the five lines in ap_kettle.properties into your kettle.properties file. As has been requested by others, I'll be working on an installer application to make all of this much more streamlined and user friendly.

— Reply to this email directly or view it on GitHub< https://github.com/dbaAlex/PDI-DC-Auto-Profiling/issues/10#issuecomment-13680528 .


Reply to this email directly or view it on GitHub:

https://github.com/dbaAlex/PDI-DC-Auto-Profiling/issues/10#issuecomment-13790270

— Reply to this email directly or view it on GitHubhttps://github.com/dbaAlex/PDI-DC-Auto-Profiling/issues/10#issuecomment-13867183.

marvinlb commented 11 years ago

In addition to the scripts, xmlstarlet must be installed. This is to work around an issue with DataCleaner 2.5.1. The tool will be upgraded for DataCleaner for 3.x, which resolves several issues.

Should I just wait for the upgraded version, since I do have DC 3.1 installed? thank you- Marvin

On Thu, Feb 14, 2013 at 9:44 AM, Marvin Bleiberg mbleiberg@emelby.comwrote:

OK! So are these my next steps?

  1. In addition to the scripts, xmlstarlet must be installed. This is to work around an issue with DataCleaner 2.5.1. The tool will be upgraded for DataCleaner for 3.x, which resolves several issues.
  2. The conf_template.xml needs to be stored in the DataCleaner install folder.
  3. PDI kettle.properties properties
    • There are some added kettle properties to add to your system:

MB

On Thu, Feb 14, 2013 at 9:18 AM, Alex Meadows notifications@github.comwrote:

https://github.com/dbaAlex/PDI-DC-Auto-Profiling/wiki/Quick-Start-Guide;) Working on this right now.

— Reply to this email directly or view it on GitHubhttps://github.com/dbaAlex/PDI-DC-Auto-Profiling/issues/10#issuecomment-13551061.

marvinlb commented 11 years ago

Hey, Alex- Touching base - any progress on the upgrade for DC 3.x? thank you- Marvin

On Thu, Feb 21, 2013 at 4:01 PM, Marvin Bleiberg mbleiberg@emelby.comwrote:

In addition to the scripts, xmlstarlet must be installed. This is to work around an issue with DataCleaner 2.5.1. The tool will be upgraded for DataCleaner for 3.x, which resolves several issues.

Should I just wait for the upgraded version, since I do have DC 3.1 installed? thank you- Marvin

On Thu, Feb 14, 2013 at 9:44 AM, Marvin Bleiberg mbleiberg@emelby.comwrote:

OK! So are these my next steps?

  1. In addition to the scripts, xmlstarlet must be installed. This is to work around an issue with DataCleaner 2.5.1. The tool will be upgraded for DataCleaner for 3.x, which resolves several issues.
  2. The conf_template.xml needs to be stored in the DataCleaner install folder.
  3. PDI kettle.properties properties
    • There are some added kettle properties to add to your system:

MB

On Thu, Feb 14, 2013 at 9:18 AM, Alex Meadows notifications@github.comwrote:

https://github.com/dbaAlex/PDI-DC-Auto-Profiling/wiki/Quick-Start-Guide;) Working on this right now.

— Reply to this email directly or view it on GitHubhttps://github.com/dbaAlex/PDI-DC-Auto-Profiling/issues/10#issuecomment-13551061.

OpenDataAlex commented 11 years ago

Hi Marvin,

I've written most of the changes needed, but the Datacleaner plugin has not been updated with the changes Kasper made. I'll be checking in with Pentaho to see where they are at in the process (I saw they updated my ticket a couple of days ago).

marvinlb commented 11 years ago

Alex- thank you for the update! Marvin

On Mon, Mar 25, 2013 at 4:22 PM, Alex Meadows notifications@github.comwrote:

Hi Marvin,

I've written most of the changes needed, but the Datacleaner plugin has not been updated with the changes Kasper made. I'll be checking in with Pentaho to see where they are at in the process (I saw they updated my ticket a couple of days ago).

— Reply to this email directly or view it on GitHubhttps://github.com/dbaAlex/PDI-DC-Auto-Profiling/issues/10#issuecomment-15422599 .

marvinlb commented 11 years ago

Hi, Alex - touching base - any word from Pentaho on theplugin? thank you- Marvin

On Mon, Mar 25, 2013 at 5:23 PM, Marvin Bleiberg mbleiberg@emelby.comwrote:

Alex- thank you for the update! Marvin

On Mon, Mar 25, 2013 at 4:22 PM, Alex Meadows notifications@github.comwrote:

Hi Marvin,

I've written most of the changes needed, but the Datacleaner plugin has not been updated with the changes Kasper made. I'll be checking in with Pentaho to see where they are at in the process (I saw they updated my ticket a couple of days ago).

— Reply to this email directly or view it on GitHubhttps://github.com/dbaAlex/PDI-DC-Auto-Profiling/issues/10#issuecomment-15422599 .