NYCComptroller / Checkbook

Source codes, data, and instructions for Checkbook
https://checkbooknyc.com/

Steps to ETL #6

Open chbrown opened 11 years ago

chbrown commented 11 years ago

I haven't been able to get the full package installed & deployed locally, but I tripped up on some of the data files in source/database/ETL/CREATE_NEW_DATABASE.

  1. What is the role of these files?
  2. How does the fresh data (nightly dump) get into the application?
    • How close can I get to having checkbooknyc.com running on my local machine, data-wise?
  3. ScriptsForReferenceTables.sql (for example) has absolute paths.

kfogel commented 11 years ago

Ram, assigning over to you because you know more about this than I do, but please feel free to bounce the ticket back to me if you've got other stuff on your plate.

treddy commented 11 years ago

1. What is the role of these files?

These files create a new Greenplum database containing the reference data the application needs.

2. How does the fresh data (nightly dump) get into the application? How close can I get to having checkbooknyc.com running on my local machine, data-wise?

Try the test data in the source/database/ETL/SOURCE_DATA folder. The document /documentation/Creating new Database and running ETL Job.docx explains how to process the data daily using the ETL (Kettle) job.

3. ScriptsForReferenceTables.sql (for example) has absolute paths.

Yes, it has some absolute paths. The document /documentation/Creating new Database and running ETL Job.docx describes when and how to modify them.
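
For anyone auditing those paths before running the scripts locally, here is a minimal Python sketch that lists quoted absolute paths in a SQL file and rewrites a common prefix. It is not taken from the repository: it assumes the paths appear as single-quoted strings starting with `/`, and the helper names `find_absolute_paths` and `rebase_paths` are hypothetical. The document above remains the authority on which paths need to change.

```python
import re

# Assumption: absolute paths appear as single-quoted strings starting with "/".
# The real scripts may use a different quoting style; adjust the pattern if so.
ABS_PATH = re.compile(r"'(/[^']*)'")


def find_absolute_paths(sql_text):
    """Return (line_number, path) pairs for each quoted absolute path."""
    hits = []
    for lineno, line in enumerate(sql_text.splitlines(), start=1):
        for match in ABS_PATH.finditer(line):
            hits.append((lineno, match.group(1)))
    return hits


def rebase_paths(sql_text, old_prefix, new_prefix):
    """Swap old_prefix for new_prefix at the start of each quoted absolute path."""
    def swap(match):
        path = match.group(1)
        if path.startswith(old_prefix):
            path = new_prefix + path[len(old_prefix):]
        return "'" + path + "'"

    return ABS_PATH.sub(swap, sql_text)
```

To use it, read the SQL file into a string, call `find_absolute_paths` to review what would change, then write the result of `rebase_paths` to a copy of the file rather than overwriting the original.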