Open kfogel opened 11 years ago
Now we have the database scripts and ETL Scripts that will work with the open source postgreSQL database version 9.2x or later. please find the document Creating new Database and running ETL Job_PostgreSQL_Database.doc in documentation folder of checkbook_nyc branch.
This is exciting news -- I will take a look ASAP! Thank you.
Best, -Karl
Just curious -- why is this in checkbook_nyc
branch instead of either checkbook_edc
or master
? https://github.com/NYCComptroller/Checkbook/wiki/Branch-Management indicates that one of the latter two would be the place to put it.
It's not a big deal -- obviously, those of us who want to look at it can find it on whatever branch it's on. But it's a bit easier if new things happen on the designated branches, so they are noticeable to people watching those branches.
As of now there is no test data generated for EDC. I will generate the test data after the UAT is done. And then I will check-in the database scripts and ETL job for EDC database in checkbook_edc branch.
Ah, I understand now, thanks. Yes, it seems reasonable that the documentation lives where the test data lives. I'll take a look at it on checkbook_nyc
then. Best of luck with UAT.
The "TL" part of the ETL process for loading data into Checkbook currently requires certain features of the Greenplum database. While Greenplum is proprietary, there is a free-for-noncommercial-use "community edition" available as of this writing, so this situation is not a showstopper. Removing this proprietary dependency would be a useful enhancement, however, to enable Checkbook to run entirely on open source software.
At one point I asked some Checkbook developers about this:
Tirupati Reddy answered that the three features of Greenplum DB used in ETL Scripts that are not there in PostgreSQL DB are:
Tirupati said that changing the scripts to remove the need for (2) and (3) is probably not too hard, but (1) would take some time.