dataintoresults / data-brewery

Data Brewery is an ETL (Extract-Transform-Load) program that connect to many data sources (cloud services, databases, ...) and manage data warehouse workflow.
https://databrewery.co
Apache License 2.0
16 stars 0 forks source link

exception while fetching data from GA to postgres #2

Open viki-crypto opened 3 years ago

viki-crypto commented 3 years ago

I have configured to fetch data from google analytics. and seems correct as when I run ipa read ga.sessions, I get sample data fetched successfully on terminal.

however when I am running the process which is to save the data in a postgres db, I get following exception. not sure what to make of it.

Exception in thread "main" java.lang.RuntimeException: Column date unknown type : varchar
    at com.dataintoresults.etl.impl.ColumnBasic.basicType(ColumnBasic.scala:67)
    at com.dataintoresults.etl.datastore.sql.PostgreSqlStore.$anonfun$createDataSink$2(PostgreSqlStore.scala:146)
    at scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:238)
    at scala.collection.Iterator.foreach(Iterator.scala:941)
    at scala.collection.Iterator.foreach$(Iterator.scala:941)
    at scala.collection.AbstractIterator.foreach(Iterator.scala:1429)
    at scala.collection.IterableLike.foreach(IterableLike.scala:74)
    at scala.collection.IterableLike.foreach$(IterableLike.scala:73)
    at com.dataintoresults.etl.core.EtlChilds.foreach(EtlParameter.scala:228)
    at scala.collection.TraversableLike.map(TraversableLike.scala:238)
    at scala.collection.TraversableLike.map$(TraversableLike.scala:231)
    at com.dataintoresults.etl.core.EtlChilds.map(EtlParameter.scala:228)
    at com.dataintoresults.etl.datastore.sql.PostgreSqlStore.createDataSink(PostgreSqlStore.scala:146)
    at com.dataintoresults.etl.impl.source.SourceDataStore.processOnModule(EtlSource.scala:147)
    at com.dataintoresults.etl.impl.EtlImpl.$anonfun$runModule$3(EtlImpl.scala:377)
    at com.dataintoresults.etl.impl.EtlImpl.$anonfun$runModule$3$adapted(EtlImpl.scala:370)
    at scala.collection.immutable.List.foreach(List.scala:392)
    at com.dataintoresults.etl.impl.EtlImpl.runModule(EtlImpl.scala:370)
    at com.dataintoresults.ipa.Ipa$.$anonfun$runModule$2(Ipa.scala:300)
    at com.dataintoresults.ipa.Ipa$.$anonfun$runModule$2$adapted(Ipa.scala:297)
    at scala.collection.immutable.List.foreach(List.scala:392)
    at com.dataintoresults.ipa.Ipa$.$anonfun$runModule$1(Ipa.scala:297)
    at com.dataintoresults.ipa.Ipa$.$anonfun$runModule$1$adapted(Ipa.scala:287)
    at com.dataintoresults.ipa.Ipa$.$anonfun$process$2(Ipa.scala:159)
    at com.dataintoresults.ipa.Ipa$.$anonfun$process$2$adapted(Ipa.scala:156)
    at com.dataintoresults.util.Using$.using(Using.scala:24)
    at com.dataintoresults.ipa.Ipa$.process(Ipa.scala:156)
    at com.dataintoresults.ipa.Ipa$.runModule(Ipa.scala:287)
    at com.dataintoresults.ipa.Ipa$.main(Ipa.scala:419)
    at com.dataintoresults.ipa.Ipa.main(Ipa.scala)

the store entry for the datastore looks like below:

<datastore name="ga" type="googleAnalytics" viewId="XXXXX">
    <!-- It will import all data from 30 days ago to yesterday -->
    <table name="sessions" startDate="30daysAgo" endDate="yesterday">
        <!-- We start by defining the dimensions (type = text) -->
        <column name="date" type="varchar" gaName="ga:date" gaType="dimension" />
        <column name="deviceCategory" type="varchar" gaName="ga:deviceCategory" gaType="dimension" />
             <!-- Then measures (type = bigint) -->
        <column name="sessions" type="bigint" gaName="ga:sessions" gaType="measure" />
    </table>

any help would be appreciated.

SebVentures commented 3 years ago

Can your try with text instead of varchar?

viki-crypto commented 3 years ago

Thats interesting. varchar worked when I changed the warehouse db to mysql, but was not working with postgres. I shifted back to postgres and changing to text works now.

I am facing another issue now though. if I use mysql as data warehouse store, the tables are getting created and data is getting stored. but if I change to postgres as warehouse db, there are no tables created, the process just runs till my custom sql queries cause an exception Exception in thread "main" org.postgresql.util.PSQLException: ERROR: relation "ga_session" does not exist. also listing the tables/relations with psql say no relations found. In mysql I could see multiple tables including some staging suffixed tables were created. Any thoughts.

SebVentures commented 3 years ago

I have plenty of process using postgresql and it works fine.

Have you more information when your run with the very verbose flag? ipa run-module -V <module_name>

viki-crypto commented 3 years ago

here is the verbose output. still I do not see any tables created in postgres. I am simply running \c analytics and \dt to check the relations.

ipa run-module -V analytics
[info] Adding configuration file dw.conf
[info] Loading datawarehouse file dw.xml ...
[info] Loaded 10 datastores (PostgreSql,MySql,GoogleAnalytics) and 2 modules
[debug] Sending command=run-module&uuid=8ab24247-1ce7-4248-8c5d-350325ff069d&stage=START&version=1.0.0-M2-20200224&nbds=10&nbmod=2&dstypes=PostgreSql,MySql,GoogleAnalytics to https://dataintoresults.com/baton.php
[info] Processing module analytics ...
[info] analytics : init
[info] analytics : Ensure existence of schema analytics
[info] Running module analytics: Ensure existence of schema analytics
[info] SqlStore : Openning connection to : jdbc:postgresql://localhost:5432/analytics?sslmode=prefer&sslfactory=org.postgresql.ssl.NonValidatingFactory (datastore dwh)
[info] analytics : Sourcing for ga_sessions
[info] Running module analytics: Sourcing for ga_sessions
[info] Connect to Google Analytics with service account xx.iam.gserviceaccount.com and P12 file key/yy.p12
[info] Google Analytics report : dim = ga:date,ga:deviceCategory measures = ga:sessions,ga:pageviews
[info] Fetching 21 Google analytics rows (already read: 0 )
[info] Create table with definition :  create table "analytics"."ga_sessions_staging"("date" varchar, "deviceCategory" varchar, "sessions" bigint, "pageviews" bigint)
[info] PostgreSqlStore : Start batch copy to dwh.analytics.ga_sessions_staging)
[info] PostgreSql: End batch copy to dwh.analytics.ga_sessions_staging (21 rows)
[info] Create table with select : create table "analytics"."ga_sessions" as select * from "analytics"."ga_sessions_staging"
[info] Processing module analytics took 1 second and 256 milliseconds
[info] SqlStore: Closing connection to datastore dwh
[debug] Sending command=run-module&uuid=8ab24247-1ce7-4248-8c5d-350325ff069d&stage=END&version=1.0.0-M2-20200224&nbds=10&nbmod=2&dstypes=PostgreSql,MySql,GoogleAnalytics to https://dataintoresults.com/baton.php
SebVentures commented 3 years ago

It seems to work indeed. Can you run the following SQL query using the PostgreSQL account? create table "analytics"."ga_sessions_staging"("date" varchar, "deviceCategory" varchar, "sessions" bigint, "pageviews" bigint)

And is the analytics schema created?

I would bet that the account you use doesn't have enough rights on the database. And the software sadly doesn't propagate the failure.

viki-crypto commented 3 years ago

Oh, this query returns an error -ERROR: schema "analytics" does not exist. My bad, I had only created a database named analytics and not the schema. Thanks, now I could get it to work after creating the analytics schema.

SebVentures commented 3 years ago

The system try to create the schema, but probably the user doesn't have the right to.

Anyway, the process should fail if those SQL queries don't work as expected. I will leave the issue open to fix it on the next release.

Thanks!