NorthConcepts / DataPipeline-Examples

DataPipeline Examples

Exporting CSV file as a zip #15

Open ishansrivastava opened 4 years ago

ishansrivastava commented 4 years ago

Hello, I have to download data from the DB to CSV, and I have to download it as a .zip. How can we achieve that? Right now I am just creating a file and transferring data to that CSV file using DataPipeline, but as I said, I want the zip file to be downloaded in the background.

DeleTaylor commented 4 years ago

Hi, I'll leave the zip part for you to search on Stack Overflow. An example of the CSV export part can be found in this demo: https://github.com/NorthConcepts/DataPipeline-Demos/tree/master/Export-CSV-and-Excel-from-Java-web-apps
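
That said, here's a rough sketch of the idea, not tested code: stream the CSV into a zip entry. The query, `connection`, and `response` (a servlet response) are placeholders.

```java
import java.io.OutputStreamWriter;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

import com.northconcepts.datapipeline.core.DataReader;
import com.northconcepts.datapipeline.csv.CsvWriter;
import com.northconcepts.datapipeline.jdbc.JdbcReader;
import com.northconcepts.datapipeline.job.Job;

// Write the CSV straight into a zip entry instead of a plain file.
ZipOutputStream zip = new ZipOutputStream(response.getOutputStream()); // placeholder target stream
zip.putNextEntry(new ZipEntry("export.csv"));

DataReader reader = new JdbcReader(connection, "SELECT * FROM my_table"); // placeholder query
CsvWriter writer = new CsvWriter(new OutputStreamWriter(zip, "UTF-8"));
writer.setFieldNamesInFirstRow(true);

// Job.run closes the writer when it finishes; closing the CsvWriter's stream
// also closes the ZipOutputStream, which finalizes the zip file.
Job.run(reader, writer);
```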

ishansrivastava commented 4 years ago

Hi Dele, I had one doubt: does JdbcWriter auto-commit to the DB?

DeleTaylor commented 4 years ago

Normally, it's whatever you've configured on the connection passed in to it. However, setCommitBatch() controls this for batch inserts.

See https://northconcepts.com/javadocs/com/northconcepts/datapipeline/jdbc/JdbcWriter.html#setCommitBatch-boolean-

ishansrivastava commented 4 years ago

Well, I did that:

```java
JdbcWriter writer = new JdbcWriter(connection, table_name)
        .setAutoCloseConnection(true)
        .setCommitBatch(true);
writer.setBatchSize(1000);
```

By the way, how can we use setBatchSize() when writing from the DB to a .csv file?

DeleTaylor commented 4 years ago

setBatchSize() doesn't apply when writing a CSV file or reading from a database. What would you expect it to do in your case?

ishansrivastava commented 4 years ago

So, actually I am writing data from the DB to CSV. Last time (CSV to DB) you suggested that incorporating setBatchSize() makes the loading faster. In the case of DB to CSV, what should I use instead?

DeleTaylor commented 4 years ago
  1. How much data are you loading from the DB to CSV?
  2. How much time is it taking?
  3. How much time do you need it to take instead?

ishansrivastava commented 4 years ago

To download 129 MB of data, it's taking around 1.5 minutes. The requirement is 15-20 seconds max.

DeleTaylor commented 4 years ago

The download time depends on many things: network speed, database indexes, pipeline logic, etc. Understanding your situation would require opening a commercial support ticket to review your code and environment.

You can try the AsyncReader to read and write in separate threads, but that requires a Small Business license.

https://northconcepts.com/docs/examples/use-multi-threading-in-a-single-job/
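
For reference, wiring in AsyncReader is roughly a one-line change. This is an untested sketch; the query and file name are placeholders.

```java
import java.io.File;

import com.northconcepts.datapipeline.core.DataReader;
import com.northconcepts.datapipeline.csv.CsvWriter;
import com.northconcepts.datapipeline.jdbc.JdbcReader;
import com.northconcepts.datapipeline.job.Job;
import com.northconcepts.datapipeline.multithreading.AsyncReader;

DataReader reader = new JdbcReader(connection, "SELECT * FROM my_table"); // placeholder query
reader = new AsyncReader(reader); // reads from the DB on its own thread while this thread writes

Job.run(reader, new CsvWriter(new File("export.csv"))); // placeholder file
```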

ishansrivastava commented 4 years ago

Hi, so I observed something: there are some columns which can be null and can also have values, so those columns have to be part of the record. But it seems we can't send null values, as it is throwing a NullPointerException. Is there a way that I can send null values as well (from CSV to DB)?

ishansrivastava commented 4 years ago

Any advice, sir? Mainly the question is: can we send null values through this API?

DeleTaylor commented 4 years ago

Yes, we can send null values to the database without doing anything special. This is a core feature of the framework.
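
For example, something along these lines should produce a SQL NULL in the target column. This is an untested sketch: the table and field names are made up, and it assumes a null passed to setField comes through as a null field value.

```java
import com.northconcepts.datapipeline.core.Record;
import com.northconcepts.datapipeline.core.RecordList;
import com.northconcepts.datapipeline.jdbc.JdbcWriter;
import com.northconcepts.datapipeline.job.Job;
import com.northconcepts.datapipeline.memory.MemoryReader;

Record record = new Record();
record.setField("hospital_name", "General Hospital"); // placeholder field
record.setField("address2", (String) null);           // assumption: a null value is written as SQL NULL

Job.run(new MemoryReader(new RecordList(record)),
        new JdbcWriter(connection, "hospital"));      // placeholder table
```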

ishansrivastava commented 4 years ago

Hello sir, I am uploading data from CSV to the DB and doing 4 JDBC lookups on each record for validation purposes, and it's taking 25-30 minutes to upload 10,000 records. I am committing in batches (1000). I just want to know: in this scenario, is this the best speed I can attain, or can something else be done to make it really fast?

ishansrivastava commented 4 years ago

Hello Dele, I want to know how the JdbcLookup works in this API. Will the lookup time increase as the table being looked up grows? I'd like a clear picture of it.

DeleTaylor commented 4 years ago

You can speed up database lookup a few ways.

  1. Ensure your database lookup query is using an index.

  2. Try caching individual lookups. https://northconcepts.com/docs/examples/cache-lookup-values/

  3. If your lookup table is small, you can try pre-caching it by changing the lookup to something like:

```java
Lookup lookup = new DataReaderLookup(
        new JdbcReader(connection, "SELECT * FROM hospital"),
        new FieldList("hospital_name", "address1", "city", "state", "zip_code"),
        new FieldList("hospital_id"));
```

BTW, 25 minutes for 10k records seems very slow, even with uncached lookups. Try running a profiler to see where you're losing time.
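
The pre-cached lookup above would then be attached to the pipeline roughly like this. An untested sketch: the CSV source is a placeholder, and the exact LookupTransformer signature is an assumption based on the library's lookup examples.

```java
import java.io.File;

import com.northconcepts.datapipeline.core.DataReader;
import com.northconcepts.datapipeline.core.FieldList;
import com.northconcepts.datapipeline.csv.CsvReader;
import com.northconcepts.datapipeline.transform.TransformingReader;
import com.northconcepts.datapipeline.transform.lookup.LookupTransformer;

DataReader reader = new CsvReader(new File("upload.csv")) // placeholder source
        .setFieldNamesInFirstRow(true);
reader = new TransformingReader(reader)
        .add(new LookupTransformer(new FieldList("hospital_id"), lookup)); // joins the cached values onto each record
```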

ishansrivastava commented 4 years ago

Hello Dele, I have one memory question. Let's say we have a batch size of 1000, so as soon as the reader gets 1000 records it will write them to the DB. Will those 1000 records be flushed out of memory after being written, or will they stay in memory until the whole code finishes executing?

ishansrivastava commented 4 years ago

Any insight on the above question, Dele?

DeleTaylor commented 4 years ago

It depends on the JDBC driver.

DataPipeline has various strategies for inserting and upserting data to a database. The default strategies pass data directly to the JDBC driver without keeping hold of anything, even in batch mode.
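
To illustrate with plain JDBC (not DataPipeline's actual code, just the shape of what a batch-insert strategy does): each record is handed to the driver and then released, so memory holds roughly one batch at a time. Table and column names are placeholders.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;

void insertBatched(Connection connection, Iterable<Object[]> rows) throws SQLException {
    PreparedStatement stmt = connection.prepareStatement(
            "INSERT INTO my_table (a, b) VALUES (?, ?)"); // placeholder table/columns
    int count = 0;
    for (Object[] row : rows) {
        stmt.setObject(1, row[0]);
        stmt.setObject(2, row[1]);
        stmt.addBatch(); // the driver buffers the row; the caller keeps no reference to it
        if (++count % 1000 == 0) {
            stmt.executeBatch();  // flush the buffered rows to the database
            connection.commit();  // what setCommitBatch(true) adds at each flush
        }
    }
    stmt.executeBatch(); // flush any remaining rows
    connection.commit();
    stmt.close();
}
```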

ishansrivastava commented 4 years ago

Hi Dele, I am facing one issue: the query I have written in my JdbcReader is correct, but it's not reading anything from the DB into the reader, so the reader is empty (and the data is present for that query). Can you please tell me when this issue comes up?

DeleTaylor commented 4 years ago

This would happen if you're using an expired Small Business trial license.

Other queries and other readers would also produce no output. You would also see a license message in your console/log file.

ishansrivastava commented 4 years ago

Hello Dele, you were right, it was a license issue: the license file was missing from the jar file. I am now facing another issue I'm not able to figure out: when the download is done, the session is getting logged out. Do you have any idea what the solution could be?

DeleTaylor commented 4 years ago

Sorry, I don't. Maybe something is happening to your session cookie. Your browser's developer tools might help to track this down.

ishansrivastava commented 4 years ago

Hi Dele, I just want to ask: does JdbcUpsertWriter work for MSSQL? Because in MSSQL it's throwing a primary key violation.

DeleTaylor commented 4 years ago

Yes, MSSQL (Microsoft SQL Server) does support upsert, but only using the GenericUpsert strategy. See https://github.com/NorthConcepts/DataPipeline-Examples/blob/master/src/main/java/com/northconcepts/datapipeline/examples/cookbook/WriteToDatabaseUsingGenericUpsert.java

Which strategy are you using?

ishansrivastava commented 4 years ago

JDBCUPSERT(table_name,primary_key)

DeleTaylor commented 4 years ago

I would need to see the full exception (entire stacktrace and key-value properties at the top). Also, which versions of MSSQL and DataPipeline are you using?

ishansrivastava commented 4 years ago

Versions: 1-Microsoft SQL Server 2016 (SP2-GDR) (KB4293802) - 13.0.5081.1 (X64) Jul 20 2018 22:12:40 Copyright (c) Microsoft Corporation Developer Edition (64-bit) on Windows Server 2016 Standard 10.0 (Build 14393: ) (Hypervisor)

ishansrivastava commented 4 years ago

[screenshot: Error data_pipeline1]

ishansrivastava commented 4 years ago

[screenshot: Error_data_pipeline2]

DeleTaylor commented 4 years ago

The stacktrace seems to be cut off. Can you post it here or email it?

DeleTaylor commented 4 years ago

Can you try using new GenericUpsert().setInsertFirst(false) as in the following example to force updates to happen first?

https://github.com/NorthConcepts/DataPipeline-Examples/blob/master/src/main/java/com/northconcepts/datapipeline/examples/cookbook/WriteToDatabaseUsingGenericUpsert.java
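
Something along these lines (untested sketch; I'm assuming the constructor overload used in the linked example, and the table and key names are placeholders):

```java
import com.northconcepts.datapipeline.jdbc.JdbcUpsertWriter;
import com.northconcepts.datapipeline.jdbc.upsert.GenericUpsert;

JdbcUpsertWriter writer = new JdbcUpsertWriter(
        connection,
        "my_table",                                // placeholder table name
        new GenericUpsert().setInsertFirst(false), // try the UPDATE first; INSERT only when no row matches
        "primary_key");                            // placeholder key column
```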

ishansrivastava commented 4 years ago

OK, will try that.