embulk / embulk-input-jdbc

MySQL, PostgreSQL, Redshift and generic JDBC input plugins for Embulk

is there any solution to get last record from stdout? #151

Open makkaba opened 5 years ago

makkaba commented 5 years ago

Hi, I am Jeff.

I am currently using Embulk in a somewhat unusual way:

  1. A batch Python script generates some dynamic YAML files every few minutes.
  2. It reads the diff.yaml file generated by the previous batch job.
  3. It applies the last record to the dynamic YAML files from step 1.
  4. It executes Embulk several times with those dynamic YAML files through a Python-to-shell interface.
  5. A diff.yaml file is written out, containing the last record.

Is there a better way to get the last record? Or is there a way to call Embulk from code instead of through the shell?

Thanks
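A minimal sketch of steps 2 to 5 driven from Python, assuming the embulk command is on PATH. It relies on Embulk's -c (config diff) option, the same one hiroyuki-sato uses below with "embulk run test.yml -c diff.yml". As far as I know, Embulk merges an existing diff file into the config before the run and rewrites it with the new last_record afterwards, so reading and applying diff.yaml by hand may not be necessary. The one-diff-per-config naming scheme (diff_path_for) and the helper names are hypothetical.

```python
import subprocess
from pathlib import Path
from typing import List, Sequence


def build_embulk_command(config_path: str, diff_path: str) -> List[str]:
    """Argv for `embulk run CONFIG -c DIFF` (incremental run with a config diff)."""
    return ["embulk", "run", config_path, "-c", diff_path]


def diff_path_for(config_path: str, diff_dir: str = "diffs") -> str:
    """Hypothetical naming scheme: one diff file per config, e.g. diffs/job1.diff.yml."""
    return str(Path(diff_dir) / (Path(config_path).stem + ".diff.yml"))


def run_jobs(config_paths: Sequence[str], diff_dir: str = "diffs") -> None:
    """Run Embulk once per generated config, keeping a separate diff per job."""
    Path(diff_dir).mkdir(parents=True, exist_ok=True)
    for config_path in config_paths:
        cmd = build_embulk_command(config_path, diff_path_for(config_path, diff_dir))
        # check=True raises CalledProcessError if Embulk exits non-zero,
        # so a failed job stops the loop instead of being silently skipped.
        subprocess.run(cmd, check=True)
```

Usage would look like run_jobs(["job1.yml", "job2.yml"]), where the file names are placeholders. Keeping one diff file per config matters because each job tracks its own last record.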

hiroyuki-sato commented 5 years ago

Hello, @makkaba

I don't think I fully understand your use case yet. If you execute Embulk from Python, how about using the Liquid template engine? You would just set an environment variable from Python.
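A minimal sketch of the Liquid approach. Embulk renders a config file whose name ends in .yml.liquid as a Liquid template, and {{ env.NAME }} expands to an environment variable, so Python only has to set that variable before calling embulk. The variable name LAST_ID, the file name incremental.yml.liquid and the helper names are assumptions for this example; user and password are omitted from the template for brevity.

```python
import os
import subprocess
from typing import Dict

# Hypothetical template, saved as "incremental.yml.liquid".
# {{ env.LAST_ID }} is filled in by Embulk's Liquid rendering, not by Python.
TEMPLATE = """\
in:
  type: postgresql
  host: localhost
  database: embulk_test
  table: incremental_test
  incremental: true
  incremental_columns: [id]
  last_record: [{{ env.LAST_ID }}]
out:
  type: stdout
"""


def make_env(last_id: int) -> Dict[str, str]:
    """Copy the current environment and add LAST_ID (a name chosen for this sketch)."""
    env = dict(os.environ)
    env["LAST_ID"] = str(last_id)
    return env


def run_liquid(template_path: str, last_id: int) -> None:
    """Run `embulk run TEMPLATE` with LAST_ID visible to the Liquid template."""
    subprocess.run(["embulk", "run", template_path], env=make_env(last_id), check=True)
```

For example, after writing TEMPLATE to incremental.yml.liquid, run_liquid("incremental.yml.liquid", 3) would run the job starting after id 3. Passing a copied env dict leaves the Python process's own environment untouched.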

Another idea.

embulk_test=# select * from incremental_test;
 id | name
----+------
  1 | var1
  2 | var2
  3 | var3
(3 rows)
in:
  type: postgresql
  host: localhost
  port: 5432
  user: user
  password: ****
  database: embulk_test
  table: incremental_test
  incremental: true
  incremental_columns:
  - id
out:
  type: stdout
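As a rough illustration of what incremental: true with incremental_columns: [id] does, here is a simulation that uses Python's built-in sqlite3 instead of PostgreSQL: each run selects only rows whose id is greater than the previous last record, in id order, and the largest id read becomes the new last record. This approximates the behaviour described in this thread; it is not the plugin's actual code, and incremental_fetch is a hypothetical name.

```python
import sqlite3
from typing import List, Optional, Tuple


def incremental_fetch(
    conn: sqlite3.Connection, last_id: Optional[int]
) -> Tuple[List[tuple], Optional[int]]:
    """Return (new rows, new last_record), mimicking incremental loading on `id`."""
    if last_id is None:
        # First run: no last_record yet, so read everything in id order.
        rows = conn.execute(
            "SELECT id, name FROM incremental_test ORDER BY id"
        ).fetchall()
    else:
        # Later runs: only rows after the previous last_record.
        rows = conn.execute(
            "SELECT id, name FROM incremental_test WHERE id > ? ORDER BY id",
            (last_id,),
        ).fetchall()
    # The largest id read becomes the next last_record; keep the old one if nothing is new.
    new_last = rows[-1][0] if rows else last_id
    return rows, new_last


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE incremental_test (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO incremental_test VALUES (?, ?)",
    [(1, "var1"), (2, "var2"), (3, "var3")],
)

rows1, last1 = incremental_fetch(conn, None)   # all three rows, last1 == 3
conn.execute("INSERT INTO incremental_test VALUES (4, 'var4')")
rows2, last2 = incremental_fetch(conn, last1)  # only the new row, last2 == 4
rows3, last3 = incremental_fetch(conn, last2)  # nothing new, last3 stays 4
```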

Running embulk run test.yml -c diff.yml prints the following to stdout and creates a diff.yml file.

1,var1
2,var2
3,var3

diff.yml

in:
  last_record: [3]
out: {}

The number 3 is the last record. Since diff.yml is just a YAML file, I think you can simply create it with Python.
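A minimal sketch of creating such a diff file from Python without a YAML library, assuming the diff only needs the shape shown above (last_record under in:, and an empty out:). It uses json.dumps for the list because JSON flow syntax such as [3] or ["2020-01-01"] is also valid YAML; render_diff and write_diff are hypothetical names.

```python
import json
from pathlib import Path
from typing import Sequence, Union

Scalar = Union[int, float, str]


def render_diff(last_record: Sequence[Scalar]) -> str:
    """Build diff.yml text in the same shape as the example above."""
    # json.dumps gives a flow sequence like [3]; strings come out double-quoted.
    return "in:\n  last_record: " + json.dumps(list(last_record)) + "\nout: {}\n"


def write_diff(path: str, last_record: Sequence[Scalar]) -> None:
    """Write the diff file so the next `embulk run ... -c path` can pick it up."""
    Path(path).write_text(render_diff(last_record), encoding="utf-8")
```

For example, write_diff("diff.yml", [3]) reproduces the diff.yml shown above.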

makkaba commented 5 years ago

Thank you for your reply. Let me clarify my purpose:

last record = 300000 for 4 input => db output => db [1,2,3,4]

I have already been using the diff.yaml file and dynamically generated YAML files, as you said (the Python template way).

Every time, a YAML file like this is generated:

in:
  type: sqlserver
  driver_path: ~
  host: ~
  user: ~
  password: ~
  query: "SELECT ~ FROM WHERE ~ AND [idx] > :idx"
  use_raw_query_with_incremental: true
  incremental_columns:
  - idx
  incremental: true

  last_record:
  - 111111111
out:
  type: mysql
  host: ~
  user: ~
  password: ~
  database: ~
  table: ~
  mode: merge
  options: {useUnicode: true, characterEncoding: UTF-8}
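A minimal sketch of generating a config like this from Python with the standard library's string.Template, assuming only the query and the last idx change between runs. string.Template only replaces $names, so the :idx placeholder survives untouched for the input plugin to bind, and json.dumps quotes the query as a valid double-quoted YAML string. The "~" connection settings from the original are left out, and render_config is a hypothetical name.

```python
import json
from string import Template

# Hypothetical template modelled on the config above; add your own
# driver_path, host, user, password, database and table settings.
CONFIG = Template("""\
in:
  type: sqlserver
  query: $query
  use_raw_query_with_incremental: true
  incremental: true
  incremental_columns:
  - idx
  last_record:
  - $last_idx
out:
  type: mysql
  mode: merge
""")


def render_config(query: str, last_idx: int) -> str:
    """Fill in the query and last idx; ":idx" in the query is left for the plugin."""
    return CONFIG.substitute(query=json.dumps(query), last_idx=int(last_idx))
```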

But now, for some reason, I want to get the returned value through stdout or as a programmatic return value (the output must still be MySQL).

When the output is MySQL, stdout looks something like this:

** INFORMATION ** Join us! Embulk-announce mailing list is up for IMPORTANT announcement such as compatibility-breaking changes and key feature updates. https://groups.google.com/forum/#!forum/embulk-announce


java.lang.RuntimeException: java.nio.file.NoSuchFileException: sample.yaml
    at org.embulk.EmbulkRunner.run(EmbulkRunner.java:152)
    at org.embulk.cli.EmbulkRun.runSubcommand(EmbulkRun.java:437)
    ...
    ... 3 more

I just wanted to share my use case. Thanks!
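If the goal is to have the last record as a value in Python while the output stays MySQL, one option is to read it back from the config diff after each run instead of from stdout. A minimal sketch, assuming the diff has the same shape as hiroyuki-sato's example (last_record: [3] indented under in:) and holds JSON-compatible values such as integers; a real YAML parser such as PyYAML would be more robust. The helper names are hypothetical.

```python
import json
import re
import subprocess
from typing import Optional

# Matches an indented line such as "  last_record: [3]" and captures the list.
_LAST_RECORD = re.compile(r"^[ \t]+last_record:[ \t]*(\[.*\])[ \t]*$", re.MULTILINE)


def parse_last_record(diff_text: str) -> Optional[list]:
    """Return the last_record list from diff text, or None if it is absent."""
    match = _LAST_RECORD.search(diff_text)
    if match is None:
        return None
    # Works for flow lists that are also valid JSON, e.g. [3] or [111111111].
    return json.loads(match.group(1))


def run_and_get_last_record(config_path: str, diff_path: str) -> Optional[list]:
    """Run Embulk with a config diff, then read the new last_record back."""
    subprocess.run(["embulk", "run", config_path, "-c", diff_path], check=True)
    with open(diff_path, encoding="utf-8") as f:
        return parse_last_record(f.read())
```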

hiroyuki-sato commented 5 years ago

Hello, @makkaba

Does this mean that you want to get a value back from the output side (out: type: mysql)? I think that is outside the scope of Embulk. It would be better to use a workflow engine such as digdag.

Best regards

sakama commented 5 years ago

> i want to get returned value by stdout or programatic return for some reason.

Embulk provides EmbulkEmbed, which allows you to execute Embulk from a Java program (https://github.com/embulk/embulk/blob/master/embulk-core/src/main/java/org/embulk/EmbulkEmbed.java). We (Arm Treasure Data) use this mechanism in our platform to execute Embulk from other code.

Unfortunately, this mechanism is meant to be used from Java, not Python.

makkaba commented 5 years ago

EmbulkEmbed would be helpful. Thank you all!