Closed OmkarPathak closed 4 years ago
Hello, @OmkarPathak
I'm not sure temporary CSV
means. It seems that input data is an Oracle database.
Anyway, Have you ever tried the embulk-filter-ruby_proc plugin?
If an input data contains CR
, you can remove/replace it.
This is an example.
filters:
- type: ruby_proc
columns:
- name: account
proc: |
-> (record){ record.nil? ? nil : record.gsub(",","") }
Change CR
-> LF
filters:
- type: ruby_proc
columns:
- name: account
proc: |
-> (record){ record.nil? ? nil : record.gsub("\r","\n") }
Trim CR
filters:
- type: ruby_proc
columns:
- name: account
proc: |
-> (record){ record.nil? ? nil : record.gsub("\r","") }
Another idea is to use embulk-filter-to_json
for example.
Could you read this document?
https://github.com/embulk/embulk-output-bigquery#formatter-performance-issue
I don't think embulk-input-oracle creates temporary CSV file.
Additionally, embulk-input-oracle is a Input plugin, isn't a File input plugin. You can't use formatter:
option with embulk-input-oracle. Formatter plugin only works with File input plugin.
The component that creates a CSV file is embulk-output-bigquery. https://github.com/embulk/embulk-output-bigquery/blob/a4b25bd4b814d9f7f5efb3df0af8fc19a48f0a8f/lib/embulk/output/bigquery/file_writer.rb#L29-L34
Why don't you use source_format: NEWLINE_DELIMITED_JSON
? So that file will be formatted with JSON.
https://github.com/embulk/embulk-output-bigquery/#same-options-of-bq-command-line-tools-or-bigquery-jobs-property
Although the JSON file also may contain a newline, the process itself would succeed.
@sakama @hiroyuki-sato thanks for such quick responses. I tried the solution provided by @sakama and it worked! Thanks a lot 😄
We are trying to load data from Oracle to Bigquery. However, temporary CSV files that are generated contain control M characters which results in newline. Bugquery hence cannot process this csv file. Are there any filters to get rid of such cases? Any help would be appreciated. Note: Specified tables are around 15+ GBs in size. Sample config.yml: