Open dave-csc opened 2 months ago
Update: the same behaviour occurs even without the use of Metadata Injection (i.e. the simplest pipeline Table Input -> Text File Output).
Update # 2: I managed to have a simpler file by selecting the option "Fast data dump", in this case the output has no trailing spaces as expected.
I suggest to revise the functionalities "Right pad fields" and "Fast data dump": I would expect to have just the first one in the UI and select it if I like to have a padded output (or keep it unselected if I don't want to). In the documentation it's not so clear what this sentence means: "Improves the performance when dumping large amounts of data to a text file by not including any formatting information", since a text file has no format information...
I suspect that you have specified a length on the text fields in the text file output, putting the length to 0 will limit the size to the exact length.
The main reason for this is that some people need fixed width fields (usually older systems)
There is a button in the bottom for "minimal width" that will change the settings accordingly
For the trim types the documentation states:
The trimming method to apply on the string. Note: Trimming only works when there is no field length given.
I suspect that you have specified a length on the text fields in the text file output, putting the length to 0 will limit the size to the exact length.
Actually not: I just created a template pipeline with the two components and injected the SQL query and the file to write (I can't set a schema in Text File Output, since I'm using this template to dump several tables in text files). When developing the same pipeline as a stand-alone one, you can use the Get Fields button and you get the schema from the database query, including the field lengths. By the way, lengths are retrieved for Integer values too, but differently from Strings they're not applied in the files with the default options (they occupy the minimum length, even with the option "Right pad fields" enabled).
There is a button in the bottom for "minimal width" that will change the settings accordingly
It works if you set a stand-alone pipeline, but this function can't be set through metadata injection
The main reason for this is that some people need fixed width fields (usually older systems) [...] For the trim types the documentation states:
The trimming method to apply on the string. Note: Trimming only works when there is no field length given.
I was referring to the "Fast data dump" option, not the "Trim type" one. And indeed for that reason I think the "Fast data dump" and "Right pad fields" should be merged and corrected: you could only implement a single flag like "Write fixed length fields". When enabled it prints a file with fixed width fields (suitable for older systems), when disabled the file is written in CSV-style...
Fast data dump is not related to fixed width, it just means it will ignore the formatting options you specify on the fields. Eg we will dump the raw float when the field is a number and we don't format it to #.##. You are a bit in an edge case where the size of the strings is coming from the database but you want to ignore those.
Apache Hop version?
2.9.0, 2.10.0 SNAPSHOT
Java version?
openjdk 21.0.2 2024-01-16
Operating system
Linux
What happened?
Steps to reproduce:
The VARCHAR2/String values are always written in the file with the trailing spaces needed to match their maximum length, regardless you choose or not the flag "Right pad fields?".
A workaround would probably be to use a String operations to trim data manually, but to use that you need to specify the table/view schema (so you can't use it with Metadata Injection)
Issue Priority
Priority: 2
Issue Component
Component: Metadata, Component: Transforms