Closed: dmitryallen closed this issue 4 years ago.
Hi Dmitry
Thank you for taking the time to provide that much information. I will have a look in a few days, but from a quick scan of your email I can see that the CSV file has a first column which is not in the PZMap.
You can parse a CSV file without any PZMap; the header row then provides the column names.
Can you add it and see if that works?
Alternatively, use CsvParserFactory.newXXX and call record.getString("your column name");
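The header-as-column-name idea can be illustrated without Flatpack at all. The sketch below is plain Java: `HeaderLookup` is a hypothetical helper that only mimics name-based access in the spirit of `record.getString(...)`. It assumes each row has already been split into fields, and the sample values are made up.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical helper, not part of Flatpack: resolves a field by the column
 * name found in the file's header row, so column order does not matter.
 */
final class HeaderLookup {
    private final Map<String, Integer> indexByName = new HashMap<>();

    HeaderLookup(List<String> header) {
        for (int i = 0; i < header.size(); i++) {
            // Keep the first occurrence if a name is repeated.
            indexByName.putIfAbsent(header.get(i).trim(), i);
        }
    }

    /** Returns the value of the named column in an already-split row. */
    String getString(List<String> row, String column) {
        Integer idx = indexByName.get(column);
        if (idx == null) {
            throw new IllegalArgumentException("No such column: " + column);
        }
        // Short rows are treated as having empty trailing fields.
        return idx < row.size() ? row.get(idx) : "";
    }

    public static void main(String[] args) {
        HeaderLookup lookup = new HeaderLookup(List.of("RefDate", "Program", "MEMBERID"));
        List<String> row = List.of("2020-02-17", "ABC", "42");
        System.out.println(lookup.getString(row, "Program")); // ABC
    }
}
```

Because the position is taken from the header rather than from a separate mapping, an extra leading column such as `RefDate` cannot shift the other values.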
Let me know
Benoît
Important Notice This communication contains information that is considered confidential and may also be privileged. It is for the exclusive use of the intended recipient(s). If you are not the intended recipient(s) please note that any form of distribution, copying or use of this communication or the information in it is strictly prohibited and may be unlawful. If you have received this communication in error please return it to the sender and delete the original
On 18 Feb 2020, at 04:11, dmitryallen notifications@github.com wrote:
Dmitry
Hi
Has my suggestion fixed your issue? I will try to look at the code this weekend.
Benoit
Thanks Benoit, unfortunately I had no time to try your solution, and I have switched to SuperCSV. Please close this issue.
Your library caught my attention because of the mapping, and I was planning to use it for data ingestion into a large database. In my case the headers can vary, except for a small number of columns, which can appear at different positions in different files.
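For the situation described here (a few required columns whose positions differ from file to file), one minimal stdlib-only approach is to resolve the required names against each file's header before reading any rows and to fail fast if one is missing. `RequiredColumns` is hypothetical and unrelated to both Flatpack and SuperCSV; case-insensitive matching is an assumption.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical helper: locates required columns by name in a per-file header. */
final class RequiredColumns {

    /**
     * Maps each required column to its index in this file's header.
     * Extra columns are ignored; any missing required column is an error.
     */
    static Map<String, Integer> resolve(List<String> header, List<String> required) {
        Map<String, Integer> positions = new LinkedHashMap<>();
        List<String> missing = new ArrayList<>();
        for (String name : required) {
            int idx = indexOfIgnoreCase(header, name);
            if (idx < 0) {
                missing.add(name);
            } else {
                positions.put(name, idx);
            }
        }
        if (!missing.isEmpty()) {
            throw new IllegalArgumentException("Missing required columns: " + missing);
        }
        return positions;
    }

    private static int indexOfIgnoreCase(List<String> header, String name) {
        for (int i = 0; i < header.size(); i++) {
            if (header.get(i).trim().equalsIgnoreCase(name)) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        List<String> header = List.of("RefDate", "Program", "MEMBERID");
        System.out.println(resolve(header, List.of("MEMBERID", "Program"))); // {MEMBERID=2, Program=1}
    }
}
```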
Best regards, Dmitry
Thanks for getting back to me Dmitry.
Interesting. One of the powerful features of Flatpack is that the column order is not important, and you do not need to know the columns in a 'static' way (e.g. in an XML mapping). It discovers the columns and can even ensure that they are unique if you have two "name" columns, for instance. Column names can also be case-insensitive, so "name" or "NaMe" would both be handled in code with dataSet.getString("name"). It also handles multi-line CSV, which is quite rare.
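The name-based lookup described above can be sketched in plain Java as follows. This is not Flatpack's implementation: `CaseInsensitiveHeader` is hypothetical, and the `_2`, `_3` suffix used to make duplicate names unique is an assumed scheme that may differ from whatever Flatpack actually does.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

/**
 * Illustration only (not Flatpack's implementation): case-insensitive lookup
 * by column name, with duplicate header names made unique by a suffix.
 */
final class CaseInsensitiveHeader {
    private final Map<String, Integer> indexByKey = new HashMap<>();
    private final List<String> uniqueNames = new ArrayList<>();

    CaseInsensitiveHeader(List<String> header) {
        for (int i = 0; i < header.size(); i++) {
            String base = header.get(i).trim();
            String unique = base;
            // Hypothetical renaming scheme: name, name_2, name_3, ...
            for (int n = 2; indexByKey.containsKey(key(unique)); n++) {
                unique = base + "_" + n;
            }
            uniqueNames.add(unique);
            indexByKey.put(key(unique), i);
        }
    }

    private static String key(String name) {
        return name.toLowerCase(Locale.ROOT);
    }

    /** Column names after de-duplication, in file order. */
    List<String> columnNames() {
        return List.copyOf(uniqueNames);
    }

    /** Looks up a column ignoring case, e.g. "name", "NaMe" or "NAME". */
    String getString(List<String> row, String column) {
        Integer idx = indexByKey.get(key(column));
        if (idx == null) {
            throw new IllegalArgumentException("No such column: " + column);
        }
        return idx < row.size() ? row.get(idx) : "";
    }

    public static void main(String[] args) {
        CaseInsensitiveHeader h = new CaseInsensitiveHeader(List.of("Name", "name", "Phone"));
        System.out.println(h.columnNames()); // [Name, name_2, Phone]
        System.out.println(h.getString(List.of("Alice", "Bob", "555"), "NaMe")); // Alice
    }
}
```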
Anyhow, thanks for the test case, I will improve Flatpack.
I regularly process multi-GB files as a stream() of Records via Flatpack; it works very well.
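As a rough stand-in for streaming records, here is a stdlib-only Java sketch that reads CSV records lazily from a `Reader`, so a quoted field may span several lines (the multi-line case mentioned above) and the file is never loaded whole. It does not use Flatpack's `stream()`; `CsvRecordStream` is hypothetical and assumes comma delimiters and RFC 4180-style double-quote escaping.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.Spliterator;
import java.util.Spliterators;
import java.util.stream.Stream;
import java.util.stream.StreamSupport;

/** Illustration only (not Flatpack): lazily yields one CSV record at a time. */
final class CsvRecordStream {

    static Stream<List<String>> records(Reader in) {
        Iterator<List<String>> it = new Iterator<>() {
            private List<String> pending = readRecord(in); // one record of look-ahead

            @Override public boolean hasNext() { return pending != null; }

            @Override public List<String> next() {
                if (pending == null) throw new NoSuchElementException();
                List<String> current = pending;
                pending = readRecord(in);
                return current;
            }
        };
        return StreamSupport.stream(Spliterators.spliteratorUnknownSize(it, Spliterator.ORDERED), false);
    }

    /** Reads one record, or returns null at end of input. */
    private static List<String> readRecord(Reader in) {
        try {
            List<String> fields = new ArrayList<>();
            StringBuilder field = new StringBuilder();
            boolean inQuotes = false;   // inside a quoted field
            boolean afterQuote = false; // just closed a quote; another '"' is an escaped quote
            boolean sawInput = false;
            int c;
            while ((c = in.read()) != -1) {
                sawInput = true;
                char ch = (char) c;
                if (inQuotes) {
                    if (ch == '"') { inQuotes = false; afterQuote = true; }
                    else field.append(ch); // newlines inside quotes are kept
                    continue;
                }
                if (ch == '"') {
                    if (afterQuote) { field.append('"'); inQuotes = true; } // "" escape
                    else if (field.length() == 0) inQuotes = true;          // opening quote
                    else field.append('"');                                 // stray quote, kept
                    afterQuote = false;
                    continue;
                }
                afterQuote = false;
                if (ch == ',') { fields.add(field.toString()); field.setLength(0); }
                else if (ch == '\n') { fields.add(field.toString()); return fields; }
                else if (ch != '\r') field.append(ch); // drop CR of CRLF line endings
            }
            if (!sawInput) return null;
            fields.add(field.toString()); // last record without a trailing newline
            return fields;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String csv = "id,comment\n1,\"line one\nline two\"\n";
        records(new StringReader(csv)).forEach(System.out::println);
    }
}
```

For a real file, one option is to pass `Files.newBufferedReader(path)` and close it with try-with-resources, since the returned stream does not close the reader itself.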
Kind regards
Benoit
Thanks Benoit, I hope my bug report gives you useful information for improving the product.
This shift of column data looks like a bug, or maybe I missed something in the configuration.
Best regards, Dmitry
Hi Dmitry
Just an update for the record. I think the issue is due to a misunderstanding of the interface. Allow me to explain.
If you specify a PZMap, you are actually specifying the columns in sequential order, and Flatpack will NOT use the headers in the file. So, in your example, you have said that the first column is Program, even though the data in that position in the file is 'RefDate'.
One could argue that the factory method should not allow you to specify both a PZMap AND whether or not to skip the first row. However, there might be cases where the file always contains a header row whose NAMES might change but whose order does not. You would then set things up as you have defined them, BUT you must specify every column, or at least the sequence of columns up to the last one you are interested in.
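The positional behaviour explained above can be reproduced in a few lines of plain Java. `PositionalMapping` is hypothetical and does not call Flatpack; it assumes a mapping is simply an ordered list of column names, uses the column names from the report, and uses made-up row values. Because the declared list omits `RefDate`, asking for `Program` yields the date, which is the shift reported in the issue.

```java
import java.util.List;

/**
 * Illustration only: a "mapping" here is an ordered list of column names that
 * is applied by position, ignoring the names in the file's header row.
 */
final class PositionalMapping {

    /** Returns the field at the position where the column appears in the declared list. */
    static String byDeclaredPosition(List<String> declared, List<String> row, String column) {
        int idx = declared.indexOf(column);
        if (idx < 0) {
            throw new IllegalArgumentException("Column not declared: " + column);
        }
        return idx < row.size() ? row.get(idx) : "";
    }

    /** Returns the first index where declared and header names differ, or -1 if they agree. */
    static int firstMismatch(List<String> declared, List<String> header) {
        for (int i = 0; i < declared.size(); i++) {
            if (i >= header.size() || !declared.get(i).equalsIgnoreCase(header.get(i).trim())) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        List<String> header = List.of("RefDate", "Program", "MEMBERID", "PRIMARY_PHONE_NUM");
        // The declared mapping leaves out the file's first column, RefDate.
        List<String> declared = List.of("Program", "MEMBERID", "PRIMARY_PHONE_NUM");
        List<String> row = List.of("2020-02-17", "ABC", "42", "555-0100");

        System.out.println(byDeclaredPosition(declared, row, "Program"));           // 2020-02-17
        System.out.println(byDeclaredPosition(declared, row, "PRIMARY_PHONE_NUM")); // 42
        System.out.println(firstMismatch(declared, header));                        // 0
    }
}
```

`firstMismatch` compares names case-insensitively and only makes sense when the header names are expected to match the mapping; for files whose header names change but whose order does not, it would report false mismatches.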
I trust that the explanation makes sense.
No bug here, but I will add a test case with your data and example.
Thank you
Benoit
Hello, I'm trying to parse a large file using a mapping XML. When the file is parsed, I try to get values by column name, but it returns wrong values: for example, for the column "Program" it returns a date, the value from the previous column "RefDate", and "PRIMARY_PHONE_NUM" returns the value of MEMBERID.
Thanks Dmitry
POM:
Code:
Mapping:
CSV: