Closed by jyoti-datadata 4 years ago.
Thanks @gauravsinghh for reporting this. From your description, it sounds as though the tool isn't picking up the fact that some edges have more properties than others, resulting in too few headers. Is that correct?
I'll look to reproduce this over the next few days and put in a fix.
ian
Hi @iansrobinson, you are correct.
Also, I am using release 1.0. I also built the latest code and got the same issue.
I've not been able to reproduce this issue, despite creating a dataset containing over 1 million edges, half of which have only a subset of properties.
For each edge (or vertex) with a particular label, the tool only outputs the property values that it knows about via its metadata collection. The fact that the additional properties are being output in individual rows, even though the headers are missing, suggests that the tool has generated the metadata for these properties.
I have been able to reproduce a situation in which 3 property headers are written to the second line of the CSV file, and lots of extra newline characters inserted throughout the output, by generating a dataset containing a newline in both a property key and property value.
Given that you've seen lots of additional newlines, I wonder whether there are any additional newline characters in any of your dataset's edge keys or values – in particular the 'updatedBy' property? When the tool runs, it should generate a config.json file containing the metadata it has inferred for all labels (the location of this file is detailed in the output on the command line). Would you please share this config file – or at least review it to see a) whether those 3 properties are there, and b) whether any of them contains a newline character.
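In the meantime, here is a rough way to check for both things from the command line (the file names below are placeholders; use the paths the tool actually prints):

```bash
# Flag any row in the exported edge file whose naive comma-split field count
# differs from the header's. This does not handle quoted commas or embedded
# newlines, so a mismatch is only a hint that a record is malformed.
awk -F',' 'NR==1 {n=NF; next} NF!=n {printf "line %d: %d fields (header has %d)\n", NR, NF, n}' edges.csv

# Newlines inside property keys or values are escaped as \n in the JSON
# metadata, so a non-zero result here points to an embedded newline.
grep -c '\\n' config.json
```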
Thanks
ian
On Sun, 28 Jul 2019 at 00:50, CAPITAL ONE SERVICES LLC <notifications@github.com> wrote:
Hi @iansrobinson, you are correct.
This is my official ID. I posted from my personal ID, @gauravsinghh.
Closing as unable to reproduce.
I am trying to replicate the database from one AWS region to another, and I am using this utility to export the data from the master DB.
The utility runs fine, but when I try to upload the files to the other Neptune DB using the Neptune bulk loader, the edge inserts fail with errors such as:
"errorCode" : "PARSING_ERROR", "errorMessage" : "Record has more columns than header", "fileName" : "s3://{edge file location}.csv", "recordNum" : 60
Steps to replicate the issue:
Export the data using:
bin/neptune-export.sh export-pg --log-level error -e {endpoint} -d ~/Downloads/
Run the bulk loader command for the other DB (a generic sketch of both loader calls is shown below).
Check the status of the load.
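For reference, a generic sketch of both loader calls (the endpoint, bucket, IAM role, region, and load id below are placeholders, not the actual values used):

```bash
# Submit the bulk load request to the target cluster (all values are placeholders).
curl -X POST -H 'Content-Type: application/json' \
    https://<target-neptune-endpoint>:8182/loader -d '{
      "source"     : "s3://<bucket>/<exported-csv-prefix>/",
      "format"     : "csv",
      "iamRoleArn" : "arn:aws:iam::<account-id>:role/<neptune-load-role>",
      "region"     : "<target-region>",
      "failOnError": "FALSE"
    }'

# Check the status of the load, including per-record errors such as
# "Record has more columns than header".
curl 'https://<target-neptune-endpoint>:8182/loader/<loadId>?details=true&errors=true'
```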
Fix tried:
Our sample edge data has the following headers: ~id,~label,~from,~to,createdBy:string,createdTimestamp:date,weight:double,updatedBy:string,endDate:date,updatedTimestamp:date
Data:
The export utility never created the last 3 headers (updatedBy:string, endDate:date, updatedTimestamp:date) in the header row; I added them later to fix the issue. Not all rows of data will have values for these properties.
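The manual workaround was essentially to append the missing column headers to the first line of the exported edge file, along these lines (the file name is a placeholder for the actual export output):

```bash
# Append the three missing property headers to the header row of the exported
# edge file (GNU sed; on macOS use: sed -i '' ...).
sed -i '1s/$/,updatedBy:string,endDate:date,updatedTimestamp:date/' edges.csv
```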
Size of data: there are 80323 edges in the data for this label.