Thank you @tom-grivaud for bringing this issue to our attention.
It seems like you noticed this behaviour while writing directly to the console. Can you reproduce it by writing to a local file (write_local command)?
If so, do you have the same problem when writing the output file to a bucket?
I would like to be sure it is a general problem and not specific to the console writer.
Thank you @tom-grivaud!
When you say "a field contained an outrageous number of IDs", do you mean that a single line of your .csv (supposed to contain a single record) actually featured multiple records?
I have to admit that I am not very knowledgeable about the DV360 reader. @bibimorlet, are you using it for Samsung (if I remember correctly, we were using the DBM reader instead)? If so, did you notice any issues with the output data?
@benoitgoujon we tried both write_console and write_s3, but neither worked; the same error occurred.
@gabrielleberanger Not exactly: the file contains multiple JSON objects made of key-value pairs, and one of the values from a JSON holds this huge number of IDs as a single string.
Let me know if anything is unclear.
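For illustration only (field names and values invented, not taken from the real DV360 output), that shape would be something like:

```python
# Hypothetical sketch of the data shape described above: each record is a
# JSON object, and one value packs an enormous ID list into a single string.
record = {
    "date": "2021-01-01",
    "campaign": "example_campaign",
    "matched_ids": "id_000001;id_000002;id_000003",  # imagine hundreds of thousands of IDs here
}
```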
ERROR AND WHY: While collecting data from the DV360 platform, I encountered this issue:
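Given the default limit mentioned below, this was presumably the standard Python csv module error for an oversized field, along the lines of:

```
_csv.Error: field larger than field limit (131072)
```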
From the error message, I understood that I could set csv.field_size_limit above the default limit of 131072.
HOW TO FIX IT AND FURTHER INVESTIGATIONS: After I added one line of code to file_reader.py, the error vanished and I got my result printed to the console.
The following line was added to the nck/utils/file_reader.py file (replace 1000000 with another limit, to be discussed):
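A sketch of that line, reconstructed from the function and value named above rather than copied from the actual patch:

```python
import csv

# Lift the parser's maximum field size above the 131072 default.
csv.field_size_limit(1000000)
```

Note that csv.field_size_limit(new_limit) installs the new ceiling and returns the previous one, so it only needs to run once, before the reader is used.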
Even though it worked, I noticed that a field contained an outrageous number of IDs. Before settling on this new csv.field_size_limit, it would be worth checking that there is no mistake in the process that causes a field to contain far more IDs than it really should (see the sketch below for one way to flag such fields).
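A minimal sketch of such a check, with an invented threshold and a plain csv.reader standing in for the actual nck file reader:

```python
import csv

csv.field_size_limit(1000000)  # the raised limit under discussion

# Hypothetical threshold: any field this large is worth a closer look.
SUSPICIOUS_FIELD_SIZE = 100000

def flag_oversized_fields(path):
    """Print the location and size of any abnormally large CSV field."""
    with open(path, newline="") as f:
        for row_number, row in enumerate(csv.reader(f), start=1):
            for column, field in enumerate(row):
                if len(field) > SUSPICIOUS_FIELD_SIZE:
                    print(f"row {row_number}, column {column}: {len(field)} characters")
```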