Open theianrobertson opened 6 years ago
Checking the delimiter for every byte of data would introduce a very significant performance change to cover this case. While I agree that it would be nice to find that the source data has a character that is being used for a delimiter, but I think for the most part this is why the to_list
and to_dict
exists so that in the case that this behavior is desired they can be used to produce those results with a higher level of safety (i.e. use to_list
and pass to Python's csv
module). Since to_str
is building a buffer in C and returning that as text it would be a fairly large change to try and introduce the kind of logic that may address some of these concerns like a quote character. This was not originally in scope for this because to_str
is suppose to be fast and straight-forward, and I also wish to discourage the use of character delimited formats altogether because they are not portable or safe to use in far too many cases, vs. something like JSON which fully supports unicode, is self-describing of all fields, and is portable regardless of OS/CPU Architecture.
I think it is possible to better abstract this so that it is either more apparent how to solve issues like in your example, or to just provide a better interface that would solve for this automatically. For example, to_str
is still available but under the hood it just calls to_list
and uses something like csv
rather than providing the other option that builds the string in C. Thoughts?
If I have a table with a pipe (
|
) in a character field and try to export and write to a string, there's no checking for whether the delimiter exists in any of the fields:I could bring it out with a different delimiter, or roll my own
to_str
that takes in adict
fromto_dict
/to_list
, but wondering if there should be functionality pushed up intoto_str
to error out, warn, or remove the field if the delimiter exists in a column?