md-arif-shaikh closed this issue 2 years ago.
Currently, there is no such function. The field separator is hardcoded to `,` in `df-read/csv` and `df-write/csv`, which happens here: https://github.com/alex-hhh/data-frame/blob/475dc4c8b2d70ae704f6c634dcc77b09eae65d46/private/csv.rkt#L157
Unfortunately, changing the separator to a space, as you suggested, would create problems, since the CSV parser trims whitespace from around cells. Perhaps writing a separate function is the right way to go about it...
In any case, the `df-read/...` functions are not special: you can write your own using the functions provided by this package.
Hi @alex-hhh, I have been fiddling with this for some time, and I see that if I replace the hardcoded `,` in the `csv.rkt` file with a function argument, `#:sep (sep #\,)`, then I can correctly write and read comma-, tab-, and space-separated files. I am thinking of writing a new function, as you suggested. Would you accept a PR if this still works after a bit more experimentation? Also, what would be a good name for such a function, and what tests would you suggest for it?
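A minimal sketch of what such a function could look like, written in plain Racket rather than against the package internals. `read-delimited-columns` is a hypothetical name; it takes an input port plus the proposed `#:sep (sep #\,)` keyword and returns the header names and one vector of strings per column. It assumes the first non-blank line is the header and, unlike `df-read/csv`, does no quote handling, whitespace trimming, or number conversion.

```racket
#lang racket/base
(require racket/string)

;; Hypothetical reader for simple delimited text. Assumptions: the first
;; non-blank line is the header, cells are never quoted, and every cell is
;; kept as a string (no whitespace trimming, no number conversion).
(define (read-delimited-columns in #:sep (sep #\,))
  (define sep-str (string sep))          ; string-split wants a string separator
  (define rows
    (for/list ([line (in-lines in)]
               #:unless (string=? (string-trim line) ""))  ; skip blank lines
      ;; #:trim? #f keeps empty leading/trailing cells; consecutive
      ;; separators give empty cells because #:repeat? defaults to #f
      (string-split line sep-str #:trim? #f)))
  (define header (car rows))
  (define body (cdr rows))
  (values header
          ;; one vector per column; cells missing from short rows become #f
          (for/list ([i (in-range (length header))])
            (for/vector ([row (in-list body)])
              (if (< i (length row)) (list-ref row i) #f)))))

;; Usage: a tab-separated input
(define-values (names columns)
  (read-delimited-columns (open-input-string "x\ty\n5\t6\n7\t8\n")
                          #:sep #\tab))
;; names   is '("x" "y")
;; columns is equal? to (list (vector "5" "7") (vector "6" "8"))
```

Turning the column vectors into a data frame would then go through the package's own series functions (see the data-frame documentation); that step is not shown here.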
What is the data format you are trying to read? Is this specified somewhere?
The problem I see is that when you replace the `,` with a space, the separator is one single space, which means that a file like the following would be read as a table of 4 columns and 2 rows, since there are several spaces between "1" and "2":
    1   2
    11  12 
That is, the above would be equivalent to the following CSV file, even though the user probably intended a table of 2 columns:
    1,,,2
    11,,12,
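The single-space problem is easy to reproduce with `string-split` from `racket/string`. With a one-space string separator, consecutive separators are not merged, so every extra space produces an empty cell; `#:trim? #f` keeps the trailing empty cell as well. `split-on-single-space` is a hypothetical helper used only for illustration, not part of the package.

```racket
#lang racket/base
(require racket/string)

;; Hypothetical helper: split a line on every single space, like a CSV
;; reader whose separator is #\space. Runs of spaces are NOT collapsed.
(define (split-on-single-space line)
  (string-split line " " #:trim? #f))

(split-on-single-space "1   2")    ; => '("1" "" "" "2")
(split-on-single-space "11  12 ")  ; => '("11" "" "12" "")
```

Both rows come out with 4 cells, matching `1,,,2` and `11,,12,`.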
Sorry, I should have been more specific: by space I meant a single space, i.e. `#\space`. I can try to make it work with multiple spaces, but I think extending `sep` from only `,` to `(#\, #\space #\tab)` might be very useful in itself.
Extending the code to accept multiple spaces would not work well either, since empty cells could then no longer be represented. The fact is that using a space as a separator is problematic. Tab has the same problem: many editors expand tabs into spaces, combine runs of spaces into tabs, or use a mixture of both, which produces tabular data that looks fine in a text editor but is hard to read programmatically.
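The loss of empty cells can be shown the same way. Here "accept multiple spaces" is assumed to mean treating any run of spaces as one separator, which `string-split` does with `#:repeat? #t`; `split-on-space-runs` is a hypothetical helper. A comma-separated row can mark an empty middle cell explicitly, but in the space-separated version the empty cell disappears, and two or three spaces become indistinguishable.

```racket
#lang racket/base
(require racket/string)

;; Hypothetical helper: treat any run of spaces as a single separator.
(define (split-on-space-runs line)
  (string-split line " " #:repeat? #t))

;; With commas, an empty middle cell is explicit:
(string-split "a,,c" "," #:trim? #f)  ; => '("a" "" "c")

;; With runs of spaces, the empty cell is gone, whatever the number of spaces:
(split-on-space-runs "a  c")          ; => '("a" "c")
(split-on-space-runs "a   c")         ; => '("a" "c")
```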
I understand that changing the separator to a single space worked for your data set, which happened to have exactly one space between numbers, but it would not work in the general case...
This is why I asked whether you have a specification for the data format you are trying to read.
For example, Excel handles these types of files by letting the user specify the tab width and the column at which each cell starts. This approach would require a completely different parser than `df-read/csv` with a space separator...
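For comparison, a minimal sketch of the column-position approach: instead of a separator, the caller supplies the 0-based column where each cell starts, and each cell is the trimmed text up to the next start column. `split-fixed-width` is a hypothetical name, and tab expansion (the other half of what Excel asks for) is left out, so tabs are assumed to have been expanded to spaces beforehand.

```racket
#lang racket/base
(require racket/string)

;; Hypothetical fixed-width splitter. starts is an ascending, non-empty list
;; of 0-based start columns, e.g. '(0 4); each cell runs up to the next start
;; (or the end of the line) and is whitespace-trimmed.
(define (split-fixed-width line starts)
  (define len (string-length line))
  (for/list ([start (in-list starts)]
             [end (in-list (append (cdr starts) (list len)))])
    ;; clamp to the line length so short lines yield empty cells
    (define s (min start len))
    (define e (max s (min end len)))
    (string-trim (substring line s e))))

(split-fixed-width "1   2" '(0 4))    ; => '("1" "2")
(split-fixed-width "11  12" '(0 4))   ; => '("11" "12")
(split-fixed-width "    7" '(0 4))    ; => '("" "7")
```

Because cell boundaries come from column positions rather than separators, an empty cell (the last example) stays representable, which neither separator-based variant manages.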
I think I get it. I usually create my own datasets, for example from numerical simulations, so I have full control over the data format, and that is why it works for me. But yes, as you said, for general-purpose use it would cause problems.
Hi,
Is there an existing function to read data from a file where the values are separated by some string `x` other than `,`? For example, `x=" "`.