Bears-R-Us / arkouda

Arkouda (αρκούδα): Interactive Data Analytics at Supercomputing Scale :bear:

CSV with non-newline Row Delimiter #2065

Open Ethan-DeBandi99 opened 1 year ago

Ethan-DeBandi99 commented 1 year ago

Chapel's native file I/O can only read a file line by line when the rows of data are newline (\n) delimited. This is a problem for CSV reading in Arkouda, because a valid CSV file may use a row delimiter other than newline when the field contents themselves contain newlines. Initially, the plan was to let the user select the row delimiter, with newline as the default. However, that could require reading the entire file just to access the metadata. For the initial implementation, CSV support assumes all files are newline delimited. This is a significant limitation, but it avoids reading an entire file multiple times.
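To illustrate the behavior being asked for, here is a minimal Python sketch of reading records separated by an arbitrary row delimiter in fixed-size chunks, so that the header can be obtained without loading the whole file. This is not Arkouda's or Chapel's implementation; the names `iter_records` and `read_header`, and the `chunk_size` parameter, are hypothetical and exist only for this example. It also ignores CSV quoting rules and splits on every occurrence of the delimiter.

```python
import io
from typing import BinaryIO, Iterator, List


def iter_records(
    stream: BinaryIO, delimiter: bytes = b"\n", chunk_size: int = 4096
) -> Iterator[bytes]:
    """Yield records separated by `delimiter`, reading `chunk_size` bytes at a time.

    Hypothetical helper: unread data stays in the stream until it is needed.
    """
    if not delimiter:
        raise ValueError("delimiter must be non-empty")
    buffer = b""
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        buffer += chunk
        # Emit every complete record currently held in the buffer. Searching
        # the accumulated buffer also handles a delimiter that is split
        # across two chunks.
        while True:
            index = buffer.find(delimiter)
            if index == -1:
                break
            yield buffer[:index]
            buffer = buffer[index + len(delimiter):]
    # Whatever remains is a final record with no trailing delimiter.
    if buffer:
        yield buffer


def read_header(
    stream: BinaryIO,
    delimiter: bytes = b"\n",
    column_separator: bytes = b",",
    chunk_size: int = 4096,
) -> List[bytes]:
    """Return the column names from the first record only (hypothetical helper)."""
    first = next(iter_records(stream, delimiter, chunk_size), b"")
    return first.split(column_separator)


if __name__ == "__main__":
    # Rows are delimited by "|" because one field contains a newline.
    data = io.BytesIO(b"id,comment|1,line one\nline two|2,plain|")
    print(read_header(data, delimiter=b"|"))
```

Because the generator stops reading as soon as the first record is complete, fetching the header touches only as many chunks as that record spans. The same generator can then be used for the data rows.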

This issue tracks exploring support for row delimiters other than newline. I will link the related Chapel issues once they are created.

Resolving the Chapel issue would also allow the header and the data to use different line delimiters. Currently they must be the same.

Ethan-DeBandi99 commented 1 year ago

Noting that a workaround was provided in Chapel Issue #21392.