Just keeping track of this issue, which was introduced in https://github.com/coursera/dataduct/pull/227/files.
If you set the `split` property of an `extract-rds` step to anything other than the default value of 1, rows whose string columns contain newlines are split incorrectly.
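To illustrate the failure, here is a minimal Python sketch that emulates line-based chunking the way `split -l` does, cutting after every N newline characters. The sample data and its escaping convention (an embedded newline written as a backslash immediately followed by a newline, as MySQL's default `ESCAPED BY '\\'` produces) are assumptions for illustration, and `split_by_lines` is a hypothetical helper, not part of dataduct.

```python
import re


def split_by_lines(data: str, lines_per_chunk: int) -> list[str]:
    """Emulate `split -l N`: cut after every N newline characters, ignoring escapes."""
    # Each match is one "line" including its trailing \n (plus a final
    # unterminated line, if any). Only \n counts, just like `split -l`.
    lines = re.findall(r"[^\n]*\n|[^\n]+$", data)
    return [
        "".join(lines[i:i + lines_per_chunk])
        for i in range(0, len(lines), lines_per_chunk)
    ]


# Hypothetical 3-row, tab-separated extract. Row 2's text column contains an
# embedded newline, escaped with a backslash (assumed escaping convention).
extract = "1\tfirst\n2\tline one\\\nline two\n3\tthird\n"

for chunk in split_by_lines(extract, 2):
    print(repr(chunk))
# '1\tfirst\n2\tline one\\\n'
# 'line two\n3\tthird\n'
```

Row 2 ends up split across both chunks, so neither chunk can be loaded as well-formed rows.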
This is because we use the Unix `split` command, which breaks on every newline character and cannot tell an escaped newline inside a field apart from a row terminator. It might be possible to fix this by replacing escaped newlines with a token character before splitting and converting the token back afterward.
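A minimal Python sketch of the proposed token approach follows, under the same assumptions as before: embedded newlines are escaped as backslash plus newline, and the token character (here `\x1e`, the ASCII record separator) never occurs in the extracted data. All function names are hypothetical. The encoder scans escape pairs rather than doing a plain `str.replace`, so an escaped backslash at the end of a field (`\\` followed by a real row terminator) is not mistaken for an escaped newline.

```python
import re

TOKEN = "\x1e"  # assumed never to appear in the extracted data


def encode_escaped_newlines(data: str, token: str = TOKEN) -> str:
    """Replace each backslash-escaped newline with `token`; keep other escapes as-is."""
    out = []
    i = 0
    while i < len(data):
        if data[i] == "\\" and i + 1 < len(data):
            # Consume the escape pair as a unit so "\\\\" + "\n" stays a row end.
            pair = data[i:i + 2]
            out.append(token if pair == "\\\n" else pair)
            i += 2
        else:
            out.append(data[i])
            i += 1
    return "".join(out)


def decode_escaped_newlines(data: str, token: str = TOKEN) -> str:
    """Turn each token back into the original backslash-escaped newline."""
    return data.replace(token, "\\\n")


def split_by_lines(data: str, lines_per_chunk: int) -> list[str]:
    """Emulate `split -l N` on plain newlines."""
    lines = re.findall(r"[^\n]*\n|[^\n]+$", data)
    return [
        "".join(lines[i:i + lines_per_chunk])
        for i in range(0, len(lines), lines_per_chunk)
    ]


def safe_split(data: str, lines_per_chunk: int, token: str = TOKEN) -> list[str]:
    """Split into chunks of whole rows: encode, split, then decode each chunk."""
    if token in data:
        raise ValueError("token character already present in data")
    encoded = encode_escaped_newlines(data, token)
    return [decode_escaped_newlines(c, token) for c in split_by_lines(encoded, lines_per_chunk)]


extract = "1\tfirst\n2\tline one\\\nline two\n3\tthird\n"
for chunk in safe_split(extract, 2):
    print(repr(chunk))
# '1\tfirst\n2\tline one\\\nline two\n'
# '3\tthird\n'
```

In the actual step the same encode/decode would wrap the `split` invocation (for example as a filter before it and on each output file after it); the sketch only shows that the round trip keeps escaped newlines inside their rows.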