it's kind of icky manual work, but:
we'd have to do this right at the start: after reading in the dataDescription rows (that's how we find out we have an nlp column), but before we do anything else.
we could go through the whole raw document, row by row.
for each row, skip past the commas that come before the nlp column, then count back from the end of the row past the commas that come after it.
everything left in between is the nlp field. concat it together, then strip out the quotes, newline characters, etc.
or, we could just find a proper csv parser that can handle things like unbalanced quotes, embedded commas, etc.
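
a rough sketch of the manual comma-counting approach described above, in Python. the function name `extract_nlp_field` and the parameters `nlp_index` and `total_columns` are made up for illustration. it assumes the nlp column is the only column with stray commas, and that the raw document has already been split into one string per row:

```python
def extract_nlp_field(row, nlp_index, total_columns):
    """Split one raw row into (columns_before, nlp_text, columns_after).

    Assumes only the nlp column contains stray commas, so every extra
    comma in the row belongs to that column.
    """
    parts = row.split(",")
    if len(parts) < total_columns:
        raise ValueError("row has fewer columns than expected")

    # number of well-behaved columns that sit after the nlp column
    columns_after_count = total_columns - 1 - nlp_index
    end = len(parts) - columns_after_count

    before = parts[:nlp_index]   # counted from the left
    after = parts[end:]          # counted from the right
    middle = parts[nlp_index:end]  # everything left over is nlp text

    # put the nlp pieces back together, then strip quotes and newlines
    nlp_text = ",".join(middle)
    for junk in ('"', "\n", "\r"):
        nlp_text = nlp_text.replace(junk, "")

    return before, nlp_text, after


row = '7,he said "hi, there", ok,3.5'
before, nlp_text, after = extract_nlp_field(row, nlp_index=1, total_columns=3)
```

this only works if the column count comes from the dataDescription rows up front, which is why it has to run before anything else touches the data.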