jjacobson95 closed this 4 months ago
OK, can you check whether any sample identifiers contain commas?
Ah, I see: there are cancer_types, other_names, and common_names values with commas. I'll close this and build a check for it into the package.
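A package-side check could be as simple as scanning the string columns for commas. Below is a minimal sketch using only the standard-library `csv` module; the function name `values_with_commas`, the column names, and the sample rows are hypothetical and not taken from the package.

```python
import csv
import io


def values_with_commas(csv_text: str, columns: list[str]) -> dict[str, list[str]]:
    """Map each requested column to the values in it that contain a comma.

    Columns with no comma-containing values are omitted from the result.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    hits: dict[str, list[str]] = {col: [] for col in columns}
    for row in reader:
        for col in columns:
            # Missing columns or empty cells are treated as an empty string.
            value = row.get(col) or ""
            if "," in value:
                hits[col].append(value)
    return {col: vals for col, vals in hits.items() if vals}


# Hypothetical samples file; column names and values are illustrative only.
sample_csv = (
    "sample_id,common_name,cancer_type\n"
    '1,Sample A,"Carcinoma, unspecified"\n'
    "2,Sample B,Sarcoma\n"
)

print(values_with_commas(sample_csv, ["common_name", "cancer_type"]))
```

Because the file is parsed with a quote-aware reader, a correctly quoted value such as `"Carcinoma, unspecified"` is reported as one value rather than being split into two columns.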
Sorry, reopening. The column values are fine, but I think we should write the headers without quotes. I think inter-dataset operations such as merge, concat, etc., won't work if the headers differ between files.
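Whether quoted headers actually break a merge or concat likely depends on how each file is read. A minimal sketch with the standard-library `csv` module, assuming hypothetical column names: a quote-aware parser strips the quotes, so both header styles yield the same names, while a tool that splits lines literally keeps the quotes as part of each column name, and the headers no longer match.

```python
import csv
import io


def parse_header(line: str) -> list[str]:
    """Parse a header line the way a quote-aware CSV reader would."""
    return next(csv.reader(io.StringIO(line)))


def naive_header(line: str) -> list[str]:
    """Split a header line literally, keeping any quote characters."""
    return line.rstrip("\n").split(",")


# Hypothetical headers from two files: one quoted, one not.
quoted = '"sample_id","common_name"\n'
unquoted = "sample_id,common_name\n"

print(parse_header(quoted) == parse_header(unquoted))  # True
print(naive_header(quoted) == naive_header(unquoted))  # False
print(naive_header(quoted))  # ['"sample_id"', '"common_name"']
```

Writing headers without quotes avoids depending on which kind of reader a downstream user happens to pick.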
polars has a `quote_char` argument that should handle this.
I'd suggest using this rather than changing the underlying schema, as quotes will be scattered throughout.
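In polars, `quote_char` sets which character is used for quoting, while the `quote_style` parameter of `DataFrame.write_csv` controls when quotes are written; `quote_style="necessary"` quotes only fields that contain the separator, the quote character, or a line break. Below is a minimal sketch of that same "quote only when needed" policy using the standard-library `csv` module (`csv.QUOTE_MINIMAL`), so it runs without polars; the helper name `write_rows` and the columns are hypothetical.

```python
import csv
import io


def write_rows(header: list[str], rows: list[list[str]]) -> str:
    """Write a header and rows as CSV text, quoting only where required."""
    buf = io.StringIO()
    # QUOTE_MINIMAL quotes a field only if it contains the delimiter,
    # the quote character, or a line-terminator character.
    writer = csv.writer(buf, quoting=csv.QUOTE_MINIMAL, lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows)
    return buf.getvalue()


# Hypothetical columns and values for illustration.
text = write_rows(
    ["sample_id", "cancer_type"],
    [["1", "Carcinoma, unspecified"], ["2", "Sarcoma"]],
)
print(text)
# sample_id,cancer_type
# 1,"Carcinoma, unspecified"
# 2,Sarcoma
```

The header and plain values come out bare, and only the value containing a comma is quoted. Dropping quotes entirely (polars' `quote_style="never"`) would also remove header quotes, but a value such as `Carcinoma, unspecified` would then be read back as two separate fields.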
Attached is a screenshot comparing the broad_sanger samples file to the cptac samples file. Quotes should be removed from all strings.