SwissClinicalTrialOrganisation / secuTrialR

Handling of data from the clinical data management system secuTrial
https://swissclinicaltrialorganisation.github.io/secuTrialR/

CSV structure can break down due to free text fields #220

Closed: PatrickRWright closed this issue 4 months ago

PatrickRWright commented 3 years ago

Describe the bug: In fringe cases, free-text fields can cause the CSV structure to break down, so the data are not read as expected.

To reproduce: I currently do not have an example that I can share publicly.

Expected behavior: Read the data as expected, or warn appropriately.

Additional context: Something like this:

The patient said:
"Sir, I really have significant health issues"; Also mentioned: "However, none of the treatments are working."

The combination of the quotation marks with characters commonly used as field delimiters (commas, semicolons) gets in the way of the expected behavior.
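
A minimal sketch of the failure mode, written in Python with the standard-library `csv` module rather than in R or with secuTrialR's own reader. It assumes a hypothetical export in which the free-text value is joined with `,` and written without any quoting or escaping; the identifiers `pat-001` and `visit-1` are made up for illustration.

```python
import csv
import io

# Free text adapted from the issue: it contains a line break, double quotes,
# commas and a semicolon.
free_text = (
    "The patient said:\n"
    '"Sir, I really have significant health issues"; '
    'Also mentioned: "However, none of the treatments are working."'
)

# Hypothetical naive export: one record of three values joined with ","
# and no quoting or escaping applied to the free-text value.
naive_export = ",".join(["pat-001", free_text, "visit-1"]) + "\n"

records = list(csv.reader(io.StringIO(naive_export)))
print(len(records))  # not the single three-field record that was written
for rec in records:
    print(len(rec), rec)
```

With this input the reader returns two records, because the unquoted line break ends the first one early, instead of one record with three values.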

aghaynes commented 3 years ago

Can you experiment with the different functions for reading data in (read.csv, fread, etc.)? This isn't secuTrialR per se, but reading the data into R in general... Off the top of my head, I don't know if any of them will read stuff like that (particularly if there are carriage returns in there too...).

PatrickRWright commented 3 years ago

That was my thinking too, which is also why I am not too concerned or in a hurry. However, I think it would be useful to at least trigger some kind of warning (if possible). I guess free text will always be a pain because it's by definition "free".

The specific export that caused this issue could be "remedied" by switching the field delimiter from "," to "\t" in the ExportSearchTool options.
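
A sketch of why switching the delimiter helped, again in Python with the `csv` module. It assumes the export writes fields without quoting (modelled here with `csv.QUOTE_NONE`, so quote characters are plain data) and that the free text contains neither a tab nor a line break; either of those would still break the structure. `parse_unquoted` is a hypothetical helper.

```python
import csv
import io

# Single-line free text (assumed to contain no tab and no line break).
text = ('"Sir, I really have significant health issues"; '
        'Also mentioned: "However, none of the treatments are working."')
values = ["pat-001", text, "visit-1"]


def parse_unquoted(line, delimiter):
    # Hypothetical helper. QUOTE_NONE: quote characters are ordinary data,
    # as in an export that does not quote its fields.
    return next(csv.reader(io.StringIO(line), delimiter=delimiter,
                           quoting=csv.QUOTE_NONE))


comma_fields = parse_unquoted(",".join(values), ",")
tab_fields = parse_unquoted("\t".join(values), "\t")

print(len(comma_fields))    # the commas inside the free text create extra fields
print(tab_fields == values)  # with "\t" the free text stays in one field
```

The tab delimiter only moves the problem to a character that is rarer in free text; it does not remove it.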

aghaynes commented 3 years ago

Is it easy enough to identify when the problem occurs? Maybe a warning with "recommend changing delimiter" is worthwhile; I guess that's what you meant? Not that that is bombproof either, though...
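
One possible way to detect the problem and warn, sketched in Python: compare each record's field count with the header's. `check_field_counts` is a hypothetical helper, not a secuTrialR function, and the check is a heuristic: a broken record can by chance end up with the expected number of fields, so it can miss cases.

```python
import csv
import io
import warnings


def check_field_counts(text, delimiter=","):
    """Return the 1-based numbers of records whose field count differs from the header's.

    Hypothetical helper for illustration; not part of secuTrialR.
    """
    records = list(csv.reader(io.StringIO(text), delimiter=delimiter))
    if not records:
        return []
    expected = len(records[0])
    bad = [n for n, rec in enumerate(records[1:], start=2) if len(rec) != expected]
    if bad:
        warnings.warn(
            f"{len(bad)} record(s) do not have {expected} fields like the header; "
            "free text may contain the delimiter, quotes or line breaks. "
            "Consider changing the field delimiter in the export options.",
            stacklevel=2,
        )
    return bad


# Usage: the free text from the issue, exported without quoting.
export = (
    "id,comment,visit\n"
    'pat-001,The patient said:\n"Sir, I really have significant health issues"; '
    'Also mentioned: "However, none of the treatments are working.",visit-1\n'
)
print(check_field_counts(export))  # also emits a UserWarning
```

In this sample only record 2 (the part cut off at the line break) is flagged; the remainder of the broken text happens to split into three fields, which shows the blind spot of the heuristic.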

PatrickRWright commented 3 years ago

Should we set "\t" as the recommended export option? I think tabs in free text are a lot less common than commas or semicolons.

aghaynes commented 3 years ago

Best would be if iAS could wrap strings in quotes; then it should be fine. I wouldn't make the recommended settings tooooo different from the defaults.

PatrickRWright commented 3 years ago

So the below (adjusted from the example above) would work properly? This should be one cell if loaded into e.g. Excel.

"The patient said:
"Sir, I really have significant health issues"; Also mentioned: "However, none of the treatments are working.""
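
RFC 4180 requires double quotes inside a quoted field to be escaped by doubling them, so wrapping the value in quotes alone is not enough when the free text itself contains quotes. A Python sketch with the `csv` module comparing the value as written above with a properly escaped one; whether iAS/secuTrial escapes quotes this way is an assumption, not something confirmed in this thread.

```python
import csv
import io

text = (
    "The patient said:\n"
    '"Sir, I really have significant health issues"; '
    'Also mentioned: "However, none of the treatments are working."'
)

# As proposed above: the whole value wrapped in quotes, inner quotes untouched.
as_proposed = '"' + text + '"'
# RFC 4180 style: the value wrapped in quotes and every inner quote doubled.
escaped = '"' + text.replace('"', '""') + '"'

# csv.writer applies the same escaping automatically.
buf = io.StringIO()
csv.writer(buf).writerow([text])
print(buf.getvalue() == escaped + "\r\n")

# Read each variant back and check whether it yields exactly one cell.
for label, cell in [("as proposed", as_proposed), ("escaped", escaped)]:
    rows = list(csv.reader(io.StringIO(cell + "\r\n")))
    print(label, rows == [[text]])
```

Read as written, the quote before `Sir` closes the quoted field early, so the value is split at the comma after `Sir`; the escaped form reads back as a single cell.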

PatrickRWright commented 3 years ago

Note to myself: @PatrickRWright check difference in "CSV format" and "CSV format for MS Excel".

aghaynes commented 2 years ago

This might now be fixed by #249?