WikiEducationFoundation / impact-visualizer


Dashboard tool datasets can have improperly formatted data #27

Open ragesoss opened 6 days ago

ragesoss commented 6 days ago

I used the tool to get the users and articles for a large campaign: https://dashboard.wikiedu.org/campaigns/communicating_science/overview

When I tried to process them, both the Users and the Articles CSVs raised a CSV::MalformedCSVError.

For the Articles CSV, the error comes from this title: Nathan_"Nearest"_Green

For the Users CSV, it comes from this user: https://en.wikipedia.org/wiki/User:%22robin_ramlall%22
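A minimal reproduction of the article case, assuming the export writes one title per line with no quoting. Ruby's `CSV` accepts a double quote only inside a field that is itself quoted, so a bare quote in the middle of an unquoted field is rejected. `malformed?` is a hypothetical helper written for this sketch.

```ruby
require "csv"

# Assumed export layout: one title per line, written without any CSV quoting.
unescaped = "Nathan_\"Nearest\"_Green\n"

# Hypothetical helper: returns true when Ruby's CSV parser rejects the text.
def malformed?(text)
  CSV.parse(text)
  false
rescue CSV::MalformedCSVError
  true
end

puts malformed?(unescaped)
```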

ragesoss commented 6 days ago

@Aminehassou this one's for you

ragesoss commented 6 days ago

@mattfordham how should we format a CSV that has articles or users that include quotation marks as part of the name/title?

mattfordham commented 6 days ago

I haven't tested this, but wrapping the title in quotes and "double quoting" the interior marks is probably the answer. I'll test and confirm ASAP (today or tomorrow).

mattfordham commented 6 days ago

@ragesoss @Aminehassou Confirmed that double quoting is the answer. So, for the cases above, they'd need to be formatted in the CSV as follows:

"Nathan_""Nearest""_Green"
"""robin ramlall"""
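On the export side, a sketch of letting Ruby's `CSV` produce this escaping rather than building it by hand. It assumes a one-column file with one name per row. `CSV.generate_line` quotes any field that contains a quotation mark and doubles each interior mark, which yields the two lines above. Parsing the result recovers the original names unchanged.

```ruby
require "csv"

# Names from this issue; the one-column layout is an assumption.
names = ['Nathan_"Nearest"_Green', '"robin ramlall"']

# generate_line wraps a field containing `"` in quotes and doubles the
# interior quotes, then appends the row separator ("\n" by default).
csv_text = names.map { |name| CSV.generate_line([name]) }.join
puts csv_text

# Round trip: parsing the escaped text recovers the original names.
parsed = CSV.parse(csv_text).map(&:first)
puts parsed == names
```

Because the library handles the quoting, titles or usernames that also contain commas or line breaks would be escaped correctly by the same call.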