rcrath closed this 7 years ago
Need to conform the database values to a standard
Still in progress
Last week we talked about clustering user interests and making tokens by splitting on commas. How should we handle interests that were entered as full sentences and may contain commas? One possibility is to delimit with ";".
What do you think @rcrath?
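A minimal sketch of the comma tokenization discussed above shows the problem concretely. The function name `split_interests` and the inputs are made-up illustrations, not code or data from memberconnect. Tag-style input splits cleanly, but a sentence with commas in it breaks into fragments, and a semicolon delimiter has the same problem with something like "TL;DR".

```python
def split_interests(raw: str, delimiter: str = ",") -> list[str]:
    """Hypothetical helper: split a raw interests string into trimmed, non-empty tokens."""
    return [part.strip() for part in raw.split(delimiter) if part.strip()]


# Made-up inputs for illustration only.
print(split_interests("hiking, jazz, climate policy"))
# ['hiking', 'jazz', 'climate policy']
print(split_interests("I like music, especially jazz, and hiking"))
# ['I like music', 'especially jazz', 'and hiking']
print(split_interests("TL;DR; jazz", delimiter=";"))
# ['TL', 'DR', 'jazz']
```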
TL;DR: Keep it comma-delimited, massage the current data before importing, and use clear instructions and drop-down suggestions to guide users toward short tags.
I think we should keep the comma. WordPress and SoundCloud both have pretty good models for this. If people know that a comma starts a new tag, they will be pretty good at keeping tags short. We will need to manually massage the data that is already in there. I did that and put a CSV in memberconnect/docs for working on the clustering. I did not attach it to users, though, so I will need to do it again before we import the data. It took about fifteen minutes and I don't mind doing it again.

For new data, I lean toward a clear instructional label saying that commas separate tags. I think people are pretty used to that. As long as users can delete and re-enter tags, they can redo any that come out oddly from a sentence with a comma in it. My hope is that, between the instructions and the drop-down autocomplete, this will be the exception rather than the rule. If we use semicolons, someone will write a sentence with a semicolon in it or want to make TL;DR a tag, so I don't see that as a solution. The "proper" SQL way would be to enclose tags in quotes to escape the commas, with commas still as the delimiters, but I think that is too much to expect from users.
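For comparison, the quote-escaping approach mentioned above is how CSV handles delimiters inside a field. A rough sketch using Python's standard `csv` module (the helper name `parse_quoted_tags` and the sample string are hypothetical) shows that a quoted tag keeps its comma, while a plain split breaks it apart:

```python
import csv
import io


def parse_quoted_tags(raw: str) -> list[str]:
    """Hypothetical helper: parse one line of comma-delimited tags.

    A tag wrapped in double quotes may contain commas. skipinitialspace=True
    lets users type ", " between tags and still have the quotes recognized.
    """
    reader = csv.reader(io.StringIO(raw), skipinitialspace=True)
    row = next(reader, [])  # empty input yields no rows
    return [tag.strip() for tag in row if tag.strip()]


raw = 'hiking, "music, especially jazz", gardening'
print(raw.split(","))
# ['hiking', ' "music', ' especially jazz"', ' gardening']
print(parse_quoted_tags(raw))
# ['hiking', 'music, especially jazz', 'gardening']
```

The parsing side is easy; the catch is on the input side, since users would have to remember to type the quotes themselves.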
I'm going to close this, since the solution to the original problem is to massage the data properly so it imports well.
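Part of that massaging is mechanical and could be scripted before the manual pass. The sketch below assumes some cleanup rules for illustration (trim, collapse inner whitespace, lowercase, drop empties and duplicates); the project has not settled on these rules. The function names are hypothetical, and rewriting full sentences into short tags would still need the manual step.

```python
import re


def normalize_tag(tag: str) -> str:
    """Assumed cleanup rule: collapse inner whitespace, trim, and lowercase."""
    return re.sub(r"\s+", " ", tag).strip().lower()


def normalize_tags(raw: str) -> list[str]:
    """Split on commas, normalize each tag, and drop empties and duplicates (order kept)."""
    seen: set[str] = set()
    result: list[str] = []
    for part in raw.split(","):
        tag = normalize_tag(part)
        if tag and tag not in seen:
            seen.add(tag)
            result.append(tag)
    return result


print(normalize_tags("Hiking,  jazz , hiking,,Climate   Policy"))
# ['hiking', 'jazz', 'climate policy']
```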