Duplicate text profiles from ddprofiler

The text profiles produced by ddprofiler contain duplicate column profiles, so dindex_builder needs extra time and disk space to build the full-text search index.

Reproduce this issue:

1. Download the Chicago open data: https://uchicago.box.com/s/ecmb69h874qwedj19ebncvu0qvd4n97h
2. Follow the quick start guide to index the data.
3. Inspect the files under output_profiles_json/text.

For example, in 0.csv the month_name column of x2vd-qke7.csv appears twice:

"1507119095","demo","/Users/yuegong/Desktop/chicago_open_data_all_tbls/","x2vd-qke7.csv","month_name","JUNE MAY OCTOBER AUGUST JULY SEPTEMBER NOVEMBER APRIL"
"1507119095","demo","/Users/yuegong/Desktop/chicago_open_data_all_tbls/","x2vd-qke7.csv","month_name","JUNE MAY OCTOBER AUGUST JULY SEPTEMBER NOVEMBER APRIL"

Since dindex_builder reads the text profiles to build the full-text search index, every duplicate row here is indexed again, which adds indexing time and index size.
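
To check how widespread the duplication is, a small script along these lines could count repeated rows in a text profile file. This is only a sketch and not part of ver: the function name `find_duplicate_rows` is made up for illustration, and it assumes the files are ordinary quoted CSV, as the row quoted in this issue suggests. It treats two rows as duplicates only when every field matches.

```python
import csv
from collections import Counter


def find_duplicate_rows(lines):
    """Return {row_tuple: count} for every row that occurs more than once.

    `lines` is any iterable of CSV text lines (an open file works).
    Hypothetical helper; rows are compared field by field, so no
    particular column layout is assumed.
    """
    counts = Counter()
    for row in csv.reader(lines):
        if not row:
            continue  # skip blank lines
        counts[tuple(row)] += 1
    return {row: n for row, n in counts.items() if n > 1}


# Example usage (path taken from the reproduction steps in this issue):
#
# with open("output_profiles_json/text/0.csv", newline="") as f:
#     for row, n in find_duplicate_rows(f).items():
#         print(n, "copies:", row[:5])
```

Comparing whole rows keeps the check conservative: it flags only exact repeats like the month_name row quoted in this issue and does not merge profiles that differ in any field.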
