MusicConnectionMachine / UnstructuredData

In this project we will be scanning unstructured online resources such as the Common Crawl data set.
GNU General Public License v3.0

add a new column to the csv file #198

Closed: goldbergtatyana closed this issue 7 years ago

goldbergtatyana commented 7 years ago

Hi guys, can you please add a new column to the CSV file containing the terms that were matched on the websites you return? I saw that, despite the nice filtering, many websites have the words 'hotel' or 'travel' in their URL, so with the list of matched terms we could filter out those additional websites that are not relevant. I could do the manual filtering. Just let me know.

felixschorer commented 7 years ago

Sure! @goldbergtatyana

I'd say we move the discussion over to #196 though, as that issue is meant for experimenting with different threshold values and for refining the term-blacklist.txt file.
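
For reference, here is a minimal sketch of what the requested column could look like. It is not the project's actual code: the column names (`url`, `matched_terms`), the `;` separator inside the column, the case-insensitive substring matching, and the helper names are all assumptions made for illustration. The URL check only mirrors the 'hotel'/'travel' observation above and does not reflect the real contents of term-blacklist.txt.

```python
import csv
import io


def matched_terms(text, terms):
    """Return the terms found in text (case-insensitive substring match), sorted."""
    lowered = text.lower()
    return sorted(t for t in terms if t.lower() in lowered)


def write_rows(pages, terms, out):
    """Write one CSV row per page with at least one match, including the matched terms."""
    writer = csv.writer(out)
    writer.writerow(["url", "matched_terms"])  # hypothetical column names
    for url, text in pages:
        hits = matched_terms(text, terms)
        if hits:
            # Join the hits with ';' so they fit into a single CSV column.
            writer.writerow([url, ";".join(hits)])


def is_blacklisted(url, blacklist):
    """Return True if any blacklisted word occurs in the URL (case-insensitive)."""
    lowered = url.lower()
    return any(word.lower() in lowered for word in blacklist)


# Example data (made up for illustration).
pages = [
    ("http://example.org/mozart", "Mozart wrote this piece in Vienna."),
    ("http://example.org/travel-vienna", "Book a hotel near the Mozart house."),
    ("http://example.org/cooking", "A recipe for apple strudel."),
]
terms = ["Mozart", "Beethoven"]

buffer = io.StringIO()
write_rows(pages, terms, buffer)
print(buffer.getvalue())
```

With the matched terms in their own column, rows can be reviewed by hand, and URLs such as the second one can additionally be dropped with a check like `is_blacklisted(url, ["hotel", "travel"])`.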