MusicConnectionMachine / UnstructuredData

In this project we will be scanning unstructured online resources such as the Common Crawl data set.
GNU General Public License v3.0

Improved performance #214

Closed: felixschorer closed this 7 years ago

felixschorer commented 7 years ago

Made language detection one of the first filters, which results in a roughly 30% performance increase when filtering for English pages only.
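To illustrate why filter order matters, here is a minimal Python sketch, not the project's actual code. It assumes a pipeline in which a page is kept only if every filter accepts it, so a filter that rejects early spares all later filters from running. The names `looks_english`, `mentions_music_term`, `make_counting_filter`, and `run_pipeline` are hypothetical, and the stop-word heuristic is only a toy stand-in for a real language detector.

```python
from typing import Callable, Dict, Iterable, List

Page = str
PageFilter = Callable[[Page], bool]

# Tiny stop-word list used by the toy detector below (an assumption, not a real model).
ENGLISH_STOPWORDS = {"the", "and", "of", "to", "in", "is", "a"}


def looks_english(page: Page) -> bool:
    """Toy stand-in for language detection: share of English stop words."""
    words = page.lower().split()
    if not words:
        return False
    hits = sum(1 for word in words if word in ENGLISH_STOPWORDS)
    return hits / len(words) >= 0.2


def mentions_music_term(page: Page) -> bool:
    """Stand-in for a later, more expensive content filter."""
    text = page.lower()
    return "bach" in text or "mozart" in text


def make_counting_filter(predicate: PageFilter, counts: Dict[str, int], name: str) -> PageFilter:
    """Wrap a filter so that every invocation is counted under `name`."""
    def wrapped(page: Page) -> bool:
        counts[name] = counts.get(name, 0) + 1
        return predicate(page)
    return wrapped


def run_pipeline(pages: Iterable[Page], filters: List[PageFilter]) -> List[Page]:
    """Keep pages accepted by all filters; all() stops at the first rejection."""
    return [page for page in pages if all(f(page) for f in filters)]


PAGES = [
    "Bach is one of the great composers in the history of music",
    "Mozart wurde 1756 geboren und starb 1791",
    "The weather is nice and the sky is clear",
    "Bach komponierte viele Kantaten",
]

if __name__ == "__main__":
    counts: Dict[str, int] = {}
    language = make_counting_filter(looks_english, counts, "language")
    content = make_counting_filter(mentions_music_term, counts, "content")
    kept = run_pipeline(PAGES, [language, content])
    print(kept)
    print(counts)
```

With language detection first, the content filter in this sketch only runs on pages that were already classified as English. Both orderings keep the same pages; only the amount of work differs. How much time this saves in practice depends on the relative cost of each filter and on the share of non-English pages, so the sketch does not reproduce the 30% figure reported above.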