boun-tabi-LMG / turkish-academic-text-harvest

MIT License

Improve filtering steps #4

Closed: gokceuludogan closed this issue 1 year ago

gokceuludogan commented 1 year ago

The order of the filtering steps in the script needs to be revisited, since it affects both performance and accuracy. We should evaluate the current order and adjust it where needed.

We also need to measure the performance of the functions and filters used in the script to identify bottlenecks.
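One way to approach both points is to time each filter and count how many documents it rejects, then compare different orderings. The sketch below is only an illustration: `min_length`, `mostly_alphabetic`, and `run_filters` are hypothetical names and are not taken from the repository's script.

```python
import time
from collections import defaultdict


def min_length(text, threshold=20):
    """Cheap hypothetical filter: keep texts with at least `threshold` characters."""
    return len(text) >= threshold


def mostly_alphabetic(text, ratio=0.6):
    """Costlier hypothetical filter: keep texts where letters are at least `ratio` of all characters."""
    if not text:
        return False
    letters = sum(ch.isalpha() for ch in text)
    return letters / len(text) >= ratio


def run_filters(texts, filters):
    """Apply filters in order, stopping at the first one that rejects a text.

    Returns the kept texts and per-filter stats (calls, rejected, seconds).
    """
    stats = defaultdict(lambda: {"calls": 0, "rejected": 0, "seconds": 0.0})
    kept = []
    for text in texts:
        for f in filters:
            entry = stats[f.__name__]
            start = time.perf_counter()
            ok = f(text)
            entry["seconds"] += time.perf_counter() - start
            entry["calls"] += 1
            if not ok:
                entry["rejected"] += 1
                break  # later filters never see this text
        else:
            kept.append(text)
    return kept, dict(stats)
```

Running `run_filters` with the same inputs but different filter orders shows how many calls each ordering spends on the slower filters. In general, placing cheap filters that reject many documents first tends to reduce total work, but the measured numbers should drive the decision.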

zeynepyirmibes commented 1 year ago

Langdetect has been replaced with langid, and mark_items has been modified to run only on small documents. The filtering steps have been optimized.
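As a rough sketch of what these two changes might look like, the snippet below takes the language classifier as a parameter and gates an expensive step by document size. The helper names, the `max_chars` threshold, and the choice to pass large documents through unchanged are all assumptions for illustration, not the repository's actual code. `langid.classify` returns a `(language_code, score)` pair, so it could be passed as `classify`.

```python
def keep_language(texts, classify, target="tr"):
    """Keep texts whose predicted language code equals `target`.

    `classify` is any callable returning (language_code, score);
    langid.classify has this shape.
    """
    return [t for t in texts if classify(t)[0] == target]


def apply_if_small(doc, step, max_chars=10_000):
    """Run `step` only on documents of at most `max_chars` characters.

    Larger documents are returned unchanged (an assumed policy).
    """
    return step(doc) if len(doc) <= max_chars else doc
```

With langid installed, usage might look like `keep_language(texts, langid.classify)`. Injecting the classifier also makes it easy to benchmark one detection library against another on the same inputs.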