JuliaText / TextAnalysis.jl

Julia package for text analysis

BM25, Co-occurrence Matrix, faster ROUGE, Fixing LSA. #165

Closed Ayushk4 closed 4 years ago

Ayushk4 commented 5 years ago

I am porting several implementations from StringAnalysis.jl and fixing some existing ones.

Per the discussion in #164, I prefer to port the co-occurrence matrix (COOM) from StringAnalysis.jl because of the advantages discussed there.

There seem to be performance bottlenecks in rouge.jl caused by abstractly typed containers; this also needs work.
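
For context, here is a minimal illustration of why abstractly typed containers tend to be slow in Julia. It is a generic sketch, not code taken from rouge.jl; `total_count` and both dictionaries are made up for the example.

```julia
# Sum the values of a dictionary of word counts.
function total_count(counts)
    total = 0
    for (_, c) in counts
        total += c
    end
    return total
end

# Abstract key and value types: the compiler cannot know what `c` is,
# so each addition inside the loop is resolved at run time.
abstract_counts = Dict{AbstractString,Any}("the" => 2, "cat" => 1)

# Concrete key and value types: the same loop compiles to type-stable code.
concrete_counts = Dict{String,Int}("the" => 2, "cat" => 1)

total_count(abstract_counts)
total_count(concrete_counts)
```

Both calls return the same total; only the generated code differs. Comparing the two with `@code_warntype` or a benchmarking tool should make the difference visible.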

Ayushk4 commented 5 years ago

I have ported BM25 and the Co-Occurrence Matrix from StringAnalysis.jl. The Co-Occurrence Matrix is 10-15x faster than the one in #164, uses less space, and supports operations over Document and Corpus types.
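
To make the idea concrete, a windowed co-occurrence matrix can be sketched roughly as below. This is only an illustration under simplifying assumptions (raw counts, a dense `Matrix{Int}`, a single tokenized document); it is not the ported implementation, and `cooccurrence` and its `window` keyword are hypothetical names.

```julia
# Count how often two distinct terms appear within `window` positions of each other.
function cooccurrence(tokens::Vector{String}; window::Int=2)
    vocab = unique(tokens)
    index = Dict(term => i for (i, term) in enumerate(vocab))
    counts = zeros(Int, length(vocab), length(vocab))
    n = length(tokens)
    for i in 1:n
        for j in (i + 1):min(i + window, n)
            a, b = index[tokens[i]], index[tokens[j]]
            a == b && continue   # ignore a term co-occurring with itself
            counts[a, b] += 1
            counts[b, a] += 1    # keep the matrix symmetric
        end
    end
    return counts, vocab
end

tokens = ["this", "is", "a", "small", "example"]
counts, vocab = cooccurrence(tokens; window=1)
```

With `window=1` only adjacent terms are counted, so `counts[1, 2]` (for "this" and "is") is 1 while `counts[1, 3]` (for "this" and "a") stays 0. A real implementation would more likely use a sparse matrix to save space.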

LSA has been fixed. ROUGE-N has been re-implemented: it now supports languages and shows a 15-20% improvement in speed and memory use.
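
For readers unfamiliar with the metric, ROUGE-N measures n-gram overlap between a candidate summary and a reference. The sketch below computes recall only, against a single pre-tokenized reference; it is an assumption-laden illustration rather than the re-implemented function, and `ngram_counts` and `rouge_n_recall` are hypothetical names.

```julia
# Count the n-grams of a token sequence, keyed by the space-joined n-gram.
function ngram_counts(tokens::Vector{String}, n::Int)
    counts = Dict{String,Int}()
    for i in 1:(length(tokens) - n + 1)
        gram = join(tokens[i:(i + n - 1)], " ")
        counts[gram] = get(counts, gram, 0) + 1
    end
    return counts
end

# ROUGE-N recall: clipped n-gram matches divided by the number of reference n-grams.
function rouge_n_recall(reference::Vector{String}, candidate::Vector{String}, n::Int)
    ref = ngram_counts(reference, n)
    cand = ngram_counts(candidate, n)
    overlap = 0
    total = 0
    for (gram, c) in ref
        total += c
        overlap += min(c, get(cand, gram, 0))   # clip to the candidate's count
    end
    return total == 0 ? 0.0 : overlap / total
end

reference = ["the", "cat", "sat", "on", "the", "mat"]
candidate = ["the", "cat", "was", "on", "the", "mat"]
rouge_n_recall(reference, candidate, 2)
```

Here three of the five reference bigrams ("the cat", "on the", "the mat") also occur in the candidate, so the bigram recall is 3/5. Using concretely typed `Dict{String,Int}` containers keeps the counting loop type-stable.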

Tests, docstrings, and online documentation have been added for all of these.

@aviks, please review.

aviks commented 4 years ago

I've fixed the merge conflicts and added an explicit license attribution to zgornel in coom.jl.