dbpedia-spotlight / wikipedia-stats-extractor

Raw Wikipedia counts for entity linking

Entity(Uri) Counts Logic #1

Closed: nmadhire closed this pull request 9 years ago

nmadhire commented 9 years ago

Pull request for review: entity (URI) counts computed with Apache Spark and Scala.
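A minimal sketch of what an entity (URI) count amounts to, assuming each parsed page yields a list of links with a target URI. The `Link` case class, `EntityCountSketch` object and sample data are hypothetical, and plain Scala collections stand in for the Spark RDD operations used in the PR (on an RDD the same idea would typically be `flatMap`, then `map(uri => (uri, 1L))` and `reduceByKey(_ + _)`).

```scala
// Hypothetical representation of a wiki link: anchor text plus target entity URI.
final case class Link(surfaceForm: String, uri: String)

object EntityCountSketch {
  // Count how often each entity URI appears as a link target across all pages.
  // Each inner Seq holds the links extracted from one page.
  def countURIs(pages: Seq[Seq[Link]]): Map[String, Long] =
    pages
      .flatMap(_.map(_.uri))   // one element per link occurrence
      .groupBy(identity)       // group identical URIs together
      .map { case (uri, occurrences) => uri -> occurrences.size.toLong }

  def main(args: Array[String]): Unit = {
    val pages = Seq(
      Seq(Link("Berlin", "Berlin"), Link("Germany", "Germany")),
      Seq(Link("the German capital", "Berlin"))
    )
    val counts = countURIs(pages)
    println(s"Berlin: ${counts("Berlin")}, Germany: ${counts("Germany")}") // Berlin: 2, Germany: 1
  }
}
```

Different surface forms ("Berlin", "the German capital") pointing at the same URI are counted together, which is what makes the result an entity count rather than a string count.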

nmadhire commented 9 years ago

Organized the code a little to make it more general. Let me know if you see anything else that can be improved.

dav009 commented 9 years ago

It looks much better after your changes. I'm happy for this PR to be merged.

As part of the next batch of work, it would be good to abstract the parsing logic out of ComputeStats.

In this case, countURI has to know about link.ids, which is inherently tied to the way jsonpedia structures its output.

It would be good if the parser exposed a few interfaces so that ComputeStats does not need to know about any of these internals.

For example, counting URIs could look similar to:

def countURI(){
    parser.getURIS().map(..function with count magic..)
}
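One way the suggested decoupling could look, as a sketch rather than the project's actual design: a hypothetical `WikiParser` trait exposes `getURIS()` (the name is taken from the comment above), a hypothetical `JsonpediaParser` is the only class that knows links live under `link.ids`, and `ComputeStats.countURI` simply counts what the parser returns. Modelling each parsed page as a `Map[String, Seq[String]]` is an assumption standing in for the real jsonpedia JSON, and plain Scala collections stand in for Spark.

```scala
// Hypothetical abstraction: ComputeStats only sees URIs, not the JSON layout.
trait WikiParser {
  def getURIS(): Seq[String]
}

// Hypothetical jsonpedia-backed parser. Each page is modelled as a map from
// field name to values; only this class knows that links live under "link.ids".
final class JsonpediaParser(pages: Seq[Map[String, Seq[String]]]) extends WikiParser {
  def getURIS(): Seq[String] = pages.flatMap(_.getOrElse("link.ids", Seq.empty))
}

// Counting logic depends only on the WikiParser interface.
final class ComputeStats(parser: WikiParser) {
  // Number of times each URI occurs in the parser's output.
  def countURI(): Map[String, Int] =
    parser.getURIS().groupBy(identity).map { case (uri, xs) => uri -> xs.size }
}

object ComputeStatsDemo {
  def main(args: Array[String]): Unit = {
    val parser = new JsonpediaParser(Seq(
      Map("link.ids" -> Seq("Berlin", "Germany")),
      Map("link.ids" -> Seq("Berlin"), "title" -> Seq("Spree"))
    ))
    val counts = new ComputeStats(parser).countURI()
    println(counts("Berlin")) // 2
  }
}
```

Because `ComputeStats` depends only on the trait, a parser for a different input format (or a test stub) could be plugged in without touching the counting code.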

nmadhire commented 9 years ago

This is good to go now. I will change the ComputeStats logic in the next PR tomorrow.