adidier17 / AuthorRank

A modification of PageRank to find the most prestigious authors in a scientific collaboration network.
MIT License

MRG: create permutations per document not globally #23

Closed adidier17 closed 4 years ago

adidier17 commented 4 years ago

This seemed to fix the issue in the few examples I tried and when stepping through the code with the debugger, but I'm still seeing some edge-weight results on the CORD dataset that lower my confidence.
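
For readers following along, here is a minimal sketch of the difference the PR title describes, assuming "permutations" refers to directed co-author pairs built with `itertools.permutations`. The function names and the toy `docs` list are hypothetical illustrations, not the AuthorRank code itself.

```python
from itertools import permutations


def coauthor_pairs_per_document(documents):
    """Build directed co-author pairs separately for each document."""
    pairs = []
    for authors in documents:
        # Drop repeated names within one document while keeping their order.
        unique_authors = list(dict.fromkeys(authors))
        pairs.extend(permutations(unique_authors, 2))
    return pairs


def coauthor_pairs_global(documents):
    """Build pairs over the pooled author list (the behavior to avoid)."""
    all_authors = list(
        dict.fromkeys(a for authors in documents for a in authors)
    )
    return list(permutations(all_authors, 2))


# Hypothetical toy input: two papers with no authors in common.
docs = [["A", "B"], ["C", "D"]]
per_doc = coauthor_pairs_per_document(docs)
global_pairs = coauthor_pairs_global(docs)

print(len(per_doc), len(global_pairs))
```

In this toy case the global version links authors who never shared a paper (for example A and C), which would distort edge weights, while the per-document version only links actual co-authors.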

vc1492a commented 4 years ago

Thanks @adidier17! I pulled the branch and checked it out. I wasn't able to run the following example files: top_n_authors.py, mls_top_n_authors.py, export_to_json.py. I'll see if I can run the cord.py example shortly and let you know what I think.

Note for later: we'll need to remember to update the unit tests once we have resolved the issue, as 5 of the 8 tests currently fail.

vc1492a commented 4 years ago

@adidier17 just checked out the CORD example, and it looks much better! I think it's certainly more representative of the true results. What are you seeing in the edge weights that lower your confidence?

Not sure how easy it would be to work on #20 and add the dataset from the original paper, but that may ground our changes to the code-base in reality and help ensure that the software is working as intended. I think it's tough to gauge what is correct / not correct without some sort of benchmark.

adidier17 commented 4 years ago

In the CORD example, at least in the subset I took (bronchiolitis+infan), the final ranking has two authors with very high scores and the rest at 0. This is certainly possible in a highly disconnected graph, but it warrants more checking.
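
One way to check the "highly disconnected graph" explanation is to count the connected components of the co-authorship graph. The sketch below is an assumption-laden illustration in plain Python: `connected_components` and the toy `edges` list are hypothetical and not part of AuthorRank, and authors with no co-authors do not appear because they have no edges.

```python
from collections import defaultdict


def connected_components(edges):
    """Return the connected components of an undirected graph as sets."""
    adjacency = defaultdict(set)
    for a, b in edges:
        adjacency[a].add(b)
        adjacency[b].add(a)

    seen = set()
    components = []
    for start in adjacency:
        if start in seen:
            continue
        # Depth-first traversal from an unvisited author.
        stack = [start]
        component = set()
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(adjacency[node] - component)
        seen |= component
        components.append(component)
    return components


# Hypothetical toy edge list: two separate groups of co-authors.
edges = [("A", "B"), ("B", "C"), ("D", "E")]
components = connected_components(edges)
sizes = sorted(len(c) for c in components)

print(len(components), sizes)
```

Many small components alongside one tightly linked pair would be consistent with a ranking dominated by two authors; a single large component would suggest the edge weights themselves need another look.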

I agree that working on #20 would be really helpful. Not sure if I'll have time to get to that before Thurs, but we will see.