ISMRM / mrhub

Hub for open-source MR-related software
https://ismrm.github.io/mrhub

Sorting by citations #3

Open dgallichan opened 5 years ago

dgallichan commented 5 years ago

Currently we are using the number of citations that Semantic Scholar finds on the main paper associated with the software to allow sorting by a proxy metric related to 'impact' of the software. We realise that this is not a perfect solution, not least because not everyone who cites a paper uses the software - and not everyone who uses software has cited the paper.

For now this seems like a reasonable solution - but feel free to use this space to discuss any issues that arise due to this choice, along with any suggestions you might have for how to improve it.

Note that Semantic Scholar was chosen because of its free-to-use API. We probably can't use Google Scholar as they seem to regularly change their site to block scraping attempts, and Scopus and WoS both have APIs - but at a premium. It seems that CrossRef might be a viable alternative - but I haven't had the time to read more to see if it offers something beyond what now seems to work with the Semantic Scholar option.
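As a rough illustration of the lookup described above, here is a minimal Python sketch that builds a Semantic Scholar query for a DOI and reads the citation count from the JSON response. The Graph API endpoint and the `citationCount` field are assumptions about the current public API, not necessarily what the MR Hub update script calls, and the function names are hypothetical.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen

# Assumed endpoint: the public Semantic Scholar Graph API.
# Not confirmed to be the one the MR Hub update script uses.
S2_BASE = "https://api.semanticscholar.org/graph/v1/paper"


def citation_url(doi: str) -> str:
    """Build the lookup URL for a paper identified by its DOI."""
    return f"{S2_BASE}/DOI:{quote(doi, safe='/')}?fields=citationCount"


def parse_citation_count(payload: str) -> int:
    """Extract the citation count from a JSON response body (0 if absent)."""
    data = json.loads(payload)
    count = data.get("citationCount")
    return int(count) if count is not None else 0


def fetch_citation_count(doi: str, timeout: float = 10.0) -> int:
    """Query the API over the network; subject to the service's rate limits."""
    with urlopen(citation_url(doi), timeout=timeout) as response:
        return parse_citation_count(response.read().decode("utf-8"))
```

Keeping the parsing separate from the network call would make it easier to swap in another provider such as CrossRef later, since only the URL builder and the field name would need to change.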

uecker commented 4 years ago

A good idea would be to add a randomize option.

dgallichan commented 4 years ago

Thanks @uecker for the suggestion - I spent a little time trying to work out how this could be implemented, but my web-coding knowledge wasn't sufficient to get it to work! If anyone wants to implement this, please go ahead :)
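For anyone picking this up, the logic of a randomize option is small. The sketch below is in Python purely to illustrate the idea (the site itself is JavaScript), and `randomized_order` is a hypothetical helper, not something in the repository.

```python
import random
from typing import Optional


def randomized_order(projects: list, seed: Optional[int] = None) -> list:
    """Return a shuffled copy of the project list, leaving the input untouched.

    Passing a seed makes the order reproducible, which is handy for testing;
    leaving it as None gives a different order on each call.
    """
    rng = random.Random(seed)
    shuffled = list(projects)
    rng.shuffle(shuffled)
    return shuffled


if __name__ == "__main__":
    print(randomized_order(["BART", "SigPy", "torchkbnufft"]))
```

A random order avoids permanently favouring any one package, at the cost of the list looking different on every visit.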

mmuckley commented 2 years ago

Hello @dgallichan et al. - this seems like a great initiative. I have some feedback. I don't have any expectations for how the feedback is used - feel free to use or ignore as you wish :).

In terms of citation sorting, I think it would be really good or even necessary to use something besides Semantic Scholar. The main reason is that Semantic Scholar is doing a poor job of indexing ISMRM abstracts from the main conference and workshops. One option would be for the ISMRM to work on getting its proceedings indexed, but if that doesn't happen I think it might be necessary to move away from Semantic Scholar, as many people are publishing their packages at ISMRM. The current status of MR Hub leaves a great community project like SigPy at the very bottom of the list.

In the meantime as long as MR Hub is sticking with Semantic Scholar I would change the default sorting mechanism. Citations is pretty good, but at the moment many packages are linked to research papers rather than software papers, which just reinforces an author's scientific contributions rather than their software contributions. There are many ways to highlight scientific contributions. We have Google Scholar pages, research awards, prestigious positions, etc. MR Hub is a little more unique in that it can showcase software work that isn't naturally promoted as much.

I think a better option vs. the status quo would be to sort by most recent software update. This would also have the added benefit of highlighting projects that are actively being updated and maintained. Also, it would help promote new projects that might benefit the most from promotion.

Disclaimer: I am looking to PR my project, torchkbnufft, for which the associated paper was at the 2020 ISMRM Sedona Workshop.

uecker commented 2 years ago

I agree, sorting by last update would also be good, but may be difficult to automate. I think we need somebody to implement it...

mmuckley commented 2 years ago

According to the README there is currently a mechanism for querying BitBucket and GitHub for the last update. A commit seems a reasonable surrogate. Were you thinking to use releases for the date? Or is the concern about software not kept on GitHub/BitBucket?
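To make the "last commit as surrogate" idea concrete, here is a hedged Python sketch that reads the most recent commit date from the GitHub REST API's commit listing. It is not the existing MR Hub mechanism; the function names are hypothetical and the sample values in the usage note are made up.

```python
import json
from typing import Optional


def commits_url(owner: str, repo: str) -> str:
    """URL for the newest commit on the default branch (GitHub REST API)."""
    return f"https://api.github.com/repos/{owner}/{repo}/commits?per_page=1"


def last_commit_date(commits_payload: str) -> Optional[str]:
    """Return the newest commit's date as YYYY-MM-DD, or None if no commits.

    Expects the JSON body returned by the commit listing endpoint, which is
    a list ordered newest first.
    """
    commits = json.loads(commits_payload)
    if not commits:
        return None
    timestamp = commits[0]["commit"]["committer"]["date"]
    return timestamp[:10]  # keep only the date part of the ISO 8601 timestamp
```

For a response whose first entry has a committer date of `2021-07-07T09:30:00Z`, `last_commit_date` returns `"2021-07-07"`. Using releases instead would mean querying a different endpoint, and would skip projects that never tag releases.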

uecker commented 2 years ago

I was thinking about software not on GitHub/BitBucket, e.g. a repository maintained by some institution. But maybe those could also be polled automatically.

notZaki commented 2 years ago

For reference, the default sorting option is defined here: https://github.com/ISMRM/mrhub/blob/32838c332456458ea252003fbc5aafe9f25ea814/js/index.js#L32

If it is decided that the default sort should be by the most recent commit, then the 'ncitations' part can be replaced with 'dateupdated'.
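The effect of swapping the sort key can be sketched in a few lines. This is a Python illustration of the logic only (the real code is in `js/index.js`); the field names `ncitations` and `dateupdated` come from the comment above, while the project entries are invented sample data.

```python
def sort_projects(projects: list, key: str = "dateupdated") -> list:
    """Sort descending by the given field, placing entries without a value last.

    ISO dates (YYYY-MM-DD) sort correctly as plain strings, so the same
    function works for both 'ncitations' and 'dateupdated'.
    """
    return sorted(
        projects,
        key=lambda p: (p.get(key) is not None, p.get(key)),
        reverse=True,
    )


# Invented sample data, for illustration only.
projects = [
    {"name": "A", "ncitations": 120, "dateupdated": "2019-05-01"},
    {"name": "B", "ncitations": 3, "dateupdated": "2021-07-07"},
    {"name": "C", "ncitations": 40, "dateupdated": None},
]

by_date = [p["name"] for p in sort_projects(projects, "dateupdated")]
by_citations = [p["name"] for p in sort_projects(projects, "ncitations")]
```

With this sample, sorting by date puts the recently updated but rarely cited project B first, while sorting by citations puts it last, which is the trade-off being discussed in this thread.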

uecker commented 2 years ago

I am not sure what this does. For BART it says 2021-07-07 but the latest release was in March and the latest public commit a couple of days ago. Maybe this is the random number I asked for....

notZaki commented 2 years ago

That might be because the update script was last run ~3 months ago, so the project info could be out of date. A GitHub Action can be set up to automatically run the update script every day, but that's likely a separate issue.

mmuckley commented 2 years ago

We could adapt https://stackoverflow.com/questions/64407333/using-github-actions-to-automatically-update-the-repos-submodules.

dgallichan commented 2 years ago

My impression is that if a GitHub Action could be set to run, say, once a week, this would be a nice solution. I think the main problem in getting it working would be making sure you don't exceed the limit on API queries without logging in (although I may already be out of date on this, as these kinds of limits have a habit of changing as well...)

notZaki commented 2 years ago

Each instance of GitHub Actions should have 60 unauthenticated API queries per hour. If that becomes a bottleneck, then it should be possible to use the built-in GITHUB_TOKEN to make authenticated requests, which have a limit of 1,000 queries per hour.
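A small sketch of how an update script might switch between unauthenticated and authenticated requests, and keep an eye on the remaining quota. It assumes the workflow passes the token to the script as an environment variable named `GITHUB_TOKEN` (it is not exported automatically), and the helper names are hypothetical.

```python
import os
from typing import Mapping, Optional


def github_headers(token: Optional[str] = None) -> dict:
    """Request headers for the GitHub REST API, authenticated if a token is given."""
    headers = {"Accept": "application/vnd.github+json"}
    if token:
        headers["Authorization"] = f"Bearer {token}"
    return headers


def remaining_requests(response_headers: Mapping[str, str]) -> Optional[int]:
    """Read the remaining quota from a response's rate-limit header, if present."""
    for name, value in response_headers.items():
        if name.lower() == "x-ratelimit-remaining":
            return int(value)
    return None


# Assumption: the workflow maps its token into the environment under this name.
headers = github_headers(os.environ.get("GITHUB_TOKEN"))
```

Checking the remaining quota after each response would let the script stop early and keep the previous data, rather than failing partway through an update.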

mmuckley commented 2 years ago

@dgallichan I will try to test a draft of an Action on my fork now that my PR is merged. I'll open a PR if everything works.

mmuckley commented 2 years ago

Sorry - I see it is already merged!

dgallichan commented 2 years ago

So the default sorting option is now 'last update' - I think there are only a few repositories that don't use GitHub or Bitbucket, so it's mostly pretty good (we could do with adding API querying for GitLab as well, but again, hardly any packages are affected at the moment). For those hosting the code themselves, I guess the onus is on them to submit a PR to the MRHub whenever they want to manually update the date for their package.

Thanks so much notZaki for the GitHub Action - it ran successfully this lunchtime, and is definitely a good way to keep the MRHub 'fresh'! :)