Closed erickpeirson closed 2 years ago
Hi @erickpeirson, happy to help! If you want to get in touch via email, we're at hello@paperswithcode.com.
A few notes on this, for when we get closer to implementation...
This is a good test-case for integrating relations from an external platform that continuously extends/improves their data, and that provides a much richer data structure than we can reasonably accommodate (or would want to replicate). In addition to the complexity of data maintenance and consistency, there is value in connecting end users directly to the data provider, e.g. to generate awareness of the value of distributed infrastructure, and to foster the community that is generating/curating the data.
So rather than ingesting relations between e-prints and individual code resources, we should focus on adding a single relation between each e-print and the PwC resource for that e-print. In other words, instead of:
arXiv e-print ---> GitHub repo 1
arXiv e-print ---> GitHub repo 2
arXiv e-print ---> GitHub repo 3
we would do:
                                         |-> GitHub repo 1
arXiv e-print ---> PwC view for e-print -|-> GitHub repo 2
                                         |-> GitHub repo 3
We will still want to either consult the PwC data dump or (if available in the future) hit their API, but with the objective of identifying which e-prints are represented in their dataset rather than pulling in each individual link.
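A rough sketch of that lookup, for when we get to implementation. It assumes each record in the dump is a dict with a `paper_arxiv_id` key and a `paper_url` key pointing at the PwC page for the paper. Those field names are an assumption here and should be checked against the actual dump. The function name `pwc_relations` and the sample IDs/URLs are made up for illustration.

```python
from typing import Dict, Iterable, Mapping, Optional


def pwc_relations(records: Iterable[Mapping[str, Optional[str]]]) -> Dict[str, str]:
    """Map each arXiv ID to a single PwC page URL.

    Assumes (not verified against the real dump) that every record has
    'paper_arxiv_id' and 'paper_url' keys. Per-repo links are deliberately
    ignored: we only want to know which e-prints PwC covers.
    """
    relations: Dict[str, str] = {}
    for record in records:
        arxiv_id = record.get("paper_arxiv_id")
        pwc_url = record.get("paper_url")
        if not arxiv_id or not pwc_url:
            # Skip papers that are not on arXiv or have no PwC page.
            continue
        # Several records (one per repo) can share an e-print; keep one relation.
        relations.setdefault(arxiv_id, pwc_url)
    return relations


# Hypothetical sample records, shaped like the assumed dump format.
sample_records = [
    {"paper_arxiv_id": "0000.00001", "paper_url": "https://paperswithcode.com/paper/example-a",
     "repo_url": "https://github.com/example/repo-1"},
    {"paper_arxiv_id": "0000.00001", "paper_url": "https://paperswithcode.com/paper/example-a",
     "repo_url": "https://github.com/example/repo-2"},
    {"paper_arxiv_id": None, "paper_url": "https://paperswithcode.com/paper/example-b",
     "repo_url": "https://github.com/example/repo-3"},
]

relations = pwc_relations(sample_records)
```

With the sample above, the two repo records for the same e-print collapse into one e-print -> PwC relation, and the record without an arXiv ID is dropped.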
This is done in labs.
Papers with Code finds code repositories associated with ML papers, including e-prints on arXiv. They make their data available under CC-BY-SA. We should explore what would be involved in incorporating this dataset into arXiv external links, and displaying links to the code repositories on the arXiv abs page of ML papers.
@rstojnic what do you think?