awarebayes / RecNN

Reinforced Recommendation toolkit built around pytorch 1.7
Apache License 2.0
574 stars 113 forks

arXiv abstracts recommender based on specific user preferences #17

Closed: ghost closed this issue 3 years ago

ghost commented 3 years ago

Hi @awarebayes,

Hope you are all well!

I was wondering if RecNN can be used for recommending papers from the full arXiv dataset (1.7 million abstracts).

More precisely, I would like to use the categories or authors attributes for setting preferences for recommendation.

To download:

wget -nc https://paper2code.com/public/arxiv-metadata-oai-weaviate.tar.gz
tar xzvf arxiv-metadata-oai-weaviate.tar.gz

Excerpt:

{
  "authors": [
    "Maxim A. Yurkin",
    "Valeri P. Maltsev",
    "Alfons G. Hoekstra"
  ],
  "abstract": "We performed a rigorous theoretical convergence analysis of the discrete dipole approximation (DDA). We prove that errors in any measured quantity are bounded by a sum of a linear and quadratic term in the size of a dipole d, when the latter is in the range of DDA applicability. Moreover, the linear term is significantly smaller for cubically than for non-cubically shaped scatterers. Therefore, for small d errors for cubically shaped particles are much smaller than for non-cubically shaped. The relative importance of the linear term decreases with increasing size, hence convergence of DDA for large enough scatterers is quadratic in the common range of d. Extensive numerical simulations were carried out for a wide range of d. Finally we discuss a number of new developments in DDA and their consequences for convergence.",
  "categories": [
    "Optics",
    "Computational Physics"
  ],
  "comments": "23 pages, 5 figures",
  "doi": "10.1364/JOSAA.23.002578",
  "id": "0704.0033",
  "journal-ref": "J.Opt.Soc.Am.A 23(10): 2578-2591 (2006)",
  "report-no": "",
  "submitter": "Maxim A. Yurkin",
  "title": "Convergence of the discrete dipole approximation. I. Theoretical  analysis",
  "versions": [
    "v1"
  ]
}
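A minimal sketch of reading these records in Python. It assumes the extracted file holds one JSON object per line (JSON Lines); the excerpt above does not confirm the layout of this particular file, so adjust it if the file is a single JSON array instead. `iter_records` and `load_records` are illustrative names, not part of RecNN.

```python
import json
from typing import Iterable, Iterator


def iter_records(lines: Iterable[str]) -> Iterator[dict]:
    """Parse one JSON object per non-empty line (assumed JSON Lines layout)."""
    for line in lines:
        line = line.strip()
        if line:
            yield json.loads(line)


def load_records(path: str) -> list:
    # Point `path` at whatever file `tar` extracted; the name is not fixed here.
    with open(path, encoding="utf-8") as f:
        return list(iter_records(f))


if __name__ == "__main__":
    import io

    # A trimmed-down version of the excerpt above, as a single line.
    sample = io.StringIO(
        '{"id": "0704.0033", "authors": ["Maxim A. Yurkin"], "categories": ["Optics"]}\n'
    )
    for rec in iter_records(sample):
        print(rec["id"], rec["authors"], rec["categories"])
```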

Questions:

Thanks for any insights or inputs on that.

Cheers, X

awarebayes commented 3 years ago

Yes, it works with anything, as long as you can transform it into a vector.

About the authors: I would pick the top 5,000 or so and encode them categorically; there are plenty of encodings that map a category to a vector. You can also look up the papers each author wrote and build an NLP representation of the author from those titles (the mean of the BERT outputs).

About multiple categories: the same answer applies.

I would also consider obtaining BERT representations of the abstracts, and doing something similar with the titles. Then just merge everything together and apply PCA to the resulting vectors to reduce them to a manageable size, like 128, 256 or 512 dimensions.

If you have user interactions (i.e. ratings of articles by a given user), RecNN can be used for learning and recommendation.
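The encoding steps described above can be sketched in plain Python, assuming the records are dicts shaped like the excerpt in the first post. `top_k_vocab`, `multi_hot`, `mean_vector` and `paper_features` are hypothetical helpers, not RecNN functions; multi-hot is just one of the possible category-to-vector encodings, and the text embedding (for example a BERT vector of the abstract or title) is assumed to be computed elsewhere and passed in as a list of floats.

```python
from collections import Counter


def top_k_vocab(records, field, k):
    """Map the k most frequent values of a list-valued field to column indices."""
    counts = Counter(v for rec in records for v in rec.get(field, []))
    return {value: i for i, (value, _) in enumerate(counts.most_common(k))}


def multi_hot(values, vocab):
    """1.0 at the index of every known value; values outside the vocab are ignored."""
    vec = [0.0] * len(vocab)
    for v in values:
        idx = vocab.get(v)
        if idx is not None:
            vec[idx] = 1.0
    return vec


def mean_vector(vectors):
    """Element-wise mean, e.g. of the title embeddings of one author's papers."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def paper_features(rec, author_vocab, category_vocab, text_vec):
    """Concatenate the author and category encodings with a text embedding."""
    return (
        multi_hot(rec.get("authors", []), author_vocab)
        + multi_hot(rec.get("categories", []), category_vocab)
        + list(text_vec)
    )


if __name__ == "__main__":
    records = [
        {"authors": ["A", "B"], "categories": ["Optics"]},
        {"authors": ["A"], "categories": ["Optics", "Computational Physics"]},
    ]
    authors = top_k_vocab(records, "authors", 5000)  # "top 5000" authors
    cats = top_k_vocab(records, "categories", 200)
    print(paper_features(records[1], authors, cats, [0.1, 0.2]))
```

The concatenated vectors would then go through PCA to get down to 128, 256 or 512 dimensions, for instance with scikit-learn's `PCA(n_components=128)`; that step is left out here. Since a 5,000-column multi-hot block and a dense BERT vector live on very different scales, some per-block scaling before PCA is probably worth trying.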

But if you just want to find similar articles, use Faiss / Milvus on these vectors; no learning is needed.
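To illustrate what that similarity search computes, here is an exact brute-force cosine-similarity search in plain Python. This is not the Faiss or Milvus API: it is the same result an exact inner-product index returns on L2-normalised vectors, but it scans every vector, so it only suits small collections. For 1.7 million papers the normalised vectors would go into Faiss or Milvus instead. `normalize` and `most_similar` are illustrative names.

```python
import heapq
import math


def normalize(vec):
    """Scale a vector to unit L2 norm so the inner product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def most_similar(query, corpus, k=5):
    """Return the k (score, paper_id) pairs with the highest cosine similarity.

    `corpus` maps a paper id to its feature vector. Every vector is compared
    with the query, so the result is exact but costs O(N * d) per query.
    """
    q = normalize(query)
    scored = (
        (sum(a * b for a, b in zip(q, normalize(vec))), pid)
        for pid, vec in corpus.items()
    )
    return heapq.nlargest(k, scored)


if __name__ == "__main__":
    corpus = {"paper-a": [1.0, 0.0], "paper-b": [0.0, 1.0], "paper-c": [1.0, 1.0]}
    for score, pid in most_similar([1.0, 0.1], corpus, k=2):
        print(pid, round(score, 3))
```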

ghost commented 3 years ago

Thanks @awarebayes for your quick reply :-)

Would you help me implement that for paper2code (https://paper2code.com)? It would be a great way to highlight your work on RecNN.

If you want to discuss it privately, my Telegram handle is deepocrates.

Cheers, X

awarebayes commented 3 years ago

I do not think you need learning-based recommendation, since you do not have any ratings. You do not need RecNN: just make vectors and use Faiss with Milvus. Milvus is Docker-based and super easy to set up.

ghost commented 3 years ago

Hi,

The thing is, paper2code matches repositories to papers, so we have the GitHub stars as ratings.

For example:
Paper page: https://paper2code.com/paper/164402/image-super-resolution-with-cross-scale-non-local-attention-and-exhaustive-self-exemplars-mining
Code page: https://paper2code.com/code/github.com/SHI-Labs/Cross-Scale-Non-Local-Attention

That's why I posted the issue. Sorry I forgot to mention that.

Cheers, X

awarebayes commented 3 years ago

"It focuses on Reinforcement Learning for personalized news recommendation."

One person is not personalized. Also, it is not sequential. But I have an idea for you: write a GitHub parser that goes through people's profiles and collects the stars they gave to the repositories of the respective papers, and use those as user ratings; the star timestamps can make it sequential. Sorry, you would have to do the parsing part yourself. Once you have that, feel free to email me questions about the library at awarebayes@gmail.com, or open a GitHub issue.
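A sketch of the data preparation suggested here, assuming the scraping is already done and produced `(user, repo, starred_at)` tuples, plus a repository-to-paper mapping like the one paper2code maintains. `build_user_sequences`, the tuple layout and the rating value of 1.0 are assumptions for illustration; nothing below calls the GitHub API or RecNN. Each user's stars are sorted by time, giving the per-user, time-ordered history the reply describes.

```python
from collections import defaultdict
from datetime import datetime


def build_user_sequences(star_events, repo_to_paper, min_len=2):
    """Group star events into time-ordered (paper_id, rating, timestamp) lists per user.

    star_events: iterable of (user, repo, starred_at), starred_at a datetime.
    repo_to_paper: dict mapping "owner/name" repository strings to paper ids.
    Stars on repositories with no matching paper are dropped, and users with
    fewer than `min_len` remaining interactions are skipped, since a single
    star gives no sequence to learn from.
    """
    per_user = defaultdict(list)
    for user, repo, starred_at in star_events:
        paper_id = repo_to_paper.get(repo)
        if paper_id is not None:
            # A star is an implicit positive signal; 1.0 is an assumed rating.
            per_user[user].append((paper_id, 1.0, starred_at))
    return {
        user: sorted(events, key=lambda e: e[2])
        for user, events in per_user.items()
        if len(events) >= min_len
    }


if __name__ == "__main__":
    # Toy data: the repository/paper pair comes from the paper2code links earlier
    # in the thread; the users and the second repository are made up.
    repo_to_paper = {
        "SHI-Labs/Cross-Scale-Non-Local-Attention": "164402",
        "example-org/example-repo": "example-paper",
    }
    events = [
        ("user_a", "SHI-Labs/Cross-Scale-Non-Local-Attention", datetime(2021, 3, 2)),
        ("user_a", "example-org/example-repo", datetime(2021, 1, 15)),
        ("user_b", "SHI-Labs/Cross-Scale-Non-Local-Attention", datetime(2021, 2, 1)),
    ]
    for user, seq in build_user_sequences(events, repo_to_paper).items():
        print(user, [paper for paper, _, _ in seq])
```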