Web crawler - Githubissues

Future-Scholars / paperlib

An open-source academic paper management tool.

https://paperlib.app

GNU General Public License v3.0

Web crawler #605

Closed sci-m-wang closed 2 months ago

sci-m-wang commented 2 months ago

Describe your feature request ... References may be more than just papers, so it would be helpful to support crawling web resources such as blogs.

GeoffreyChen777 commented 2 months ago

Do you want to import a webpage as a supplement to a paper, or as a separate item?

sci-m-wang commented 2 months ago

As a separate item.

GeoffreyChen777 commented 2 months ago

Then it should be a feature request for https://github.com/Future-Scholars/paperlib-entry-scrape-extension

This extension transforms webpages into items in Paperlib.

We cannot maintain a crawler for every website ourselves. That requires contributions from the community.

You can contribute to this extension. Take the arXiv scraper as an example: https://github.com/Future-Scholars/paperlib-entry-scrape-extension/blob/main/src/scrapers/webcontent-arxiv-entry-scraper.ts
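For anyone considering a contribution, the core of turning a webpage into an item can be sketched roughly as below. This is only an illustration under stated assumptions: `WebpageEntry`, `decodeBasicEntities`, and `extractWebpageEntry` are hypothetical names, not part of the extension's actual API, and the regex-based title extraction is a simplification. The linked arXiv scraper shows the real interface a scraper has to implement.

```typescript
// Hypothetical item shape for illustration; NOT Paperlib's real entity type.
interface WebpageEntry {
  title: string;
  url: string;
}

// Decode the few HTML entities that commonly appear in <title> text.
// "&amp;" is handled last so that "&amp;lt;" is not decoded twice.
function decodeBasicEntities(text: string): string {
  return text
    .replace(/&lt;/g, "<")
    .replace(/&gt;/g, ">")
    .replace(/&quot;/g, '"')
    .replace(/&#39;/g, "'")
    .replace(/&amp;/g, "&");
}

// Build an entry from raw HTML and the page URL (hypothetical helper).
// Falls back to the URL when the page has no usable <title>.
function extractWebpageEntry(html: string, url: string): WebpageEntry {
  const match = /<title[^>]*>([\s\S]*?)<\/title>/i.exec(html);
  const rawTitle = match ? match[1] : "";
  // Collapse runs of whitespace (including newlines) and trim the ends.
  const title = decodeBasicEntities(rawTitle).replace(/\s+/g, " ").trim();
  return { title: title.length > 0 ? title : url, url };
}

// Example usage with a placeholder page and URL.
const entry = extractWebpageEntry(
  "<html><head><title> An analysis of\n a paper </title></head></html>",
  "https://example.com/post"
);
console.log(entry);
```

A real scraper would more likely parse the DOM than rely on a regular expression, and would map the result onto whatever fields the extension expects.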

bupt-wcm commented 3 weeks ago

It feels like my question is more relevant to this issue, so I am posting it here. I hope Paperlib can add websites as separate items, without specific content, just the main title of the page and the corresponding URL. Then we could add tags and take notes, and read them later or look for useful material on these sites.

Or could you give me some suggestions on how to organize content from websites, such as analyses of papers?

GeoffreyChen777 commented 3 weeks ago

Hi @bupt-wcm, if a website is related to a paper, you could add its link to that paper's note.

The note supports Markdown, so the link can be written in the form [title](URL).

If you wish to add a website separately in your library, please wait for our next major version. Currently, we are implementing our own sync service, and redesigning the data structure. With our next major version, you will be able to achieve this.

bupt-wcm commented 1 week ago

Thank you very much for your reply. I look forward to future updates. Again, thanks for designing such great software, which has become an essential part of my workflow.