AHOCHI / AHOCHI-crawler

The web crawler that finds services and their location and stores that information in the database.

GNU General Public License v3.0

2 stars, 0 forks

Ideas for parsing page #1

Open bridewellc1 opened 8 years ago

bridewellc1 commented 8 years ago

This project seems like something we could use:

https://github.com/stanfordnlp/GloVe
http://nlp.stanford.edu/projects/glove/

I think the system is currently able to determine the location of a webpage. We can then try to train it to understand the page and extract the important information. For example, if a page mentions specific keywords often enough, we could add the matching tags to it.
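A rough sketch of the keyword-counting idea, assuming the page has already been reduced to plain text. Everything named here is hypothetical and not part of the crawler: the `TAG_KEYWORDS` vocabulary, the `suggest_tags` function, and the `min_count` threshold are placeholders for illustration. It only does exact word matching; word vectors such as GloVe might later help match related words, but that is not shown here.

```python
import re
from collections import Counter

# Hypothetical tag vocabulary: each tag maps to keywords that suggest it.
TAG_KEYWORDS = {
    "food": {"food", "pantry", "meal", "meals"},
    "housing": {"shelter", "housing", "rent"},
    "health": {"clinic", "health", "medical"},
}


def suggest_tags(text, tag_keywords=TAG_KEYWORDS, min_count=3):
    """Return the tags whose keywords appear at least min_count times in text."""
    # Lowercase the text and split it into simple word tokens.
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(words)

    tags = []
    for tag, keywords in tag_keywords.items():
        # Add up the occurrences of every keyword belonging to this tag.
        total = sum(counts[keyword] for keyword in keywords)
        if total >= min_count:
            tags.append(tag)
    return sorted(tags)


if __name__ == "__main__":
    page_text = (
        "The food pantry serves a hot meal daily. "
        "Food and meals are free. The clinic is next door."
    )
    print(suggest_tags(page_text))
```

The threshold is a guess and would need tuning against real pages; a long page will hit any fixed count more easily than a short one, so normalizing by page length may be worth trying.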

bridewellc1 commented 8 years ago

Also, it would be nice if this were open source and/or free. It seems pretty useful:

http://smmry.com/about

Maybe one of these will work:

https://github.com/andersonpaac/smmry-alternate
https://github.com/DataTeaser/textteaser

bridewellc1 commented 8 years ago

@mattdenisbeck testing this tagging thing

mattdenisbeck commented 8 years ago

@bridewellc1 The tagging thing works. I got an email notification that I was mentioned. I haven't looked through the projects you mentioned yet, but I will in the next few days. I hope you're right and one of them can help us. Also, I will move the crawler into the new repo later today.