alan-turing-institute / ReadabiliPy

A simple HTML content extractor in Python. Can be run as a wrapper for Mozilla's Readability.js package or in pure-python mode.
MIT License
216 stars, 35 forks

extruct for structured data #56

Status: Open. westurner opened this issue 5 years ago

westurner commented 5 years ago

I'm not sure what your use cases for ReadabiliPy are, but extruct may be worth mentioning for structured data:

https://github.com/scrapinghub/extruct

Currently, extruct supports:

  • W3C's HTML Microdata
  • embedded JSON-LD
  • Microformat via mf2py
  • Facebook's Open Graph
  • (experimental) RDFa via rdflib
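
To make "embedded JSON-LD" from the list above concrete, here is a minimal standard-library sketch of pulling JSON-LD out of a page. It is not extruct's implementation and does not use extruct at all; the names `JsonLdExtractor`, `extract_jsonld`, and `SAMPLE_HTML` are hypothetical and exist only for this illustration.

```python
import json
from html.parser import HTMLParser

# Hypothetical sample page, made up for this illustration.
SAMPLE_HTML = """
<html><head>
<script type="application/ld+json">
{"@context": "https://schema.org", "@type": "Article", "headline": "Example"}
</script>
</head><body><p>Body text that a content extractor would keep.</p></body></html>
"""


class JsonLdExtractor(HTMLParser):
    """Collect the parsed contents of <script type="application/ld+json"> tags."""

    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self._buffer = []
        self.items = []

    def handle_starttag(self, tag, attrs):
        # Attribute values can be None (valueless attributes), hence the `or ""`.
        script_type = (dict(attrs).get("type") or "").lower()
        if tag == "script" and script_type == "application/ld+json":
            self._in_jsonld = True
            self._buffer = []

    def handle_data(self, data):
        # Script content may arrive in several chunks, so buffer it.
        if self._in_jsonld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self._in_jsonld = False
            try:
                self.items.append(json.loads("".join(self._buffer)))
            except json.JSONDecodeError:
                # Skip malformed blocks instead of failing the whole page.
                pass


def extract_jsonld(html):
    """Return a list with one parsed JSON value per JSON-LD script block."""
    parser = JsonLdExtractor()
    parser.feed(html)
    parser.close()
    return parser.items


if __name__ == "__main__":
    print(extract_jsonld(SAMPLE_HTML))
```

With extruct itself, the documented entry point is `extruct.extract(html, base_url=...)`, which returns a dict keyed by syntax (for example `"json-ld"`, `"microdata"`, `"opengraph"`). The sketch above only covers the JSON-LD case and does no validation or `@context` expansion.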

https://5stardata.info/en/

https://twitter.com/westurner/lists/semanticweb

westurner commented 5 years ago

CSVW can be represented as JSON-LD: https://github.com/alan-turing-institute/SemAIDA/issues/2
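
As a rough illustration of that point: a CSVW metadata document is itself JSON-LD that uses the `http://www.w3.org/ns/csvw` context. The sketch below builds one as a plain dict. The helper name `csvw_metadata`, the file name `data.csv`, and the column names are assumptions made up for the example, not anything from SemAIDA or ReadabiliPy.

```python
import json


def csvw_metadata(csv_url, columns):
    """Build a minimal CSVW table description as a JSON-LD-compatible dict.

    `columns` is a list of (name, datatype) pairs; both values here are
    illustrative placeholders.
    """
    return {
        "@context": "http://www.w3.org/ns/csvw",
        "url": csv_url,
        "tableSchema": {
            "columns": [
                {"name": name, "datatype": datatype}
                for name, datatype in columns
            ]
        },
    }


if __name__ == "__main__":
    # Hypothetical table: the file name and columns are assumptions.
    meta = csvw_metadata("data.csv", [("id", "integer"), ("title", "string")])
    print(json.dumps(meta, indent=2))
```

Because the result is ordinary JSON with an `@context`, it could in principle be handled by the same tooling that reads other JSON-LD.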