alephdata / memorious

Lightweight web scraping toolkit for documents and structured data.
https://docs.alephdata.org/developers/memorious
MIT License

Why is `cleanup` removed? #70

Closed: pohnean closed this issue 5 years ago

pohnean commented 5 years ago

I see in the documentation that there's a YAML configuration option called `cleanup` (https://github.com/alephdata/memorious/blob/c57b7350972b173d91947ab99561434c6c5ce6ff/docs/buildingcrawler.md). However, in the latest version of memorious this option has been removed and no longer works.

May I know why it was removed?

I'm trying to run a post-crawl action that notifies my team via Telegram/Slack using the `cleanup` option, and I'm trying to figure out how to do it.
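Whatever hook ends up triggering it, a post-crawl notification boils down to a single HTTP POST. Below is a minimal sketch for the Telegram side using only the Python standard library, assuming you have already created a bot and know the target chat's ID. It targets the Telegram Bot API `sendMessage` method, which accepts a JSON body with `chat_id` and `text`. The function names are hypothetical and not part of memorious, and the token and chat ID in the usage note are placeholders.

```python
import json
import urllib.request

TELEGRAM_API = "https://api.telegram.org"


def telegram_send_message_request(bot_token: str, chat_id: str, text: str) -> urllib.request.Request:
    """Build (but do not send) a POST to the Telegram Bot API sendMessage method."""
    url = f"{TELEGRAM_API}/bot{bot_token}/sendMessage"
    body = json.dumps({"chat_id": chat_id, "text": text}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def notify_telegram(bot_token: str, chat_id: str, text: str, timeout: float = 10.0) -> dict:
    """Send the message and return Telegram's decoded JSON reply (it has an "ok" field)."""
    request = telegram_send_message_request(bot_token, chat_id, text)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Usage, with placeholder credentials: `notify_telegram("<BOT_TOKEN>", "<CHAT_ID>", "Crawler finished")`. Building the request separately from sending it keeps the payload easy to inspect and test without hitting the network.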

sunu commented 5 years ago

Hi @pohnean, thanks for reporting this.

I have just merged a PR (#55) that adds the post-processing step back into Memorious, and updated the docs to reflect that (https://memorious.readthedocs.io/en/latest/buildingcrawler.html#postprocessing). Hope that solves your issue.
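As a rough illustration of what a post-processing hook could look like, here is a sketch that posts a "crawl finished" message to a Slack incoming webhook (which accepts a JSON body with a `text` field). Several things here are assumptions rather than confirmed memorious API: that the hook is a plain Python callable invoked as `fn(context, params)`, that `params` comes from the crawler YAML, that `context.crawler.name` holds the crawler's name, and the hypothetical `webhook_url` and `message` keys. Check the linked post-processing docs for the exact config key and call signature.

```python
import json
import urllib.request
from typing import Any, Callable, Mapping


def notify_slack_on_finish(
    context: Any,
    params: Mapping[str, Any],
    opener: Callable = urllib.request.urlopen,
) -> urllib.request.Request:
    """Hypothetical post-processing hook: tell a Slack channel the crawl is done.

    Assumed (not confirmed) contract: called as fn(context, params), where
    params holds a Slack incoming-webhook URL under "webhook_url" and an
    optional "message" template containing a {name} placeholder.
    """
    # Fall back to a generic name if the context does not expose crawler.name.
    crawler = getattr(context, "crawler", None)
    name = getattr(crawler, "name", None) or "crawler"
    text = params.get("message", "{name} has finished crawling.").format(name=name)

    # Slack incoming webhooks accept a JSON object with a "text" field.
    request = urllib.request.Request(
        params["webhook_url"],
        data=json.dumps({"text": text}).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    # `opener` is injectable so the hook can be tested without network access.
    with opener(request, timeout=10) as response:
        response.read()
    return request
```

Passing the webhook URL through `params` keeps secrets out of the code and lets each crawler's YAML point at a different channel.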

pohnean commented 5 years ago

Awesome! Thanks for the quick response!

pohnean commented 5 years ago

Which version is this aggregate function in?

sunu commented 5 years ago

We haven't made a new release yet; we're aiming to do that as soon as possible. For now, you can use the `alephdata/memorious:master` Docker image or pip install from the `master` branch.