croqaz / clean-mark

Convert an article into clean text
MIT License

Feature request: allow taking HTML from stdin #6

Closed: Seirdy closed this issue 4 years ago

Seirdy commented 4 years ago

A number of tools, ranging from curl to go-readability, output raw HTML that clean-mark could then process. Currently, clean-mark has no documented way to read anything besides a URL, so it can't work offline.

croqaz commented 4 years ago

Hi! I think I implemented this in the last commit: https://github.com/croqaz/clean-mark/commit/63a46c4443751493ad8a2bc822e64d179054c0c4. I haven't made a release yet, so you'll probably need to clone the repo to test it. If you think it's OK, you can close the issue and I'll release a new version.

Seirdy commented 4 years ago

Thanks!

It seems that combining STDIN with the --stdout flag both writes a file and sends the result to STDOUT, whereas passing a URL with the --stdout flag correctly sends the result to STDOUT without writing a file.

Example:

clean-mark --stdout <path/to/file.html

I tested this from 63a46c4.

croqaz commented 4 years ago

You're absolutely right, the STDIN + --stdout combination was broken: I forgot to return after printing the output. Fixed in https://github.com/croqaz/clean-mark/commit/33aa228d055b97884a9b6d0c48972b9faeaa663b
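
For readers unfamiliar with this class of bug, here is a minimal sketch of a missing early return. It is not the code from the commit above: the function `deliver`, the `io` object, and the `output` option are hypothetical names chosen for illustration, and the real clean-mark internals may look quite different.

```javascript
// Hypothetical sketch, NOT the actual clean-mark source.
// `deliver`, `io.print`, `io.writeFile`, and `options.output` are assumed names.
function deliver(text, options, io) {
  if (options.stdout) {
    io.print(text);
    // Early return: without this line, execution would fall through
    // and also write the file, which is the behaviour reported above.
    return 'stdout';
  }
  io.writeFile(options.output, text);
  return 'file';
}

// In-memory stand-ins for the real side effects, so the sketch runs anywhere.
const printed = [];
const written = [];
const io = {
  print: (s) => printed.push(s),
  writeFile: (path, s) => written.push([path, s]),
};

// With --stdout: the text is printed and no file is written.
deliver('# Title', { stdout: true, output: 'article.md' }, io);
// Without --stdout: the text is written to the output file only.
deliver('# Title', { stdout: false, output: 'article.md' }, io);
```

Passing the side effects in as an `io` object is only a convenience for the sketch; it makes it easy to check that exactly one of the two paths runs per call.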

Seirdy commented 4 years ago

Looks good now.