dteviot / WebToEpub

A simple Chrome (and Firefox) Extension that converts Web Novels (and other web pages) into an EPUB.

[email protected] #169

Closed JeremyStarten closed 6 years ago

JeremyStarten commented 6 years ago

When using Web to Epub I randomly encounter these when an email is used. But a lot of the time it's a fake email in the story, or not even an email at all, and all of a sudden there's just [email protected]. I don't really understand the point of this, because it's easy to find the email if it is gathered by Web to Epub. For example, if an author writes "email me here for ____", you would find the email either way, even if it is blocked. So would there be a way to remove it?

dteviot commented 6 years ago

@JeremyStarten. You wrote:

When using Web to Epub I would randomly encounter these when an email is used.

Please provide a URL to a web page showing this behaviour and I'll see what I can do. If possible, several web pages would be even better.

JeremyStarten commented 6 years ago

An example using a fake email as part of the story from Everyone Loves Large Chests: interlude - Another day at the office - In this chapter there is a conversation using emails. "From: noreply@demons.r.us" is changed to "From: [email protected]".

An example using a fake email as part of the story from Kuro’s Days: Chapter 30 Addition - "From: Patricia Bluebell Pa.Bluebell@MATV.com" is changed to "From: Patricia Bluebell <[email protected]>"

I also remember seeing it appear when someone was cursing and the text had the @ symbol in it, like $%!@%^~, but I couldn't find the book.

Another one was a magic spell that had it.

Basically it's annoying because (I think) it just randomly appears whenever an @ is there.

dteviot commented 6 years ago

Looking at the page (https://royalroadl.com/fiction/8894/everybody-loves-large-chests/chapter/149836/interlude-another-day-at-the-office), the raw HTML for noreply@demons.r.us is:

<a href="/cdn-cgi/l/email-protection" class="__cf_email__" data-cfemail="513f3e2334213d281135343c3e3f227f237f2422">[email&#160;protected]</a>

The other emails in the text are similar.

Likewise, Pa.Bluebell@MATV.com is

<a href="/cdn-cgi/l/email-protection" class="__cf_email__" data-cfemail="8cdceda2cee0f9e9eee9e0e0ccc1cdd8daa2efe3e1">[email&#160;protected]</a>
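Both snippets have the same shape: an anchor with class __cf_email__ whose data-cfemail attribute holds a hex string, with the literal text "[email protected]" as the visible content. As a rough illustration of how such anchors could be picked out of a page, here is a minimal sketch in Python using the standard-library html.parser. The names CfEmailCollector and collect_cfemail_values are hypothetical and are not part of WebToEpub, which is a JavaScript extension and would do this against the DOM instead.

```python
from html.parser import HTMLParser


class CfEmailCollector(HTMLParser):
    """Collects the data-cfemail value of every <a class="__cf_email__"> tag."""

    def __init__(self):
        super().__init__()
        self.encoded = []  # hex strings, in document order

    def handle_starttag(self, tag, attrs):
        attr_map = dict(attrs)
        # The class attribute may hold several space-separated class names.
        classes = (attr_map.get("class") or "").split()
        value = attr_map.get("data-cfemail")
        if tag == "a" and "__cf_email__" in classes and value:
            self.encoded.append(value)


def collect_cfemail_values(html):
    """Return all data-cfemail hex strings found in the given HTML text."""
    parser = CfEmailCollector()
    parser.feed(html)
    parser.close()
    return parser.encoded


sample = (
    '<p>From: <a href="/cdn-cgi/l/email-protection" class="__cf_email__" '
    'data-cfemail="513f3e2334213d281135343c3e3f227f237f2422">'
    "[email&#160;protected]</a></p>"
)
print(collect_cfemail_values(sample))
```

Anchors without the __cf_email__ class, or without a data-cfemail attribute, are ignored, so ordinary links in the chapter text are left alone.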

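A quick Google search suggests this is the CDN (Cloudflare, going by the __cf_email__ class and the /cdn-cgi/l/email-protection link) obfuscating anything it thinks is an email address, to make life difficult for people scraping emails. That would also explain why fake addresses and other text containing an @ get replaced. However, I think I can fix this. See: https://usamaejaz.com/cloudflare-email-decoding/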
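For reference, the decoding appears to be a simple XOR scheme: the first byte of the data-cfemail hex string is a key, and each remaining byte XORed with that key gives one byte of the original address. The following is a minimal Python sketch under that assumption, not the code used in WebToEpub (which is JavaScript). The function name decode_cfemail is hypothetical, and treating the result as UTF-8 is also an assumption.

```python
def decode_cfemail(encoded):
    """Decode a data-cfemail hex string, assuming first-byte XOR keying."""
    data = bytes.fromhex(encoded)
    if not data:
        raise ValueError("empty data-cfemail value")
    key, payload = data[0], data[1:]
    # XOR every payload byte with the key to recover the original text.
    return bytes(b ^ key for b in payload).decode("utf-8")


# The two values quoted earlier in this thread:
print(decode_cfemail("513f3e2334213d281135343c3e3f227f237f2422"))
# noreply@demons.r.us
print(decode_cfemail("8cdceda2cee0f9e9eee9e0e0ccc1cdd8daa2efe3e1"))
# Pa.Bluebell@MATV.com
```

Both values decode back to the addresses reported in the original chapters, which suggests that replacing each obfuscated anchor with its decoded text would restore the story as written.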