Arcticons-Team / Arcticons

A monotone line-based icon pack for Android
https://arcticons.com/
GNU General Public License v3.0

Email Parser problem #1311

Closed Donnnno closed 1 year ago

Donnnno commented 2 years ago

With the mail provider change to Proton Mail, I've encountered an issue with our email parser:

The parser doesn't know where to look for the right lines of code...

This is how a normal mail's body looks:

<html>
  <head>
    <meta http-equiv="content-type" content="text/html; charset=UTF-8">
  </head>
  <body>
<div>Manufacturer : Fairphone
Model : FP3
Product : FP3
Screen Resolution : 1080 x 2112 pixels
Android Version : 10
App Version : 4.3.8
CandyBar Version : 3.14.2

Time Cop
ca.hamaluik.timecop/ca.hamaluik.timecop.MainActivity
https://f-droid.org/en/packages/ca.hamaluik.timecop/</div>  </body>
</html>

And this is a Proton Mail body:

<div dir=3D"auto">Manufacturer : Xiaomi<br>Model : M2006C3MG<br>Product : a=
ngelica_global<br>Screen Resolution : 720 x 1449 pixels<br>Android Version =
: 10<br>App Version : 4.6.2<br>CandyBar Version : 3.14.2<br><br><br>airasia=
<br>com.airasia.mobile/com.airasia.mobile.SplashScreenActivity<br><a href=
=3D"https://play.google.com/store/apps/details?id=3Dcom.airasia.mobile">htt=
ps://play.google.com/store/apps/details?id=3Dcom.airasia.mobile</a><br><br>=
Merriam-Webster Dictionary<br>com.merriamwebster/com.merriamwebster.diction=
ary.activity.dictionary.DictionaryActivity<br><a href=3D"https://play.googl=
e.com/store/apps/details?id=3Dcom.merriamwebster">https://play.google.com/s=
tore/apps/details?id=3Dcom.merriamwebster</a><br><br>Shopee<br><a href=3D"h=
ttp://com.shopee.my/com.shopee.app.ui.home.HomeActivity_">com.shopee.my/com=
.shopee.app.ui.home.HomeActivity_</a><br><a href=3D"https://play.google.com=
/store/apps/details?id=3Dcom.shopee.my">https://play.google.com/store/apps/=
details?id=3Dcom.shopee.my</a><br><br>TikTok Lite<br>com.zhiliaoapp.musical=
ly.go/com.ss.android.ugc.aweme.main.homepage.MainActivity<br><a href=3D"htt=
ps://play.google.com/store/apps/details?id=3Dcom.zhiliaoapp.musically.go">h=
ttps://play.google.com/store/apps/details?id=3Dcom.zhiliaoapp.musically.go<=
/a>=C2=A0</div>
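For context, the `=3D` sequences and the trailing `=` at each line end in the Proton sample look like quoted-printable transfer encoding: `=3D` is an encoded `=`, and a `=` at the end of a line is a soft line break that should be joined back together. A minimal sketch, assuming the body really is quoted-printable, that decodes a truncated excerpt of the sample above with Python's standard `quopri` module (the variable names are only for illustration):

```python
import quopri

# First lines of the Proton sample above, cut off after "App Version"
# and closed with </div> so the excerpt stays short.
raw = (
    b'<div dir=3D"auto">Manufacturer : Xiaomi<br>Model : M2006C3MG<br>Product : a=\n'
    b'ngelica_global<br>Screen Resolution : 720 x 1449 pixels<br>Android Version =\n'
    b': 10<br>App Version : 4.6.2</div>'
)

# decodestring() turns =3D back into "=" and removes the soft line breaks.
decoded = quopri.decodestring(raw).decode("utf-8")
print(decoded)
```

After decoding, the wrapped words (`a=` / `ngelica_global`) are whole again and only the `<br>` tags remain to be dealt with.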

S1SYPHOS commented 2 years ago

Hey there, coming over from Mastodon and not knowing your product: could you provide a workflow to reproduce the problem you're encountering? Like: "put this text in a file, pass it to this script and the result needs to look like this" 😀

Donnnno commented 2 years ago

Hi @S1SYPHOS ! Thanks for the help.

It's a script that takes lines from icon request emails and puts them in a large request file.

The mails look like this: image

and like this in the request list:

image

This is the command I use to run the script: python email_parser.py ./mail appfilter.xml requests.txt

"""
Script Usage:
python (or python3) delta_email_parser.py ./path/to/emlFolder ./path/to/appfilter.xml (./path/to/requests.txt)

Arguments
0: Path to folder containing .eml files of requests
1: Path to existing appfilter.xml to recognize potentially updatable appfilters
3 (optional): existing requests.txt file to augment with new info

Output
If only two arguments are given the script will generate 'requests.txt' and 'updatable.txt'.
If the third argument is given the file will be overwritten with the updated info.
"""

S1SYPHOS commented 2 years ago

Alright, so basically I'd need a proton mail eml file :smiley: see Mastodon!

Donnnno commented 2 years ago

Thanks, I've just looked at some different emails, but a problem is that they all are slightly different in the way they're formatted..

(removed the eml files here because of privacy)

S1SYPHOS commented 2 years ago

I'm onto something, but will look at it again later - looks like fun :rofl:

Donnnno commented 2 years ago

Haha thanks a lot!

Looks more like hell to me 🙃

S1SYPHOS commented 2 years ago

Alright, it's a bit weird, but I tried not using external dependencies for further parsing:

around line 80, when checking if parsed is None, I did this:

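if parsed is None:
    # No plain-text part found: take the message body as-is (HTML for Proton mails),
    # split it on <br> and strip any remaining tags from each line
    parsed = msg.get_body().get_content()
    emailBody = "\n".join(TAG_RE.sub('', line) for line in parsed.split('<br>'))
else:
    emailBody = parsed.get_content()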

.. which gives me this - and no further errors:

-------------------------------------------------------
3 Requested Apps Pending (Updated 24 August 2022)
-------------------------------------------------------

<!-- Action Blocks -->
<item component="ComponentInfo{com.google.android.apps.accessibility.maui.actionblocks/com.google.android.apps.accessibility.maui.actionblocks.home.HomeActivity}" drawable="action_blocks" />
https://play.google.com/store/apps/details?id=com.google.android.apps.accessibility.maui.actionblocks
https://f-droid.org/en/packages/com.google.android.apps.accessibility.maui.actionblocks/
Requested 1 times
Last requested 1659958279.0

.. which is cat output for an empty requests.txt :grin:
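To make the fallback easier to try in isolation, here is a self-contained sketch of the same idea. It assumes the script parses each `.eml` with `email.policy.default` and first asks for a plain-text part; that part is a guess, since the surrounding script is not shown here. `extract_body` and `sample` are hypothetical names, and `TAG_RE` is the tag-stripping regex the snippet above relies on.

```python
import re
from email import message_from_bytes, policy

# Regex that matches a single HTML tag such as <div ...> or </a>.
TAG_RE = re.compile(r'<[^<]+?>')


def extract_body(raw_eml: bytes) -> str:
    """Return the request text of one mail as newline-separated lines."""
    msg = message_from_bytes(raw_eml, policy=policy.default)

    # Assumption: the script prefers a text/plain part when there is one.
    parsed = msg.get_body(preferencelist=('plain',))
    if parsed is not None:
        return parsed.get_content()

    # HTML-only mail: get_content() already undoes the quoted-printable
    # encoding, so only <br> and the remaining tags need handling.
    html_part = msg.get_body(preferencelist=('html',))
    if html_part is None:
        return ""
    html = html_part.get_content()
    return "\n".join(TAG_RE.sub('', line) for line in html.split('<br>'))


# Tiny HTML-only mail modelled on the Proton sample from this thread.
sample = (
    b'MIME-Version: 1.0\n'
    b'Content-Type: text/html; charset=utf-8\n'
    b'Content-Transfer-Encoding: quoted-printable\n'
    b'\n'
    b'<div dir=3D"auto">Manufacturer : Xiaomi<br>Model : M2006C3MG<br><br>airasia=\n'
    b'<br>com.airasia.mobile/com.airasia.mobile.SplashScreenActivity</div>\n'
)

print(extract_body(sample))
```

The regex approach is deliberately simple: it does not touch HTML entities or non-breaking spaces, which may be one reason a few mails still fail.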

Donnnno commented 2 years ago

woah sounds great! but I'm getting this message right now:

"TAG_RE" is not defined

where should I put that tag?

S1SYPHOS commented 2 years ago

Sorry, it's the compiled regex, just at the top, line 47 or so:

TAG_RE = re.compile(r'<[^<]+?>')  # or maybe some other name, something more descriptive

Donnnno commented 2 years ago

YESSS GREAT!!! it works again, thank you so much!

edit: okay it almost works haaha

S1SYPHOS commented 2 years ago

It seems to me that this script is used throughout many "icon" pack Android apps, correct? 🙃

Donnnno commented 2 years ago

Yeah, that's correct. There are other projects using it too :-)

S1SYPHOS commented 2 years ago

Well, if it were my project, I'd harden the parser by adding tests - and probably using third party libraries for special cases (mangled HTML being such a special case) 😃
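As a rough illustration of what such tests could look like, here is a sketch that uses only the standard library: `html.parser` in place of a third-party HTML library, and `unittest` for the test cases. `html_to_text` is a hypothetical helper, not part of the existing script, and the sample strings are modelled on the Proton body quoted earlier in this thread.

```python
import unittest
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects text nodes and turns <br> tags into newlines."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag == "br":
            self.chunks.append("\n")

    def handle_data(self, data):
        self.chunks.append(data)


def html_to_text(html: str) -> str:
    """Hypothetical helper: reduce a request mail's HTML body to plain lines."""
    extractor = _TextExtractor()
    extractor.feed(html)
    extractor.close()
    return "".join(extractor.chunks)


class HtmlToTextTest(unittest.TestCase):
    def test_proton_style_body(self):
        html = ('<div dir="auto">Model : M2006C3MG<br><br>Shopee<br>'
                '<a href="http://com.shopee.my/com.shopee.app.ui.home.HomeActivity_">'
                'com.shopee.my/com.shopee.app.ui.home.HomeActivity_</a></div>')
        self.assertEqual(
            html_to_text(html).split("\n"),
            ["Model : M2006C3MG", "", "Shopee",
             "com.shopee.my/com.shopee.app.ui.home.HomeActivity_"],
        )

    def test_entities_are_unescaped(self):
        self.assertEqual(html_to_text("<div>A &amp; B</div>"), "A & B")
```

Saved as a test module, this can be run with `python -m unittest`. Each mail format that breaks the parser could then be added as one more test case with a sanitized body.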

Donnnno commented 2 years ago

Thanks! I'm gonna look into that :+)

S1SYPHOS commented 2 years ago

If you need anything, let me know / give a ping!

Donnnno commented 2 years ago

Just checked, 10 out of 627 mails weren't working, so that's a huge win!