GAM-team / got-your-back

Got Your Back (GYB) is a command line tool for backing up your Gmail messages to your computer using Gmail's API over HTTPS.
https://github.com/GAM-team/got-your-back/wiki
Apache License 2.0
2.56k stars, 203 forks

Never mark spam on restore #342

Open aaronadamsCA opened 2 years ago

aaronadamsCA commented 2 years ago

The issue tracker is for reporting product deficiencies. "How do I" questions should be posted to the discussion forum at https://groups.google.com/group/got-your-back. When in doubt, start at the discussion forum and return here only when instructed to do so.

Full steps to reproduce the issue:

  1. Back up from one account
  2. Restore to another account

Expected outcome (what are you trying to do?): All messages restored with similar structure.

Actual outcome (what errors or bad behavior do you see instead?): Thousands of legitimate messages from the first account classified as spam in the second account.

Unfortunately Gmail won't let me bulk mark them all "not spam", either, so this is a whole lot of repetitive clicking to rectify.

I see the Gmail API has a neverMarkSpam option on some endpoints, but I can't tell if it's available on the endpoint you're using because I can't read Python. 🙃

jay0lee commented 2 years ago

Yes, GYB sets this parameter:

https://github.com/jay0lee/got-your-back/blob/main/gyb.py#L1971

so this shouldn't be happening. Can you provide sample messages or a sample backup that is showing this problem?
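For readers who, like the reporter, don't read Python, here is a minimal sketch of what setting this parameter can look like. It is not GYB's actual code. It assumes the Gmail API's users.messages.import method, which accepts neverMarkSpam and internalDateSource as parameters; the helper name build_import_kwargs and the sample addresses are made up for illustration, and the real API call is left as a comment because it needs google-api-python-client and an authorized service object.

```python
import base64
from email.message import EmailMessage


def build_import_kwargs(raw_message, label_ids):
    """Build keyword arguments for a users.messages.import call (hypothetical helper)."""
    # The API expects the full RFC 822 message as URL-safe base64 text.
    raw = base64.urlsafe_b64encode(raw_message).decode("ascii")
    return {
        "userId": "me",
        # Ask Gmail not to classify the imported message as spam.
        "neverMarkSpam": True,
        # Use the Date: header, not the import time, as the message date.
        "internalDateSource": "dateHeader",
        "body": {"raw": raw, "labelIds": list(label_ids)},
    }


# A throwaway message, only for demonstration.
msg = EmailMessage()
msg["From"] = "sender@example.com"
msg["To"] = "recipient@example.com"
msg["Subject"] = "Monthly newsletter"
msg.set_content("Hello from the past.")
raw_bytes = msg.as_bytes()

kwargs = build_import_kwargs(raw_bytes, ["INBOX"])

# With an authorized `service` object (assumed, not shown) the call would be:
# service.users().messages().import_(**kwargs).execute()
```

As the rest of this thread suggests, setting the parameter does not guarantee that Gmail honors it.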

aaronadamsCA commented 2 years ago

I'm up to several thousand messages in spam, but thankfully I found a workaround that lets you mark more than 50 messages as "not spam" in the Gmail UI:

  1. Search label:spam -label:inbox
  2. Click "Select all"
  3. Click "Select all conversations that match this search"
  4. Click "Move to inbox"

aaronadamsCA commented 2 years ago

Each message in spam shows the same reason for being there:

> Why is this message in spam? It is similar to messages that were identified as spam in the past.

So it doesn't seem like it would be a phishing filter thing (my old and new addresses are similar, which had me wondering).

Here is a cleaned-up version of the commands I used:

cd
bash <(curl -s -S -L https://git.io/gyb-install)

mkdir first@firstlast.ca
cd first@firstlast.ca/
~/bin/gyb/gyb --email first@firstlast.ca --action quota
~/bin/gyb/gyb --email first@firstlast.ca --action backup

cd ..
mkdir firstlast.ca@gmail.com
cd firstlast.ca@gmail.com/
~/bin/gyb/gyb --email firstlast.ca@gmail.com --action create-project
~/bin/gyb/gyb --email firstlast.ca@gmail.com --action restore --local-folder ../first@firstlast.ca/GYB-GMail-Backup-first@firstlast.ca/ --label-restored "firstlast.ca"

The messages going to spam are decidedly the "spammier" ones, almost exclusively newsletters and notification emails, so it does seem like the spam filter is somehow processing each inbound message despite being asked not to.

aaronadamsCA commented 2 years ago

> Can you provide sample messages or a sample backup that is showing this problem?

Let me know if any of the information above helps. If not, after my restore finishes running, I can try reproducing the problem with a small backup that I'd be comfortable sharing.

aaronadamsCA commented 2 years ago

Ha... unaddressed report from 2018 complete with repro:

https://issuetracker.google.com/issues/109956036

I added a comment (didn't mention gyb just in case they filter out issues that mention your GREAT project). I'm willing to bet this is unfixable on your end, since I can clearly see you're doing what you can.

bvinnerd commented 2 years ago

I'm seeing this issue as well, backing up a Workspace account and restoring to a free Gmail account.

I have a total of 53,001 messages in the backup, and on restore there were ~7,200 messages in the Spam folder.

My workaround was to move all of those messages in Spam back to Inbox (by selecting 100 messages at a time and clicking the Not Spam button in the Gmail UI).

If you're going to do this, please ensure that you have 0 spam messages in the target Gmail account, otherwise you could end up moving genuine spam into your Inbox.
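The manual workaround above could in principle be scripted. The following is only a sketch under assumptions: it uses the Gmail API's users.messages.batchModify method, which takes a list of message IDs plus labels to add and remove, and which is documented as accepting up to 1000 IDs per request. The helper name and the message IDs are hypothetical, and the same caveat applies: anything that was genuinely spam in the target account would be moved to the Inbox too.

```python
def batch_modify_bodies(message_ids, batch_size=1000):
    """Yield batchModify request bodies that move messages from Spam to Inbox.

    Hypothetical helper: it only builds the request bodies and sends nothing.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(message_ids), batch_size):
        yield {
            "ids": message_ids[start:start + batch_size],
            "addLabelIds": ["INBOX"],
            "removeLabelIds": ["SPAM"],
        }


# Made-up IDs; real ones would come from listing the messages labelled SPAM.
ids = [f"msg{i}" for i in range(2500)]
bodies = list(batch_modify_bodies(ids))

# With an authorized `service` object (assumed, not shown), each body would be sent as:
# service.users().messages().batchModify(userId="me", body=body).execute()
```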

flipflophhj commented 2 years ago

I have this issue too: 15,000 messages in spam, mainly very old ones.

Also, many seem to have gotten the date set to the restore time instead of the original date they were sent.

Most of the messages affected are from before 2000, but I also found one from 2003.

jay0lee commented 2 years ago

I just released GYB 1.55 which adds a --cleanup option on restore. This tells GYB to confirm the message has a valid From:, Message-ID: and Date: header on it before restoring. This should prevent the message from landing in Spam.

Can a few people do some testing and confirm it works for them? See the 1.55 release details for more info:

https://git.io/gyb-releases
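To make the kind of check described above concrete, here is a rough sketch using only Python's standard email package. It is not GYB's implementation of --cleanup: the function name, the wording of the problem strings, and the extra "is the Date parseable" test are all assumptions made for illustration.

```python
import email
from email.utils import parsedate_tz

# The three headers named in the release note above.
REQUIRED_HEADERS = ("From", "Message-ID", "Date")


def header_problems(raw_message):
    """Return a list of header problems found in a raw RFC 822 message (bytes)."""
    msg = email.message_from_bytes(raw_message)
    problems = []
    for name in REQUIRED_HEADERS:
        value = msg[name]
        # str() because the legacy parser may return a Header object, not a str.
        if value is None or not str(value).strip():
            problems.append(f"missing {name}")
    date_value = msg["Date"]
    if date_value is not None and str(date_value).strip():
        # parsedate_tz returns None when it cannot make sense of the value.
        if parsedate_tz(str(date_value)) is None:
            problems.append("unparseable Date")
    return problems


good = (
    b"From: a@example.com\r\n"
    b"Message-ID: <1@example.com>\r\n"
    b"Date: Mon, 31 Jan 2022 17:56:00 -0500\r\n"
    b"\r\n"
    b"body\r\n"
)
bad = b"From: a@example.com\r\nDate: garbage\r\n\r\nbody\r\n"

print(header_problems(good))
print(header_problems(bad))
```

A real cleanup step would then repair or synthesize the offending headers before upload; how GYB does that is not shown in this thread.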

flipflophhj commented 2 years ago

Hm... I thought that if I emptied the spam folder and then did a restore it would restore all those messages again, but it doesn't seem so. What should I do? I'm doing an estimate to see if that helps.

jay0lee commented 2 years ago

You need to tell GYB to try restoring all messages again with --noresume.

Jay

flipflophhj commented 2 years ago

Hm, it would be nice to be able to label only the messages that were cleaned up, though. I tried to use --label-restored, but it labels everything.

flipflophhj commented 2 years ago

Traceback (most recent call last):
  File "gyb.py", line 2532, in <module>
  File "gyb.py", line 2007, in main
  File "gyb.py", line 1769, in message_hygiene
  File "gyb.py", line 1713, in cleanup_from
  File "email\utils.py", line 215, in parseaddr
  File "email\_parseaddr.py", line 513, in __init__
  File "email\_parseaddr.py", line 256, in getaddrlist
TypeError: object of type 'Header' has no len()
[29748] Failed to execute script 'gyb' due to unhandled exception!
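One plausible reading of this traceback is that the From: value reached email.utils.parseaddr as an email.header.Header object rather than a plain string, which can happen when the legacy parser meets raw non-ASCII bytes in a header. The sketch below shows one defensive pattern, converting the value to str first. The helper name safe_parseaddr is hypothetical, and this is not necessarily how GYB addressed the crash.

```python
from email.header import Header
from email.utils import parseaddr


def safe_parseaddr(value):
    """Parse an address header value that may be a str, a Header, or None."""
    if value is None:
        return ("", "")
    # str() turns a Header object into the text parseaddr expects.
    return parseaddr(str(value))


# A Header object standing in for what the parser handed to the cleanup code.
from_header = Header("Jane Doe <jane@example.com>")
name, address = safe_parseaddr(from_header)
print(name, address)
```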

flipflophhj commented 2 years ago

I still got about 300 in spam out of the 6,000 restored before the exception.

jay0lee commented 2 years ago

I can no longer reproduce the issue with the sample from the issue tracker and --cleanup. Can you share examples of messages that went to Spam?

jay0lee commented 2 years ago

I'd need to see the full headers as described at:

https://support.google.com/mail/answer/29436?hl=en

flipflophhj commented 2 years ago

Would it work to send the .eml file?

jay0lee commented 2 years ago

Yes, that's fine. You can post it here or email it to me.

flipflophhj commented 2 years ago

Ok I sent an email.

flipflophhj commented 2 years ago

Oh, by the way, I saw that all the mails that got the now() date after restore seem to have a correct date in msg-db.sqlite, so maybe that could be used for --cleanup.

brechmos commented 2 years ago

I am in the same boat (185,000 emails to transfer, though). I was watching my Spam as the transfer was happening and saw some go in and then automatically go out of Spam. I was nervous the "older than 30 days will be deleted" thing was happening faster than I was moving them out of Spam.

I am redoing my restore but put this filter in place: [image: https://user-images.githubusercontent.com/887675/151886623-b1de6315-e273-424e-bd08-ba16df4aefeb.png]

I have not seen anything go to Spam. When the restore is done I'll turn off that filter.

I don't know enough about how quickly "older than 30 days" gets removed from Spam, and don't know if this is "the right thing to do" but it makes this data hoarder less nervous.

jay0lee commented 2 years ago

Has anyone else tested with --cleanup to see if that helps?

flipflophhj commented 2 years ago

Did my EML files work fine for you?

jhult commented 2 years ago

FWIW, I am also experiencing emails going into Spam (I have not yet tried --cleanup).

Suncatcher commented 2 years ago

> Has anyone else tested with --cleanup to see if that helps?

Yes I did, and I can say: it doesn't work.

My numbers on restoration:

I was restoring different accounts, so the absolute numbers are different, but you can easily calculate the percentage: it's nearly the same, and with --cleanup it's even worse.

chrishoage commented 2 years ago

I have been doing an import moving 69k emails from a workspace account to a personal gmail account. I used --cleanup when doing the restore and it was still happening.

I have been running into this same issue.

> The messages going to spam are decidedly the "spammier" ones, almost exclusively newsletters and notification emails, so it does seem like the spam filter is somehow processing each inbound message despite being asked not to.

This has been my experience as well. Lots of receipts, newsletters, etc.

I put in the same filter @brechmos did and that has helped eliminate messages going to spam. The downside is that new emails are going to "All Mail", but I can live with that in exchange for not sending email to spam during the restore.