GAM-team / got-your-back

Got Your Back (GYB) is a command line tool for backing up your Gmail messages to your computer using Gmail's API over HTTPS.
https://github.com/GAM-team/got-your-back/wiki
Apache License 2.0

Metadata cannot be read from new Google Vault export format #420

Closed TimetravelerDD closed 6 months ago

TimetravelerDD commented 1 year ago

Intro

Google Vault has a new default export format for metadata: it now gives you a CSV file instead of XML.

On top of that, the errors file now uses XML instead of CSV, so GYB currently tries to read the wrong file.

https://support.google.com/vault/answer/4388708?hl=en&visit_id=638150245468213825-2609647667&p=new_gmail_export&rd=1#new_gmail_export&zippy=%2Cdecember-updated-ui-for-future-client-side-encryption-functionality%2Cfebruary-new-gmail-export-system-available

https://support.google.com/vault/answer/6099459?hl=en#gmail_contents&gmail_metadata&gmail_count&#gmail_error_new&gmail_count&gmail_export&mailxml&metadata_csv&drive_export&drivexml&metadata&voice_export&voice_xml&gmail_error&chat_error&drive_error&voice_error&zippy=%2Cexport-contents%2Cmessage-parameters-in-the-metadata-file%2Cinformation-in-the-count-file

It seems the old format can no longer be downloaded online and is available only via the API for a limited time.

It would be great if GYB could support both formats to keep old exports functional.

Please confirm the following:

Full steps to reproduce the issue:

  1. Download an export from Google Vault https://vault.google.com/matter/
  2. Run GYB with --action restore-mbox and --local-folder

Expected outcome (what are you trying to do?):

GYB imports the metadata file.

Actual outcome (what errors or bad behavior do you see instead?):

GYB tried to read the errors file instead:

Reading Vault labels from 442b file test\test-errors.xml
large files may take some time to read...
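
The mix-up above (picking up `test-errors.xml` as if it were the metadata file) could be avoided by matching on the file name rather than only on the `.xml` extension. Below is a minimal sketch, assuming the export files follow the `export_name-metadata.csv` / `export_name-metadata.xml` naming from Google's documentation. The function name `find_metadata_file` is hypothetical and this is not GYB's actual code.

```python
from pathlib import Path
from typing import Optional


def find_metadata_file(folder: str) -> Optional[Path]:
    """Return the Vault metadata file in `folder`, or None if none exists.

    Hypothetical helper: prefers the new CSV metadata file and falls back
    to the old XML one, so old exports keep working. A file such as
    `export_name-errors.xml` never matches either pattern.
    """
    root = Path(folder)
    for pattern in ("*-metadata.csv", "*-metadata.xml"):
        matches = sorted(root.glob(pattern))
        if matches:
            return matches[0]
    return None
```

With a folder containing `test-errors.xml` and `test-metadata.csv`, this sketch returns the CSV file and ignores the errors file.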

Original announcement from Google

We are writing to let you know about upcoming changes to the Vault API, which may require informing your Google Workspace customers who use the Google Vault API directly, or use any 3rd party tool that relies on the Vault API.

On February 24, 2022 we introduced a new export pipeline for Gmail data, which also introduced changes to the file formats of several metadata files (New Gmail export system available). As part of that release we provided users with the ability in the user interface and in the API to select the old pipeline or the new pipeline for generating Gmail exports.

What does this mean for my customers?

Starting on January 22, 2024, we will retire the old Gmail export pipeline and only operate the new Gmail export pipeline. If your customers are using the Vault API or any 3rd party tool that relies on the Vault API, then starting on January 22, 2024 the “MailExportOptions” setting for “useNewExport” will always be set to True, regardless of the value that you set in the API call.

Therefore the files that are returned from a Gmail export will be returned in csv format instead of the old xml format in the previous version. You can find the details about the format changes in the Vault help center article concerning “Gmail (new export)” and share that with your customer.

What do I need to do?

If your customers’ applications that use the Vault API rely on these files to be returned in the XML format, they will need to modify their application code to process the new csv format instead, or contact their respective 3rd party vendor to inquire about the vendor’s plans. They should also forward this email to their vendor for context. Here are the exact changes to the export output.

| Information | Old Gmail export pipeline | New Gmail export pipeline | Changes |
| -- | -- | -- | -- |
| Message contents | export_name-N.zip | export_name-N.zip | Multiple zip files logic changes (see help center for examples) |
| Google groups membership information | export_name-group-membership.csv | - | Group member file no longer available |
| Message metadata | export_name-metadata.xml, export_name-metadata.csv | export_name-metadata.csv | Xml file is no longer available |
| Accounts and message count | export_name-result-counts.csv | export_name-result-counts.csv | No changes |
| Error reports | error.csv, export_name–account-exceptions.csv (Gmail exports), export_name–failed-group-membership-lookups.csv (Groups exports) | export_name-errors.xml | Format change from csv to xml and consolidated to one file |
| Messages that didn't convert to PST | - | export_name-conversion_errors-N.zip | New error file introduced |
| File checksums | File checksums | File checksums | No changes |

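
Since message metadata now arrives only as `export_name-metadata.csv`, a restore tool has to read labels from CSV rows instead of XML elements. The following is a rough sketch of that step using Python's standard `csv` module. The column names `Rfc822MessageId` and `Labels`, the comma-separated label list inside one cell, and the function name `read_vault_labels` are all assumptions for illustration; check them against the header row of a real export. This is not GYB's actual implementation.

```python
import csv
from typing import Dict, List


def read_vault_labels(csv_path: str,
                      id_column: str = "Rfc822MessageId",  # assumed column name
                      labels_column: str = "Labels"        # assumed column name
                      ) -> Dict[str, List[str]]:
    """Map each message ID in a Vault metadata CSV to its list of labels.

    Assumes labels are stored comma-separated inside a single cell.
    Rows without a message ID are skipped.
    """
    labels_by_message: Dict[str, List[str]] = {}
    with open(csv_path, newline="", encoding="utf-8") as handle:
        for row in csv.DictReader(handle):
            message_id = (row.get(id_column) or "").strip()
            if not message_id:
                continue
            raw_labels = row.get(labels_column) or ""
            labels_by_message[message_id] = [
                label.strip() for label in raw_labels.split(",") if label.strip()
            ]
    return labels_by_message
```

Passing the column names as parameters keeps the sketch usable if the real header names differ from the ones assumed here.
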
gioxx commented 6 months ago

Hello everyone. Google is sending out reminders of the upcoming change that will certainly impact GYB as well. Is there any news on this issue from last year?

Thanks!

gabrielwhite commented 6 months ago

From the announcement:

"Starting on February 2, 2024, we will retire the old Gmail export pipeline and only operate the new Gmail export pipeline. If your customers are using the Vault API or any 3rd party tool that relies on the Vault API, then starting on February 2, 2024 the “MailExportOptions” setting for “useNewExport” will always be set to True, regardless of the value that you set in the API call."

Documentation here: https://support.google.com/vault/answer/6099459?sjid=9304153788799510769-AP#gmail_new&zippy=%2Cexport-contents

jay0lee commented 6 months ago

GYB 1.80 (released today) adds support for the new format. Please test and confirm it works for you.

gabrielwhite commented 6 months ago

Thanks @jay0lee! The first run seemed to take a very long time (even on fast-incremental) - is there some database being rebuilt as part of this change? Subsequent runs on fast-incremental were fast.

jay0lee commented 6 months ago

Not sure what you mean. Fast incremental is for backup not restore.

Where is it taking a long time? What's showing in the output?

gabrielwhite commented 5 months ago

Mmm. Maybe an unrelated issue then. I upgraded the script and ran it to check that it worked, and the runtime was considerably longer than when I've tested an upgrade in the past. But it all worked.