Gmail Extractor is an automated system for processing email attachments from a Gmail account. It downloads attachments, processes them based on their file type, and saves the processed data in a structured format. The system is designed to handle various file types including PDFs, Word documents, Excel spreadsheets, CSVs, and images.
gmail-extractor/
│
├── config/
│ └── constants.js
│
├── logs/
│
├── src/
│ ├── attachments/
│ │ ├── fileHandler/
│ │ │ ├── imageHandler.js
│ │ │ ├── pdfHandler.js
│ │ │ ├── spreadsheetHandler.js
│ │ │ └── wordHandler.js
│ │ └── attachmentProcessor.js
│ │
│ ├── auth/
│ │ └── authHandler.js
│ │
│ ├── email/
│ │ ├── emailProcessor.js
│ │ ├── imapListener.js
│ │ └── resetEmailsAndAttachments.js
│ │
│ ├── google-sheets/
│ │ └── google-sheets-api.js
│ │
│ ├── utils/
│ │ ├── combineEmailData.js
│ │ ├── convertPdfToImage.js
│ │ ├── createDataDirectories.js
│ │ ├── deleteFile.js
│ │ ├── fileUtils.js
│ │ └── logger.js
│ │
│ └── zod-json/
│   ├── emailDataProcessor.js
│   └── emailDataSchema.js
│
├── .env
├── .gitignore
├── credentials.json
├── Dockerfile
├── index.js
├── package.json
├── README.md
└── token.json
Clone the repository:
git clone https://github.com/yourusername/gmail-extractor.git
cd gmail-extractor
Install dependencies:
npm install
or if you're using Yarn:
yarn install
Copy the .env.example file to .env:
cp .env.example .env
{
"offerNumber": "string",
"offerDate": "string",
"customer": {
"name": "string",
"location": "string"
},
"supplier": {
"name": "string",
"contact": {
"name": "string",
"email": "string",
"phone": "string"
}
},
"offerDetails": {
"currency": "string",
"deliveryTerms": "string",
"deliveryDate": "string",
"paymentTerms": "string",
"totalQuantity": "number",
"periodOffered": "string"
},
"products": [
{
"itemNumber": "string",
"material": "string",
"grade": "string",
"surface": "string",
"thickness": "number",
"width": "number",
"length": "number",
"quantity": "number",
"price": "number"
}
]
}
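The JSON structure above (presumably the shape of the processed offer data) can be checked before anything is saved or sent on. The following is a minimal, dependency-free sketch of such a check. It is not the project's emailDataSchema.js; the names OFFER_SHAPE, validateShape, and validateOffer and the error message format are assumptions made for illustration.

```javascript
// Hypothetical helper: checks an object against the offer structure shown above.
// Returns a list of human-readable problems; an empty list means the shape matches.
const OFFER_SHAPE = {
  offerNumber: "string",
  offerDate: "string",
  customer: { name: "string", location: "string" },
  supplier: {
    name: "string",
    contact: { name: "string", email: "string", phone: "string" },
  },
  offerDetails: {
    currency: "string",
    deliveryTerms: "string",
    deliveryDate: "string",
    paymentTerms: "string",
    totalQuantity: "number",
    periodOffered: "string",
  },
  products: [
    {
      itemNumber: "string",
      material: "string",
      grade: "string",
      surface: "string",
      thickness: "number",
      width: "number",
      length: "number",
      quantity: "number",
      price: "number",
    },
  ],
};

function validateShape(value, shape, path, errors) {
  if (typeof shape === "string") {
    // Leaf: the shape names the expected primitive type.
    if (typeof value !== shape || (shape === "number" && Number.isNaN(value))) {
      errors.push(`${path}: expected ${shape}`);
    }
  } else if (Array.isArray(shape)) {
    // Array: every element must match the single element shape.
    if (!Array.isArray(value)) {
      errors.push(`${path}: expected array`);
    } else {
      value.forEach((item, i) => validateShape(item, shape[0], `${path}[${i}]`, errors));
    }
  } else if (value === null || typeof value !== "object" || Array.isArray(value)) {
    errors.push(`${path}: expected object`);
  } else {
    // Object: recurse into each key the shape declares.
    for (const key of Object.keys(shape)) {
      validateShape(value[key], shape[key], `${path}.${key}`, errors);
    }
  }
  return errors;
}

function validateOffer(offer) {
  return validateShape(offer, OFFER_SHAPE, "offer", []);
}
```

Calling validateOffer(data) returns an empty array when every field is present with the expected type, so a caller can log the returned problems and skip the record otherwise.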
Edit the .env file and fill in your specific details:
EMAIL_ADDRESS: Your Gmail address
PROCESSED_DIR: Directory for processed attachments (e.g., processed_attachments)
Add http://localhost:3000/auth/google/callback to the "Authorized redirect URIs".
Create a credentials.json file in the root directory with the following structure:
{
"web": {
"client_id": "YOUR_CLIENT_ID.apps.googleusercontent.com",
"project_id": "your-project-name",
"auth_uri": "https://accounts.google.com/o/oauth2/auth",
"token_uri": "https://oauth2.googleapis.com/token",
"auth_provider_x509_cert_url": "https://www.googleapis.com/oauth2/v1/certs",
"client_secret": "YOUR_CLIENT_SECRET",
"redirect_uris": ["http://localhost:3000/auth/google/callback"]
}
}
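A quick sanity check on this file can catch template values that were never filled in. The sketch below works on the already-parsed JSON object; checkCredentials is a hypothetical helper, not part of the project, and the rule that values starting with YOUR_ are placeholders is an assumption based on the template above.

```javascript
// Hypothetical helper: checks a parsed credentials.json object for the fields shown above.
const REDIRECT_URI = "http://localhost:3000/auth/google/callback";

function checkCredentials(credentials) {
  const problems = [];
  const web = credentials && credentials.web;
  if (!web || typeof web !== "object") {
    return ['missing top-level "web" object'];
  }
  // These string fields must be present and non-empty.
  for (const field of ["client_id", "client_secret", "auth_uri", "token_uri"]) {
    if (typeof web[field] !== "string" || web[field].length === 0) {
      problems.push(`web.${field} is missing or empty`);
    }
  }
  // Assumed convention: template placeholders start with "YOUR_".
  for (const field of ["client_id", "client_secret"]) {
    if (typeof web[field] === "string" && web[field].startsWith("YOUR_")) {
      problems.push(`web.${field} still contains the placeholder value`);
    }
  }
  // The callback URL must be registered as a redirect URI.
  if (!Array.isArray(web.redirect_uris) || !web.redirect_uris.includes(REDIRECT_URI)) {
    problems.push(`web.redirect_uris must include ${REDIRECT_URI}`);
  }
  return problems;
}
```

In practice the argument would come from something like JSON.parse(fs.readFileSync("credentials.json", "utf8")), and a non-empty result would be printed before the application exits.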
Use an app password in the .env file instead of your regular Gmail password.
To start the Gmail extractor:
npm start
On first run, you'll be prompted to authorize the application. Follow the URL provided in the console to complete the OAuth2 flow.
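For orientation, the authorization URL printed on first run is typically assembled from the values in credentials.json. The following is only a sketch of how such a URL could be built; buildAuthUrl is a hypothetical function, and the scope, access_type, and prompt values are assumptions rather than what authHandler.js actually sends.

```javascript
// Sketch: builds the consent URL a user would open on first run.
// The scope and the access_type/prompt values are assumptions, not taken from authHandler.js.
function buildAuthUrl(credentials) {
  const { client_id, auth_uri, redirect_uris } = credentials.web;
  const params = new URLSearchParams({
    client_id,
    redirect_uri: redirect_uris[0],
    response_type: "code", // authorization-code flow
    scope: "https://mail.google.com/", // assumed: full-mailbox scope used for IMAP access
    access_type: "offline", // request a refresh token so later runs need no prompt
    prompt: "consent",
  });
  return `${auth_uri}?${params.toString()}`;
}
```

After the user approves access, Google redirects to the configured callback with a code query parameter, which the application then exchanges for tokens at the token_uri.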
Below is a sequence diagram illustrating the main process flow of the Gmail Extractor:
sequenceDiagram
participant User
participant ImapListener
participant EmailProcessor
participant AttachmentProcessor
participant FileHandlers
participant AuthHandler
participant ZodProcessor
participant OpenAIProcessor
participant Gmail
participant GoogleSheets
User->>ImapListener: Start application
ImapListener->>AuthHandler: Request authentication
AuthHandler->>Gmail: Authenticate (OAuth2)
Gmail-->>AuthHandler: Return access token
AuthHandler-->>ImapListener: Authentication successful
loop Listen for new emails
ImapListener->>Gmail: Check for new emails
Gmail-->>ImapListener: New email notification
ImapListener->>EmailProcessor: Process new email
EmailProcessor->>Gmail: Fetch email content
Gmail-->>EmailProcessor: Return email content
EmailProcessor->>AttachmentProcessor: Process attachments
AttachmentProcessor->>FileHandlers: Handle specific file types
FileHandlers-->>AttachmentProcessor: Return processed data
AttachmentProcessor-->>EmailProcessor: Return processed attachments
EmailProcessor->>EmailProcessor: Combine email data (all_{emailId}.json)
EmailProcessor->>ZodProcessor: Validate combined data
ZodProcessor-->>EmailProcessor: Return validated data
EmailProcessor->>OpenAIProcessor: Process data with OpenAI
OpenAIProcessor-->>EmailProcessor: Return structured data
EmailProcessor->>EmailProcessor: Save processed_offer_{emailId}.json
EmailProcessor->>GoogleSheets: Update spreadsheet with processed data
GoogleSheets-->>EmailProcessor: Confirmation
end
ImapListener->>User: Notification of processed emails
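The loop body in the diagram can be read as a short asynchronous pipeline. The sketch below mirrors that order with every step injected as a function, so it runs without Gmail, OpenAI, or Google Sheets. The step names on the steps object and the shape of the return value are hypothetical and do not correspond to the project's real exports.

```javascript
// Sketch of the per-email loop body from the diagram. Every step is injected,
// so the function names on `steps` are hypothetical, not the project's real exports.
async function processEmail(emailId, steps) {
  const email = await steps.fetchEmail(emailId); // EmailProcessor -> Gmail
  const attachments = await steps.processAttachments(email); // AttachmentProcessor + FileHandlers
  // Combined email data; the diagram names its file all_{emailId}.json.
  const combined = { emailId, body: email.body, attachments };
  const validated = await steps.validate(combined); // ZodProcessor
  const offer = await steps.extractOffer(validated); // OpenAIProcessor
  await steps.updateSheet(offer); // GoogleSheets
  // This sketch only computes the file names; it does not write anything to disk.
  return {
    combinedFile: `all_${emailId}.json`,
    offerFile: `processed_offer_${emailId}.json`,
    offer,
  };
}
```

Passing the steps in as arguments keeps the ordering logic testable on its own, since each external service can be replaced by a stub.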
The system processes the following file types:
PDFs: pdfHandler.js
Word documents: wordHandler.js
Spreadsheets: spreadsheetHandler.js
Images: imageHandler.js
Processed files and their extracted data are managed by attachmentProcessor.js.
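Routing an attachment to one of these handlers usually comes down to a lookup on the file extension. The sketch below shows one way that dispatch could look. The handler file names come from src/attachments/fileHandler/, but the extension list, the choice to send CSV files to the spreadsheet handler, and the pickHandler function itself are assumptions for illustration.

```javascript
// Extension-to-handler mapping. Handler file names come from src/attachments/fileHandler/;
// the extension lists themselves are assumptions for illustration.
const HANDLERS_BY_EXTENSION = {
  ".pdf": "pdfHandler.js",
  ".doc": "wordHandler.js",
  ".docx": "wordHandler.js",
  ".xls": "spreadsheetHandler.js",
  ".xlsx": "spreadsheetHandler.js",
  ".csv": "spreadsheetHandler.js", // assumed: CSVs are treated as spreadsheets
  ".png": "imageHandler.js",
  ".jpg": "imageHandler.js",
  ".jpeg": "imageHandler.js",
};

// Returns the handler file name for an attachment, or null if the type is unsupported.
function pickHandler(filename) {
  const dot = filename.lastIndexOf(".");
  if (dot <= 0) return null; // no extension (or a dotfile): nothing to dispatch on
  const ext = filename.slice(dot).toLowerCase();
  return HANDLERS_BY_EXTENSION[ext] || null;
}
```

Returning null for unknown types lets the caller decide whether to skip the attachment or log it, rather than failing the whole email.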
Ensure your credentials.json file is correctly set up and your Gmail account settings are properly configured.
Check the logs/ directory for detailed error messages.
For deploying to a production environment:
Ensure sensitive files (credentials.json and .env) are properly secured and not exposed in your repository.
Contributions are welcome! Please feel free to submit a Pull Request.
[Specify your license here, e.g., MIT, GPL, etc.]