Open alexsunxl opened 1 year ago
The layout might look like this:
├── data
│   └── sunxinle72@gmail.com
│       └── 5de3e52f3a6b90cabe6cbdd4ae3a5c5b
│           ├── email.txt
│           └── meta.json
Each email is stored in its own directory, and the names of the files inside it need to be fixed so that a single builder can process every email. In addition, images, video, voice, and other content found inside the mail should go in separate directories, which makes parsing easier.
A complete directory structure might look like the one shown below:
├── email.txt
├── meta.json
├── image
│   ├── image1.jpg
│   ├── image2.jpg
│   └── ...
├── video
│   ├── video1.mp4
│   ├── video2.mv
│   └── ...
└── audio
    ├── audio1.m4a
    ├── audio2.flac
    └── ...
It might be better to distinguish between images in email attachments and images in the body by placing them in different folders.
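To make the idea of splitting media by type and by origin concrete, here is a minimal Python sketch. It assumes the message has already been parsed with the standard `email` package. The `inline/` and `attachment/` folder names, the `target_subdir` helper, and the `build_sample` message are hypothetical and only serve the illustration; the thread itself only proposes `image`, `video`, and `audio`.

```python
from email.message import EmailMessage

# Folder names per media type, taken from the tree proposed above.
MEDIA_DIRS = {"image": "image", "video": "video", "audio": "audio"}


def target_subdir(part):
    """Return a relative folder for one MIME part, or None if it is not media."""
    media_dir = MEDIA_DIRS.get(part.get_content_maintype())
    if media_dir is None:
        return None
    # Content-Disposition is "inline" for media shown in the body and
    # "attachment" for attached files. Parts without the header are
    # treated as attachments here (an assumption of this sketch).
    inline = part.get_content_disposition() == "inline"
    origin = "inline" if inline else "attachment"
    return f"{origin}/{media_dir}"


def build_sample():
    """Build a small message with one attached image and one inline audio part."""
    msg = EmailMessage()
    msg["Subject"] = "demo"
    msg.set_content("hello")
    msg.add_attachment(b"\xff\xd8", maintype="image", subtype="jpeg",
                       filename="photo.jpg")
    msg.add_attachment(b"\x00", maintype="audio", subtype="flac",
                       filename="a.flac", disposition="inline")
    return msg


if __name__ == "__main__":
    for part in build_sample().walk():
        print(part.get_content_type(), "->", target_subdir(part))
```

Whether the origin or the media type should be the outer folder is a design choice left open here.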
What do you think? @waterflier @lurenpluto
To align with mental models, I suggest that we adopt a structure where each directory corresponds to a single email. As for attachments, I believe there is no need to store them in separate directories. Typically, the number of attachments for a single email isn't excessive, so a separate directory may not be necessary.
From the perspective of Named Data Networking (NDN), we can store all videos and images by their respective hashes. We can then reference these existing files in the email directory using soft links. This approach should provide an efficient and intuitive way to manage our data.
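A rough sketch of the hash-plus-soft-link idea, in Python. The choice of SHA-256, the `store` directory, and the helper names `store_blob` and `link_into_email` are assumptions made for illustration; the comment above only says that files are stored by their hashes and referenced through soft links.

```python
import hashlib
import os
from pathlib import Path


def store_blob(store_dir: Path, data: bytes) -> Path:
    """Write data under its SHA-256 hex digest and return the stored path."""
    store_dir.mkdir(parents=True, exist_ok=True)
    blob = store_dir / hashlib.sha256(data).hexdigest()
    if not blob.exists():  # identical content is stored only once
        blob.write_bytes(data)
    return blob


def link_into_email(blob: Path, email_dir: Path, filename: str) -> Path:
    """Create email_dir/filename as a soft link pointing at the stored blob."""
    email_dir.mkdir(parents=True, exist_ok=True)
    link = email_dir / filename
    if link.is_symlink() or link.exists():
        link.unlink()
    # A relative target keeps the link valid if the whole tree is moved.
    link.symlink_to(os.path.relpath(blob, start=email_dir))
    return link
```

With this layout, the same image attached to several emails occupies disk space once, while each email directory still lists it under a readable file name. Creating symlinks may need extra privileges on Windows.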
Configuration File Path
The configuration file for the email scraping program is located at rootfs/email/config.toml. The configuration file includes the following fields:
EMAIL_IMAP_SERVER: This field is for the IMAP server of your email service. For example, "imap.gmail.com".
EMAIL_ADDRESS: This field is for the email address that you want to scrape. Please replace with your own email address.
EMAIL_PASSWORD: This field is for the password of your email account. Please replace with your own password.
EMAIL_IMAP_PORT: This field is for the port number of your IMAP server. For Gmail, this is typically 993.
LOCAL_DIR: This field is for the local directory where you want to store the scraped emails. For example, 'rootfs/data'.
Please note that you should keep your email address and password confidential and ensure they are securely stored.
Title: File Organization and Storage Scheme for Email Scraping Program
File Storage Path
The scraped email files will be stored in the directory rootfs/data/xxx@gmail.com/. This location can be changed through the LOCAL_DIR field.
Creation of Email Folders
For each email, we compute an MD5 hash from its name and time. We then use this hash as the name of a folder that stores the corresponding email content.
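The folder naming could be sketched as follows in Python. The thread does not say how the name and time are combined before hashing, so joining them with a newline is an assumption, as is the helper name `email_folder_name`.

```python
import hashlib


def email_folder_name(name: str, time: str) -> str:
    """Return the MD5 hex digest of the email's name and time."""
    # Assumption: the two values are joined with a newline before hashing.
    key = f"{name}\n{time}".encode("utf-8")
    return hashlib.md5(key).hexdigest()


if __name__ == "__main__":
    print(email_folder_name("Weekly report", "2023-07-01T09:30:00+08:00"))
```

The result is a 32-character hexadecimal string, the same shape as the folder name in the example tree. Two emails that share both name and time would map to the same folder, so a real implementation may want to include another field, such as the Message-ID header.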
Email Content Storage
Within each email's folder, we create two files to store the main information of the email:
email.txt: This file stores the body content of the email.
meta.json: This file stores the header information of the email.
In addition, this folder can also be used to store attachments, images, and other files related to the email.
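A minimal Python sketch of writing these two files from a raw message, using the standard `email` package. The JSON layout (each header name mapped to a list of values) and the `save_email` helper are assumptions; the scheme above only states that meta.json holds the header information.

```python
import json
from email import message_from_bytes, policy
from pathlib import Path


def save_email(raw: bytes, email_dir: Path) -> None:
    """Write the body to email.txt and the headers to meta.json."""
    msg = message_from_bytes(raw, policy=policy.default)
    email_dir.mkdir(parents=True, exist_ok=True)

    # Prefer the plain-text body and fall back to HTML if that is all there is.
    body = msg.get_body(preferencelist=("plain", "html"))
    text = body.get_content() if body is not None else ""
    (email_dir / "email.txt").write_text(text, encoding="utf-8")

    # Repeated headers such as Received are collected into lists.
    headers = {}
    for key, value in msg.items():
        headers.setdefault(key, []).append(str(value))
    (email_dir / "meta.json").write_text(
        json.dumps(headers, ensure_ascii=False, indent=2), encoding="utf-8"
    )
```

Because the file names are fixed, a downstream builder can open email.txt and meta.json in any email folder without inspecting its contents first.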
The above is the file organization and storage scheme for our email scraping program. We welcome your feedback and suggestions so that we can continuously optimize and improve this scheme.