fiatrete / OpenDAN-Personal-AI-OS

OpenDAN is an open-source Personal AI OS that consolidates various AI modules in one place for your personal use.
https://opendan.ai
MIT License

Draft of the Storage Scheme for the Email Spider #43

Open alexsunxl opened 1 year ago

alexsunxl commented 1 year ago

Configuration File Path

The configuration file for the email scraping program is located at rootfs/email/config.toml.

The configuration file includes the following fields:

Please note that you should keep your email address and password confidential and ensure they are securely stored.


File Organization and Storage Scheme for Email Scraping Program

File Storage Path

The scraped email files are stored in the directory rootfs/data/xxx@gmail.com/. This location can be changed with the LOCAL_DIR field.

Creation of Email Folders

For each email, an MD5 hash is generated from its name and time, and this hash is used as the name of a unique folder that stores the email's content.
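The naming step could be sketched as follows, assuming "name" is a string such as the subject line and "time" is a timestamp string. Joining the two with a newline before hashing is an assumption of this sketch, as is the helper name email_folder_name.

```python
import hashlib


def email_folder_name(name: str, time: str) -> str:
    """Derive the folder name for one email from its name and time.

    Assumption: the values are joined with a newline before hashing, so
    ("ab", "c") and ("a", "bc") do not produce the same input bytes.
    """
    key = f"{name}\n{time}".encode("utf-8")
    return hashlib.md5(key).hexdigest()


folder = email_folder_name("Weekly report", "2023-06-01T09:30:00Z")
print(folder)  # a 32-character lowercase hex string
```

Because the folder name depends only on name and time, re-scraping the same email maps to the same folder, while two different emails that share both values would land in one folder.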

Email Content Storage

Within each email's folder, we create two files to store the main information of the email:

This folder can also hold attachments, images, and other files related to the email.
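A sketch of creating one email's folder and its two files. The file names email.txt and meta.json come from the tree proposed later in this thread; storing the body in email.txt and only the name and time in meta.json is an assumption, and store_email is a hypothetical helper.

```python
import hashlib
import json
import tempfile
from pathlib import Path


def store_email(account_dir: Path, name: str, time: str, body: str) -> Path:
    """Create the folder for one email and write email.txt and meta.json.

    Hypothetical helper: the folder is named by the MD5 of name and time
    (joined with a newline, an assumption); meta.json holds only those two
    fields here, since the draft's field list is not given.
    """
    folder_name = hashlib.md5(f"{name}\n{time}".encode("utf-8")).hexdigest()
    email_dir = account_dir / folder_name
    # exist_ok=True lets a re-scrape of the same email reuse its folder.
    email_dir.mkdir(parents=True, exist_ok=True)

    (email_dir / "email.txt").write_text(body, encoding="utf-8")
    meta = {"name": name, "time": time}
    (email_dir / "meta.json").write_text(
        json.dumps(meta, ensure_ascii=False, indent=2), encoding="utf-8"
    )
    return email_dir


# Usage: write one email into a temporary account directory.
account = Path(tempfile.mkdtemp()) / "xxx@gmail.com"
folder = store_email(account, "Weekly report", "2023-06-01T09:30:00Z", "Hello")
print(sorted(p.name for p in folder.iterdir()))  # ['email.txt', 'meta.json']
```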

The above is the file organization and storage scheme for our email scraping program. We welcome your feedback and suggestions so that we can continuously optimize and improve this scheme.

alexsunxl commented 1 year ago

It might look like this:

├── data
│   └── sunxinle72@gmail.com
│       └── 5de3e52f3a6b90cabe6cbdd4ae3a5c5b
│           ├── email.txt
│           └── meta.json
lurenpluto commented 1 year ago

Each email is stored in its own directory, and the names of the files inside it should be fixed so that the same builder can process every email. In addition, the images, videos, voice recordings, and other content inside an email should be stored in separate directories to make parsing easier.

A complete directory structure might look like the one shown below:

├── email.txt
├── meta.json
├── image
│   ├── image1.jpg
│   ├── image2.jpg
│   └── ...
├── video
│   ├── video1.mp4
│   ├── video2.mv
│   └── ...
└── audio
    ├── audio1.m4a
    ├── audio2.flac
    └── ...
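With fixed subdirectory names, routing a media file reduces to a lookup on its extension. The extension table below is hypothetical: the .jpg, .mp4, .mv, .m4a, and .flac entries come from the example files in the tree, and the rest are common additions. Unrecognized types return None so the caller can decide where they go; media_subdir is a hypothetical name.

```python
from pathlib import Path
from typing import Optional

# Hypothetical extension table: .jpg, .mp4, .mv, .m4a and .flac come from the
# example files in the proposed tree; the other entries are common additions.
MEDIA_FOLDERS = {
    ".jpg": "image", ".jpeg": "image", ".png": "image", ".gif": "image",
    ".mp4": "video", ".mv": "video", ".mov": "video", ".mkv": "video",
    ".m4a": "audio", ".flac": "audio", ".mp3": "audio", ".wav": "audio",
}


def media_subdir(email_dir: Path, filename: str) -> Optional[Path]:
    """Return the fixed subdirectory for a media file, or None if unrecognized."""
    folder = MEDIA_FOLDERS.get(Path(filename).suffix.lower())
    return email_dir / folder if folder else None


email_dir = Path("rootfs/data/xxx@gmail.com/5de3e52f3a6b90cabe6cbdd4ae3a5c5b")
for name in ["image1.jpg", "video2.mv", "audio2.flac", "notes.pdf"]:
    print(name, "->", media_subdir(email_dir, name))
```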

alexsunxl commented 1 year ago

It might be better to distinguish between images in email attachments and images in the body by placing them in different folders.

What do you think? @waterflier @lurenpluto
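One way to make this distinction, sketched with Python's standard email package: parts whose Content-Disposition is "attachment" are treated as attachments, while image parts marked "inline" or carrying a Content-ID (the usual way HTML bodies reference embedded images) are treated as body images. These rules, the group names, and the helper split_images are assumptions, not part of the proposal.

```python
from email.message import EmailMessage


def split_images(msg: EmailMessage) -> dict[str, list[str]]:
    """Group a message's image parts into attachment and inline (body) images.

    Assumed rules: Content-Disposition "attachment" means an attachment; an
    image marked "inline" or carrying a Content-ID counts as a body image.
    Image parts matching neither rule are left out.
    """
    groups: dict[str, list[str]] = {"attachments": [], "inline": []}
    for part in msg.walk():
        # Skip containers and non-image parts such as the text body.
        if part.is_multipart() or part.get_content_maintype() != "image":
            continue
        disposition = part.get_content_disposition()
        name = part.get_filename() or part.get("Content-ID", "unnamed")
        if disposition == "attachment":
            groups["attachments"].append(name)
        elif disposition == "inline" or part.get("Content-ID"):
            groups["inline"].append(name)
    return groups


# Usage: one inline image and one attached image.
msg = EmailMessage()
msg.set_content("See the chart below.")
msg.add_attachment(b"\x89PNG", maintype="image", subtype="png",
                   filename="chart.png", disposition="inline")
msg.add_attachment(b"\x89PNG", maintype="image", subtype="png",
                   filename="scan.png")
print(split_images(msg))  # {'attachments': ['scan.png'], 'inline': ['chart.png']}
```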

waterflier commented 1 year ago

To align with users' mental models, I suggest a structure in which each directory corresponds to a single email. Attachments do not need separate directories: a single email typically has only a few attachments, so a dedicated directory is probably unnecessary.

From the perspective of Named Data Networking (NDN), we can store all videos and images by their respective hashes. We can then reference these existing files in the email directory using soft links. This approach should provide an efficient and intuitive way to manage our data.
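A minimal sketch of this idea: each media file is written once into a shared store under a content hash, and the email directory receives a symbolic link to it. SHA-256, the flat store layout, the example folder names, and the helper names store_blob and link_into_email are assumptions; the comment does not specify a hash function. Creating symbolic links may require extra privileges on Windows.

```python
import hashlib
import tempfile
from pathlib import Path


def store_blob(store_dir: Path, data: bytes, suffix: str = "") -> Path:
    """Write `data` once under its SHA-256 hash and return the stored path.

    Assumption: SHA-256 and a flat store directory; the comment only says
    media files are stored "by their respective hashes".
    """
    store_dir.mkdir(parents=True, exist_ok=True)
    blob = store_dir / (hashlib.sha256(data).hexdigest() + suffix)
    if not blob.exists():  # identical content is written only once
        blob.write_bytes(data)
    return blob


def link_into_email(email_dir: Path, blob: Path, filename: str) -> Path:
    """Expose a stored blob in an email's directory through a symbolic link."""
    email_dir.mkdir(parents=True, exist_ok=True)
    link = email_dir / filename
    if not link.is_symlink():
        # Absolute target so the link stays valid regardless of the working dir.
        link.symlink_to(blob.resolve())
    return link


# Usage: the same image attached to two emails is stored once, linked twice.
root = Path(tempfile.mkdtemp())
image = b"same image bytes"
blob_a = store_blob(root / "objects", image, ".jpg")
blob_b = store_blob(root / "objects", image, ".jpg")
link_into_email(root / "mail" / "email-1", blob_a, "photo.jpg")
link_into_email(root / "mail" / "email-2", blob_b, "photo.jpg")
print(blob_a == blob_b, len(list((root / "objects").iterdir())))  # True 1
```

Since identical bytes always hash to the same name, an image attached to several emails is stored once and linked from each email's directory.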