MetricsGrimoire / MailingListStats

Mailing List Stats is a command line based tool used to analyze mboxes
http://metricsgrimoire.github.com/MailingListStats/
GNU General Public License v2.0

Decouple mail retrieval from SQL-db wrapping Application class #59

gpoo closed 8 years ago

gpoo commented 9 years ago

Move classes related to Mail Archiving and the archive retrieval process out of the Application class.

This is the first step towards fetching mail archives as a library, in addition to running the application (mlstats).

gpoo commented 9 years ago

This is just a patch to discuss the decoupling of mail retrieval from the application class, as it was filed in issue #53.

I have tested it against SQLite and PostgreSQL. So far, the only public API is fetch().

A potential improvement is to make the class work as a generator, so that it hands control back to the caller after each file is downloaded. However, that could go in a separate patch or issue.
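To make the generator idea concrete, here is a minimal sketch. It is not code from this patch: fetch_lazily and the download callable are hypothetical names, and the real retrieval logic in pymlstats may be structured differently.

```python
from typing import Callable, Iterable, Iterator


def fetch_lazily(urls: Iterable[str],
                 download: Callable[[str], str]) -> Iterator[str]:
    """Yield the result of each download as soon as it finishes.

    `download` is a hypothetical callable that fetches one archive
    file and returns something describing it (here, a local path).
    """
    for url in urls:
        path = download(url)
        # Control returns to the caller here, after every single file.
        yield path


if __name__ == '__main__':
    # Stand-in downloader so the sketch runs without network access.
    def fake_download(url):
        return '/tmp/' + url.rsplit('/', 1)[-1]

    archive_urls = ['http://example.org/2015-01.txt.gz',
                    'http://example.org/2015-02.txt.gz']
    for path in fetch_lazily(archive_urls, fake_download):
        print(path)
```

Because nothing is downloaded until the caller asks for the next item, a library user could stop early, report progress, or process each file before the next one is fetched.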

gpoo commented 9 years ago

I noticed that the code for gmane uses a db to determine the number of messages to download. That is, the code must be rewritten to decouple it.

gpoo commented 9 years ago

The current status of this patch is close to what @sbenthall was asking for in #53. What you can do now is something like:

import sys
from pymlstats.archive import MailingList, LocalArchive, GmaneArchive,\
    MailmanArchive, GMANE_URL

if __name__ == '__main__':
    ml = MailingList(url_or_dirpath=sys.argv[1], compressed_dir=sys.argv[2])
    if ml.is_local():
        o = LocalArchive(ml)
    elif ml.location.startswith(GMANE_URL):
        o = GmaneArchive(ml)
    else:
        o = MailmanArchive(ml)

    o._create_download_dirs()

    for mla in o.fetch():
        print(mla.url)

Then,

$ python test.py URL OUTPUT-DIR

will retrieve the mailing list archives into OUTPUT-DIR.

@jgbarah AFAIR, you implemented gmane. I added an alternative way to determine the offset: it looks at the stored files instead of the database. The class is GmaneArchive, if you want to take a look at it. In any case, for that one in particular, the constructor accepts an arbitrary offset as a parameter. This made it possible to decouple it from the database.
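For illustration, one way to derive an offset from files on disk rather than from the database is to count the messages already stored. This is only a sketch under assumptions: offset_from_files is a hypothetical helper, it assumes every regular file in the directory is an uncompressed mbox, and GmaneArchive may compute its offset differently.

```python
import mailbox
import os


def offset_from_files(dirpath):
    """Return the number of messages in the mbox files under dirpath.

    Assumption: each regular file directly inside dirpath is an
    uncompressed mbox. Subdirectories are ignored.
    """
    total = 0
    for name in sorted(os.listdir(dirpath)):
        path = os.path.join(dirpath, name)
        if not os.path.isfile(path):
            continue
        # create=False: never create a file that is missing.
        box = mailbox.mbox(path, create=False)
        try:
            total += len(box)
        finally:
            box.close()
    return total
```

The resulting count could then be supplied as the starting offset, so that retrieval resumes after the last message already on disk without consulting the database.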