Closed: gpoo closed this 8 years ago.
This is just a patch to discuss decoupling mail retrieval from the Application class, as described in issue #53.
I have tested it against SQLite and PostgreSQL. So far, the only public API is fetch().
A potential improvement would be to make the class a generator, so that it hands control back to the caller after each file is downloaded. However, that could go in a separate patch or issue.
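A minimal sketch of that idea, using a hypothetical ArchiveSketch class (not the pymlstats API) whose fetch() method is a generator. The caller-supplied download callable is an assumption that stands in for the real HTTP download; the point is only that nothing is downloaded until the caller asks for the next item.

```python
class ArchiveSketch:
    """Hypothetical archive class illustrating a generator-based fetch()."""

    def __init__(self, urls, download):
        self.urls = urls
        # download: any callable that takes a URL and returns a local path
        self.download = download

    def fetch(self):
        """Yield each downloaded file as soon as it is ready."""
        for url in self.urls:
            # Execution pauses here, returning control to the caller
            # after every single download.
            yield self.download(url)


if __name__ == '__main__':
    def fake_download(url):
        # Stand-in for a real download: just keep the file name
        return url.rsplit('/', 1)[-1]

    archive = ArchiveSketch(['http://example.com/a.txt',
                             'http://example.com/b.txt'], fake_download)
    for path in archive.fetch():
        print(path)  # prints a.txt, then b.txt
```

Because fetch() is lazy, a caller can stop early, report progress, or persist state between files without the archive class knowing about any of it.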
I noticed that the Gmane code uses the database to determine the number of messages to download. That is, that code must be rewritten to decouple it from the database.
In its current state, this patch is close to what @sbenthall asked for in #53. You can now do something like:
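```python
import sys

from pymlstats.archive import MailingList, LocalArchive, GmaneArchive, \
    MailmanArchive, GMANE_URL

if __name__ == '__main__':
    # argv[1]: mailing list URL or local directory
    # argv[2]: directory where the retrieved archives are stored
    ml = MailingList(url_or_dirpath=sys.argv[1], compressed_dir=sys.argv[2])

    # Pick the archive backend that matches the mailing list location
    if ml.is_local():
        o = LocalArchive(ml)
    elif ml.location.startswith(GMANE_URL):
        o = GmaneArchive(ml)
    else:
        o = MailmanArchive(ml)

    # Create the output directories, then retrieve the archives
    o._create_download_dirs()
    for mla in o.fetch():
        print(mla.url)
```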
Then, running

```
$ python test.py URL OUTPUT-DIR
```

will retrieve the mailing list archives into OUTPUT-DIR.
@jgbarah AFAIR, you implemented the Gmane support. I added an alternative way to determine the offset: it looks at the files already stored instead of querying the database. The class is GmaneArchive, if you want to take a look at it. In any case, for this class in particular, the constructor accepts an arbitrary offset as a parameter, which is what made it possible to decouple it from the database.
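A minimal sketch of deriving the offset from stored files rather than from the database, assuming each regular file in the download directory is an uncompressed mbox and that the offset is simply the number of messages already on disk. count_stored_messages and GmaneOffsetSketch are hypothetical names; the actual GmaneArchive code may compute the offset differently.

```python
import mailbox
import os


def count_stored_messages(dirpath):
    """Count messages in the mbox files already stored in dirpath.

    Hypothetical helper: assumes every regular file in dirpath is an
    uncompressed mbox. A missing directory means nothing has been
    downloaded yet, so the offset is 0.
    """
    if not os.path.isdir(dirpath):
        return 0
    total = 0
    for name in sorted(os.listdir(dirpath)):
        path = os.path.join(dirpath, name)
        if not os.path.isfile(path):
            continue
        box = mailbox.mbox(path, create=False)
        try:
            total += len(box)
        finally:
            box.close()
    return total


class GmaneOffsetSketch:
    """Hypothetical stand-in showing only the offset handling."""

    def __init__(self, dirpath, offset=None):
        self.dirpath = dirpath
        # An explicit offset wins; otherwise derive it from the files
        # on disk instead of querying the database.
        if offset is None:
            offset = count_stored_messages(dirpath)
        self.offset = offset
```

Keeping the offset as a plain constructor argument lets a caller (or a test) skip the file scan entirely, which is the same property that removes the database dependency.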
Move classes related to mail archiving and the archive retrieval process out of the Application class.
This is the first step towards fetching mail archives as a library, in addition to running the application (mlstats).