Provide a single method to retrieve a web resource, instead of one for parsing the HTML index and another for the mbox.
In addition, this adds magic-number checks to detect the file type from its content, which is more reliable than guessing from the subtype reported by the web server (which is not always available).
@sduenas What do you think about this approach?
Any further change to content retrieval can be made in one place, so we avoid duplicating code and, it seems to me, the logic gets simpler.
BTW, this stores files ending in .gz as gzipped files.
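For reference, detecting the file type from magic numbers can be sketched roughly as below. This is only an illustration, not the code in this pull request: the `MAGIC_NUMBERS` table and the `detect_file_type` function are hypothetical names, and the set of formats checked is an assumption.

```python
import gzip

# Leading bytes ("magic numbers") of some common compressed formats.
# Hypothetical table for illustration; the real change may check other types.
MAGIC_NUMBERS = {
    b"\x1f\x8b": "gzip",
    b"BZh": "bzip2",
    b"PK\x03\x04": "zip",
}


def detect_file_type(data: bytes) -> str:
    """Return the compression type of `data`, or "plain" if no magic number matches."""
    for magic, file_type in MAGIC_NUMBERS.items():
        if data.startswith(magic):
            return file_type
    return "plain"


# Usage: the detected type depends on the content, not on any HTTP header.
sample = gzip.compress(b"From alice@example.com Mon Jan  1 00:00:00 2024\n")
print(detect_file_type(sample))
print(detect_file_type(b"<html><body>index</body></html>"))
```

Since the check only looks at the first bytes of the downloaded content, it still works when the server sends no subtype or a wrong one.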
Re-requesting a pull, now from a branch :-)