geekygirldawn closed this issue 8 years ago.
Do you know which message in particular triggers the issue?
There were actually 15 messages with a bad date (year 0120). As a temporary fix, I uncompressed the file, did a search/replace in vi (the correct year was 2002), recompressed the file, re-ran mlstats, and dropped in the new file after mlstats had downloaded the bad file but before the analysis kicked in.
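The manual workaround (decompress, search/replace, recompress) could also be scripted. This is a minimal Python 3 sketch, not part of mlstats. It assumes the broken year appears as 0120 inside Date: header lines, and the file names in the usage comment are hypothetical. Only Date: lines are rewritten, so a 0120 in a subject or message body is left alone.

```python
import gzip
import re

# Assumption: the broken year shows up as " 0120 " inside a Date: header line,
# e.g. "Date: Tue, 10 Dec 0120 14:03:11 +0100".
BAD_DATE_LINE = re.compile(rb"^(Date:.*\s)0120(\s)")


def patch_mbox_gz(src, dst, good_year=b"2002"):
    """Copy a gzipped mbox from src to dst, replacing the year only on
    matching Date: lines. Returns the number of lines changed."""
    changed = 0
    replacement = rb"\g<1>" + good_year + rb"\g<2>"
    with gzip.open(src, "rb") as fin, gzip.open(dst, "wb") as fout:
        for line in fin:
            new_line, n = BAD_DATE_LINE.subn(replacement, line)
            changed += n
            fout.write(new_line)
    return changed


# Usage (hypothetical file names):
# n = patch_mbox_gz("archive.mbox.gz", "archive-fixed.mbox.gz")
```

Writing to a separate output file keeps the original archive around in case the pattern turns out to be too broad.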
Related: is there a way to tell mlstats to not download the files again but just do the analysis on the previously downloaded files?
On Mar 1, 2016, at 19:26, Germán Poo-Caamaño notifications@github.com wrote:
Do you know which message in particular triggers the issue?
It would be great if you could identify the message (or a range). Then I could take a look at it on Gmane (the bandwidth in this coffee shop does not allow me to retrieve the full archive).
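One way to locate the offending messages without downloading again is to scan a local copy of the archive. This is a minimal sketch using Python's standard mailbox and email.utils modules; it assumes the archive has already been gunzipped to a plain mbox file, and min_year=1900 is an assumed cutoff, not something mlstats defines. It yields the position, Message-ID and raw Date header of each message whose date cannot be parsed or falls before the cutoff.

```python
import mailbox
from email.utils import parsedate_tz


def find_suspicious_dates(path, min_year=1900):
    """Yield (index, message_id, raw_date) for every message in the mbox at
    path whose Date: header is missing, unparsable, or before min_year."""
    for index, msg in enumerate(mailbox.mbox(path)):
        raw = msg.get("Date")
        parsed = parsedate_tz(raw) if raw else None
        # parsed[0] is the year; a header such as "10 Dec 0120" yields 120.
        if parsed is None or parsed[0] < min_year:
            yield index, msg.get("Message-ID"), raw


# Usage (hypothetical path):
# for hit in find_suspicious_dates("/path/to/mbox"):
#     print(hit)
```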
Regarding your question: I thought there was already an open issue for that, or that I had fixed it somewhere (or maybe not :-)
Anyhow, you can try:
$ python pymlstats/analyzer.py /path/to/mbox
Sorry, I just got back to a computer where I could ssh into my server. I thought I had a saved copy of the bad file, but it looks like I overwrote it when I was in a hurry to kick off another run before I rushed out the door for a meetup :(
So far, it's still running, which is great!
I found a couple (they look like spam): http://download.gmane.org/gmane.linux.network/272/274
That said, I noticed that the limitation is in the MySQLdb package, because it uses strftime instead of isotime. The former only handles dates from 1900 onward; the latter has no such limitation.
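To illustrate the difference, here is a small Python 3 sketch (not MySQLdb's actual code) that builds a 'YYYY-MM-DD HH:MM:SS' string with isoformat() instead of strftime(); it assumes that is roughly what "isotime" refers to. On Python 2, datetime.strftime() raises ValueError for years before 1900, whereas isoformat() zero-pads any year to four digits. The function name mysql_datetime_literal is hypothetical.

```python
from datetime import datetime


def mysql_datetime_literal(dt):
    """Format dt as 'YYYY-MM-DD HH:MM:SS' without calling strftime().

    Microseconds are dropped, and any tzinfo is discarded without
    converting to UTC, so convert aware datetimes first if that matters.
    """
    return dt.replace(microsecond=0, tzinfo=None).isoformat(" ")


# mysql_datetime_literal(datetime(120, 3, 1, 10, 5, 9)) gives "0120-03-01 10:05:09"
```

Formatting is only half of the problem, though: MySQL documents the supported DATETIME range as starting at year 1000, so a year-0120 value is still outside what the server guarantees to handle.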
The issue is not triggered when using SQLite (I have not checked PostgreSQL).
I think we can check whether the date is "valid"... just because of MySQL. I doubt I will convince you (or anybody else) not to use MySQL ;-)
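A minimal sketch of such a check, assuming that a date failing it is simply stored as NULL (a hypothetical policy, not necessarily what mlstats does). The 1900 floor matches the strftime limitation described above; storable_date and MIN_STORABLE_YEAR are illustrative names.

```python
# Assumed floor: the strftime() limit that MySQLdb hits on Python 2.
MIN_STORABLE_YEAR = 1900


def storable_date(dt, min_year=MIN_STORABLE_YEAR):
    """Return dt unchanged if it is safe to hand to the MySQL backend,
    or None (to be stored as NULL) if it is missing or too early."""
    if dt is None or dt.year < min_year:
        return None
    return dt
```

Keeping the check at the storage layer means the SQLite backend, which does not have this problem, could skip it.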
Commit https://github.com/MetricsGrimoire/MailingListStats/commit/3375536629e576b26a1d5a44eef8612026708c3c should fix the issue with the date.
I noticed that the condition was not totally right.
That said, I do think that this is a bug in the MySQL module, because it does not allow storing datetime values before 1900, which makes no sense if someone wants to store historical data.
Is there any way we could trap the error for badly formatted dates, instead of dumping out of mlstats with a Python error? :)
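A sketch of trapping the error instead of letting it abort the run, assuming Python 3's email.utils.parsedate_to_datetime is used to parse the Date: header (mlstats itself may parse dates differently). Depending on the Python 3 version, an unparsable header raises TypeError or ValueError, so both are caught; parse_date_or_none and the logger name are hypothetical. A well-formed header with year 0120 still parses without error, so a year-range check is needed on top of this.

```python
import logging
from email.utils import parsedate_to_datetime

# Hypothetical logger name; mlstats' own logging setup may differ.
log = logging.getLogger("mlstats.dates")


def parse_date_or_none(raw):
    """Parse a Date: header value, logging and returning None on failure
    instead of raising and aborting the whole run."""
    try:
        return parsedate_to_datetime(raw)
    except (TypeError, ValueError) as exc:
        log.warning("Ignoring bad Date header %r: %s", raw, exc)
        return None
```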
Maybe check to see if it's a valid date
Example: