CicadaCinema / get-all-senders

GNU General Public License v3.0

Non-standard format? #2

Open blablabla1234678 opened 1 year ago

blablabla1234678 commented 1 year ago

While all this is fun, as far as I can tell the output format is not one that Thunderbird supports, so we need to convert it. At least if there is a standard for your output format, then write it in the readme file, or give any advice on how to convert it without the need of writing a custom parser.

CicadaCinema commented 1 year ago

Hey there. The format is: a string of authors, delimited by newline characters. For each message, I fetch the author, which is provided as a string by the Thunderbird API, like this: https://webextension-api.thunderbird.net/en/stable/messages.html#messages-messageheader (I fetch the author field). The format of this string is not up to me. I only add newline characters between authors: https://github.com/CicadaCinema/get-all-senders/blob/4dc7563f96adfdfcf4c022bf8ecdb89d85f3806a/mainPopup/popup.js#L22

At the time I was writing this I found this to be the best option. Let me know if you think there is a better way.
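A minimal sketch of the format described above, assuming each message header is an object with an `author` string as in the linked `MessageHeader` documentation. The helper name `joinAuthors` and the sample data are hypothetical, and this is not the add-on's actual code:

```javascript
// Hypothetical helper: joins the `author` field of each message header
// with newline characters, producing one author per line.
function joinAuthors(messageHeaders) {
  return messageHeaders.map((header) => header.author).join("\n");
}

// Made-up sample headers; only the `author` field is used here.
const sampleHeaders = [
  { author: "Alice Example <alice@example.com>" },
  { author: '"Example, Bob" <bob@example.org>' },
  { author: "carol@example.net" },
];

const output = joinAuthors(sampleHeaders);
console.log(output);
```

Each author string is passed through untouched, so any quotes, commas or angle brackets in it appear in the output exactly as they were received.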

rafo commented 1 year ago

> While all this is fun as far as I can tell the output format is not one that Thunderbird supports.

Be more specific.

> At least if there is a standard for your output format, then write it in the readme file

Look again. It is written in the readme.

> or give any advice how to convert it without the need of writing a custom parser.

That's not the job of @CicadaCinema, and it's an easy task for you. Ask ChatGPT if you don't know how.

@CicadaCinema Nice job! Exactly what I need. For me, it's not a problem to extract or reformat the output. Sometimes I need the bare email address, sometimes only the domain after the @, and sometimes even that without the top-level part (.com, .net, .org). But I think it's unnecessary to implement this on your side, since it would make the options/configuration side in Thunderbird much harder to maintain when a simple regex or :%s/ could do the job.

Thanx!
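The kind of extraction described above could be sketched as follows. The helper names are hypothetical, and the code assumes the address, when a display name is present, sits in a trailing `<...>` pair; author strings that do not follow that shape would need manual checking:

```javascript
// Hypothetical helpers; the patterns are deliberately simple.

// Returns the text inside a trailing <...> pair, or the trimmed input
// when no such pair is present (a bare address).
function extractAddress(author) {
  const match = author.match(/<([^<>]+)>\s*$/);
  return (match ? match[1] : author).trim();
}

// Returns everything after the last "@", lowercased.
// If there is no "@", the whole input is returned.
function extractDomain(address) {
  return address.slice(address.lastIndexOf("@") + 1).toLowerCase();
}

// Drops the final dot-separated label (.com, .net, .org, ...).
// Multi-part suffixes such as .co.uk are not handled specially.
function stripTopLevelLabel(domain) {
  return domain.replace(/\.[^.]+$/, "");
}

const address = extractAddress("Alice Example <alice@Example.com>");
const domain = extractDomain(address);
const shortDomain = stripTopLevelLabel(domain);
console.log(address, domain, shortDomain);
```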

rafo commented 1 year ago

Just one more thing: I don't know why Thunderbird sometimes puts quotes (") around names and sometimes not. But to get a clean, standard (maybe CSV) and consistent result, I will propose a pull request.

Done: #3
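One plausible explanation, stated here as an assumption rather than verified Thunderbird behaviour, is that the quotes come from the email header syntax itself, where a display name containing special characters such as commas is written as a quoted string. Under that assumption, a hypothetical normalisation helper might look like this (it is not the code from the pull request):

```javascript
// Hypothetical helper: removes one pair of surrounding double quotes from a
// display name and unescapes backslash-escaped characters inside it.
// Names without surrounding quotes are returned trimmed but otherwise unchanged.
function unquoteDisplayName(name) {
  const trimmed = name.trim();
  if (trimmed.length >= 2 && trimmed.startsWith('"') && trimmed.endsWith('"')) {
    return trimmed.slice(1, -1).replace(/\\(.)/g, "$1");
  }
  return trimmed;
}

console.log(unquoteDisplayName('"Example, Bob"'));
console.log(unquoteDisplayName("Alice Example"));
```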

blablabla1234678 commented 1 year ago

@CicadaCinema I meant that Thunderbird supports several import formats. With minimal effort the code could be rewritten to output CSV, for example. [Screenshot: Screenshot_20230922_152214]

Not sure if Thunderbird allows all client-side JavaScript features, but you might be able to make the output file downloadable: https://stackoverflow.com/questions/3665115/how-to-create-a-file-in-memory-for-user-to-download-but-not-through-server So you might be able to write a real export feature with a better user experience instead of a half-complete one.
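A rough sketch of the in-memory download technique from that link, using the standard `Blob` and `URL.createObjectURL` browser APIs. Whether this works inside a Thunderbird add-on popup is not confirmed here, and the function names and the default file name `senders.txt` are hypothetical:

```javascript
// Builds an in-memory text file from the newline-delimited sender list.
function buildSendersBlob(sendersText) {
  return new Blob([sendersText], { type: "text/plain;charset=utf-8" });
}

// Browser-only: offers the Blob as a download through a temporary link.
// Assumption: the page is allowed to trigger downloads this way.
function offerDownload(blob, fileName = "senders.txt") {
  const url = URL.createObjectURL(blob);
  const link = document.createElement("a");
  link.href = url;
  link.download = fileName;
  document.body.appendChild(link);
  link.click();
  link.remove();
  URL.revokeObjectURL(url);
}

const blob = buildSendersBlob("a@example.com\nb@example.org");
console.log(blob.size);
```

In a popup script, `offerDownload(blob)` would be called from a button's click handler; it is left uncalled here because it needs a DOM.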

blablabla1234678 commented 1 year ago

@rafo Who asked you? Rly...

CicadaCinema commented 1 year ago

> I meant, that there are several import formats supported by Thunderbird. With minimal effort the code can be rewritten to CSV for example.

I am happy to entertain the idea of adding some optional processing of the author strings (see my comment explaining why I don't want to change the default behaviour).

Right now I don't have a Thunderbird installation to refer to, so please link the "import documentation" in your screenshot if you are interested in this add-on producing CSV files which can be understood by this prompt.

Also note that any optional post-processing feature will have to take into consideration the characters which may appear in this author string (specifically, the author display name), such as angle brackets, commas, quotes and other characters. This is something that I haven't investigated myself, although I did make the assumption that newline characters cannot appear anywhere in the author string.

blablabla1234678 commented 1 year ago

In theory the CSV has these columns, and it is comma-separated: First Name,Last Name,Display Name,Nickname,Primary Email,Secondary Email,Screen Name,Work Phone,Home Phone,Fax Number,Pager Number,Mobile Number,Home Address,Home Address 2,Home City,Home State,Home ZipCode,Home Country,Work Address,Work Address 2,Work City,Work State,Work ZipCode,Work Country,Job Title,Department,Organization,Web Page 1,Web Page 2,Birth Year,Birth Month,Birth Day,Custom 1,Custom 2,Custom 3,Custom 4,Notes,

Maybe this gives more info about the hardships with the topic: https://superuser.com/questions/993984/properly-import-all-fields-from-windows-contacts-to-thunderbird#_=_

Afaik you need to include all columns, separated by commas, and leave empty whichever ones you want empty. Not sure how to escape a comma inside a string; probably just remove it or put quotes around the string.
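A sketch of building one such row, with the column names copied from the list above (without the trailing comma). The quoting rule used here is the common CSV convention of wrapping a field in double quotes and doubling any quotes inside it; whether Thunderbird's importer accepts exactly this is an assumption, and `escapeCsvField` and `toCsvRow` are hypothetical names:

```javascript
// Column names copied from the comment above (trailing comma omitted).
const COLUMNS = (
  "First Name,Last Name,Display Name,Nickname,Primary Email,Secondary Email," +
  "Screen Name,Work Phone,Home Phone,Fax Number,Pager Number,Mobile Number," +
  "Home Address,Home Address 2,Home City,Home State,Home ZipCode,Home Country," +
  "Work Address,Work Address 2,Work City,Work State,Work ZipCode,Work Country," +
  "Job Title,Department,Organization,Web Page 1,Web Page 2,Birth Year," +
  "Birth Month,Birth Day,Custom 1,Custom 2,Custom 3,Custom 4,Notes"
).split(",");

// Quotes a field only when it contains a comma, a double quote or a line break.
function escapeCsvField(value) {
  return /[",\r\n]/.test(value) ? `"${value.replace(/"/g, '""')}"` : value;
}

// Emits every column in order; columns missing from `contact` stay empty.
function toCsvRow(contact) {
  return COLUMNS.map((column) => escapeCsvField(contact[column] ?? "")).join(",");
}

const row = toCsvRow({
  "Display Name": "Example, Bob",
  "Primary Email": "bob@example.org",
});
console.log(COLUMNS.join(","));
console.log(row);
```

Quoting keeps the comma in the display name instead of removing it, at the cost of relying on the importer to understand quoted fields.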

CicadaCinema commented 1 year ago

Are you familiar with regular expressions? I don't think a custom parser is needed. I don't think it would be too difficult to write a regular expression for converting from Thunderbird's author string format to this one, leaving the majority of fields empty.

However, I think I would be reluctant to implement this kind of post-processing into the add-on itself for the reasons I outlined earlier. I also think it might be pretty difficult to handle edge cases (take a look at some sample output to see what I am talking about), such as special characters (angle brackets, quotes, commas, slashes) in the author string.

My advice would be to attempt to write a regular expression for your own set of senders, and check for any mistakes manually (or with lots of assertions in code). If you think you have a good-enough post-processing method that transforms the list of senders to a common format (such as this CSV format for importing an Address Book), then I am happy to feature it more prominently in this repo/add-on description, or perhaps add it as an experimental feature.

The last thing I would want is for the data coming from the add-on to be malformed due to some kind of post-processing, and I don't have a deep enough understanding of the author strings Thunderbird spits out (one could poke around in the source code to try and find out more) to implement a correct regular expression for splitting the email address from the display name in all cases.
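As a starting point for that kind of experiment, here is a hedged sketch of a splitter with assertions. It recognises only two shapes (a display name followed by `<address>`, and a bare address) and returns `null` for anything else, so unrecognised lines surface for manual checking instead of producing malformed data. The function names and sample strings are hypothetical, and the pattern is not claimed to cover every author string Thunderbird can produce:

```javascript
// Hypothetical splitter: returns { displayName, email } for the two shapes it
// recognises, and null for anything else.
function splitAuthor(author) {
  const text = author.trim();
  // Shape 1: optional display name followed by <address> at the end.
  const named = text.match(/^(.*?)\s*<([^<>\s]+@[^<>\s]+)>$/);
  if (named) {
    return { displayName: named[1], email: named[2] };
  }
  // Shape 2: bare address with no display name.
  if (/^[^<>\s]+@[^<>\s]+$/.test(text)) {
    return { displayName: "", email: text };
  }
  return null;
}

// Tiny assertion helper that throws on the first mismatch.
function expectEqual(actual, expected) {
  const a = JSON.stringify(actual);
  const e = JSON.stringify(expected);
  if (a !== e) throw new Error(`Expected ${e}, got ${a}`);
}

expectEqual(splitAuthor("Alice Example <alice@example.com>"),
  { displayName: "Alice Example", email: "alice@example.com" });
expectEqual(splitAuthor('"Example, Bob" <bob@example.org>'),
  { displayName: '"Example, Bob"', email: "bob@example.org" });
expectEqual(splitAuthor("carol@example.net"),
  { displayName: "", email: "carol@example.net" });
expectEqual(splitAuthor("no address here"), null);
```

Adding one assertion per unusual sender found in a real export is a cheap way to see where the pattern breaks.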