ArchitecturalKnowledgeAnalysis / EmailDatasetBrowser

Application for interacting with datasets produced by the EmailIndexer.
MIT License

Thinking ahead about version 3.0 #25

Open wmeijer221 opened 2 years ago

wmeijer221 commented 2 years ago

While working with the email dataset browser, I've started thinking about features that would be great to add in a future version of the software. However, such changes will definitely affect the underlying data structure, so they'll most likely require a major version bump for both the EmailDatasetBrowser and the EmailIndexer. I don't know if this project will ever grow old enough to see version 3, but I figured I'd write my ideas down anyway, so here goes:

andrewlalis commented 2 years ago

Yeah, I've specifically wanted to add notes to emails and to add tag groups. I also think it would make sense to implement most of these changes in the EmailIndexer first. Additionally, I'd like to see whether I can switch from H2 to SQLite, since pretty much every other language can access SQLite databases natively, without needing some Java tool.
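To illustrate the point about language-independent access, here is a minimal sketch that reads an SQLite dataset from Python using only the standard `sqlite3` module, with no Java involved. The `email` table and its columns are hypothetical; the thread does not define what a v3 schema would look like.

```python
from __future__ import annotations

import sqlite3

# Hypothetical v3 table: the real EmailIndexer schema is not defined in this thread.
SCHEMA = """
CREATE TABLE email (
    id         INTEGER PRIMARY KEY,
    message_id TEXT NOT NULL UNIQUE,
    subject    TEXT,
    date       TEXT
);
"""


def load_subjects(conn: sqlite3.Connection) -> list[tuple[int, str]]:
    """Return (id, subject) pairs in id order, using only Python's standard library."""
    return conn.execute("SELECT id, subject FROM email ORDER BY id").fetchall()


if __name__ == "__main__":
    # An in-memory database keeps the demo self-contained; a real dataset would
    # be opened by file path instead (e.g. a hypothetical "dataset.db").
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany(
        "INSERT INTO email (message_id, subject, date) VALUES (?, ?, ?)",
        [
            ("<a@example.org>", "Design proposal", "2021-03-01T10:00:00"),
            ("<b@example.org>", "Re: Design proposal", "2021-03-02T09:30:00"),
        ],
    )
    print(load_subjects(conn))  # [(1, 'Design proposal'), (2, 'Re: Design proposal')]
    conn.close()
```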

I would also like to separate the various exporters from the EmailIndexer project, since they add quite some dependency bloat for a feature that is only needed in certain use cases. (My point here is that I also use the EmailIndexer for other small tools, and they all end up with iText, commons-csv, and all that junk in them.)

I think I'll start by making a v3 branch in EmailIndexer and see how feasible it is to migrate to SQLite, and go from there with the additional upgrades.

andrewlalis commented 2 years ago

Also slightly related: I'm thinking about making the database implementation of EmailIndexer pluggable, so that projects using it only need to configure their own database dependency. This might be overly complicated for no real benefit, though.
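One way the pluggable idea could look, sketched in Python for illustration (EmailIndexer itself is Java, where the equivalent would be an interface): the indexing code depends only on a small storage interface, and each project supplies its own implementation. `EmailStore`, `InMemoryStore`, `SQLiteStore`, and `index_emails` are all hypothetical names, not part of EmailIndexer.

```python
from __future__ import annotations

import sqlite3
from typing import Iterable, Protocol


class EmailStore(Protocol):
    """Hypothetical storage interface; the indexer would depend only on this."""

    def add_email(self, message_id: str, subject: str) -> None: ...

    def count(self) -> int: ...


class InMemoryStore:
    """Dependency-free implementation, e.g. for small tools or tests."""

    def __init__(self) -> None:
        self._emails: dict[str, str] = {}

    def add_email(self, message_id: str, subject: str) -> None:
        self._emails[message_id] = subject  # a repeated message_id overwrites

    def count(self) -> int:
        return len(self._emails)


class SQLiteStore:
    """SQLite-backed implementation; only projects that use it need sqlite."""

    def __init__(self, path: str = ":memory:") -> None:
        self._conn = sqlite3.connect(path)
        self._conn.execute(
            "CREATE TABLE IF NOT EXISTS email (message_id TEXT PRIMARY KEY, subject TEXT)"
        )

    def add_email(self, message_id: str, subject: str) -> None:
        # INSERT OR REPLACE mirrors the overwrite semantics of InMemoryStore;
        # using the connection as a context manager commits the transaction.
        with self._conn:
            self._conn.execute(
                "INSERT OR REPLACE INTO email VALUES (?, ?)", (message_id, subject)
            )

    def count(self) -> int:
        return self._conn.execute("SELECT COUNT(*) FROM email").fetchone()[0]


def index_emails(store: EmailStore, emails: Iterable[tuple[str, str]]) -> int:
    """Store (message_id, subject) pairs and return the number of distinct emails."""
    for message_id, subject in emails:
        store.add_email(message_id, subject)
    return store.count()
```

Whether this is worth it mostly depends on how many backends would realistically exist; with only one, the interface is mostly overhead.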

wmeijer221 commented 2 years ago

When it comes to supporting more types of mailing lists (currently, only the Apache ones are supported), we could look into existing tools that download them efficiently. We could use something like GrimoireLab for this, though it is a Python project, so we'd have to use Py4J or something similar to make it work; I don't know how desirable that is. Alternatively, we could look at the archive systems they support, such as HyperKitty and Pipermail. I haven't looked into these tools in detail, so I can't vouch for their applicability here, but they could serve as a source of inspiration.
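As a rough idea of what reading such an archive could involve: Pipermail offers monthly archives as (optionally gzipped) mbox-style text files, which Python's standard `mailbox` module can parse. This minimal sketch assumes the archive has already been downloaded; downloading, address de-obfuscation, and HyperKitty support are out of scope, and `read_pipermail_archive` is a hypothetical helper.

```python
from __future__ import annotations

import gzip
import mailbox
import os
import shutil
import tempfile


def read_pipermail_archive(path: str) -> list[tuple[str | None, str | None]]:
    """Return (Message-ID, Subject) pairs from a downloaded monthly mbox archive.

    mailbox.mbox needs a file path, so a .gz archive is first decompressed
    to a temporary file, which is removed afterwards.
    """
    if path.endswith(".gz"):
        with gzip.open(path, "rb") as src, tempfile.NamedTemporaryFile(
            delete=False, suffix=".mbox"
        ) as dst:
            shutil.copyfileobj(src, dst)
            mbox_path = dst.name
    else:
        mbox_path = path

    try:
        box = mailbox.mbox(mbox_path, create=False)
        try:
            return [(msg["Message-ID"], msg["Subject"]) for msg in box]
        finally:
            box.close()
    finally:
        if mbox_path != path:
            os.remove(mbox_path)
```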

wmeijer221 commented 2 years ago

Alternative ways to sort emails in the browse panel and ID panel would probably be interesting to explore as well. In the browse panel, emails are currently sorted in descending order by ID/timestamp, which is kind of weird, as you end up reading threads backwards.
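A minimal sketch of one possible ordering: threads stay newest-first, as in the current browse panel, but the emails inside each thread are listed top-down in chronological order, with a depth for indentation. It assumes each email records the ID of the email it replies to; the `Email` type and its `parent_id` field are hypothetical, not taken from the EmailIndexer data model.

```python
from __future__ import annotations

from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass(frozen=True)
class Email:
    id: int
    parent_id: Optional[int]  # hypothetical: id of the email this one replies to
    date: datetime


def thread_order(emails: list[Email]) -> list[tuple[int, int]]:
    """Return (email id, depth) pairs: newest threads first, each thread read chronologically."""
    by_id = {e.id: e for e in emails}
    children: dict[int, list[Email]] = defaultdict(list)
    roots: list[Email] = []
    for e in emails:
        # Replies whose parent is missing from the dataset are treated as thread roots.
        if e.parent_id is not None and e.parent_id in by_id:
            children[e.parent_id].append(e)
        else:
            roots.append(e)

    result: list[tuple[int, int]] = []
    for root in sorted(roots, key=lambda r: r.date, reverse=True):
        # Iterative depth-first traversal avoids recursion limits on very deep threads.
        stack = [(root, 0)]
        while stack:
            email, depth = stack.pop()
            result.append((email.id, depth))
            # Push replies latest-first so the earliest reply is popped (shown) next.
            for child in sorted(children[email.id], key=lambda c: c.date, reverse=True):
                stack.append((child, depth + 1))
    return result
```

Sorting threads by their most recent reply rather than by their first email would be an equally reasonable choice; it only changes the key used for `roots`.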

wmeijer221 commented 2 years ago

We could look at classifying on different levels of granularity; e.g. entire threads vs. individual emails vs. sentences.

The tool already works really well for classifying individual emails. I'm currently doing sentence-level classification in ATLAS.ti, which is a little suboptimal because importing emails is awkward: all formatting is lost, so I still rely on the email browser to make sense of what I'm reading. (I don't know how useful classifying entire threads is, but it naturally arises when looking at the other two levels of granularity.)

Sentence-level classification would probably require quite an overhaul of the email viewer and the general UI, though. (Who knows, my ATLAS.ti issues might also be resolved by a better PDF exporter.)
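For the data-structure side of multi-granularity classification, one possible shape is a single tag-assignment record that names its level, with a character span that is only used at sentence level. This is a hypothetical sketch (`Granularity`, `TagAssignment`, and `excerpt` are illustrative names), not the current tagging model.

```python
from __future__ import annotations

from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Granularity(Enum):
    THREAD = "thread"
    EMAIL = "email"
    SENTENCE = "sentence"


@dataclass(frozen=True)
class TagAssignment:
    tag: str
    granularity: Granularity
    target_id: int  # hypothetical: thread root id for THREAD, email id otherwise
    span: Optional[tuple[int, int]] = None  # [start, end) offsets into the email body

    def __post_init__(self) -> None:
        # A span is required for sentence-level tags and forbidden otherwise.
        if (self.granularity is Granularity.SENTENCE) != (self.span is not None):
            raise ValueError("a span is required for, and only for, sentence-level tags")
        if self.span is not None and not 0 <= self.span[0] <= self.span[1]:
            raise ValueError("span must satisfy 0 <= start <= end")

    def excerpt(self, body: str) -> str:
        """Return the tagged text: the whole body, or just the span for sentence tags."""
        if self.span is None:
            return body
        start, end = self.span
        return body[start:end]
```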