ArchitecturalKnowledgeAnalysis / EmailIndexer

Utility for generating Lucene indexes for collections of emails.
MIT License

Added Plain-text and PDF query exporter. #4

Closed wmeijer221 closed 2 years ago

wmeijer221 commented 2 years ago

I went ahead and built export functionality for plain-text and PDF. The front-end contained some code for this, which will be obsolete once this PR is merged. As soon as that's the case, I'll go ahead and make a PR for front-end changes.

The new implementation lets you export either to a single file containing all emails, or to multiple files that each contain one complete mail thread.
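To make the two export modes concrete, here is a minimal plain-text sketch. It is not the code from this PR: `Mail`, `exportSingleFile`, and `exportPerThread` are hypothetical names, the output is kept in memory as strings instead of being written to disk, and the per-thread mode only groups the emails that matched the query (loading the rest of each thread is left out).

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class ExportSketch {
    /** Hypothetical stand-in for an indexed email; not the project's actual entity. */
    static final class Mail {
        final String threadId;
        final String subject;
        final String body;

        Mail(String threadId, String subject, String body) {
            this.threadId = threadId;
            this.subject = subject;
            this.body = body;
        }
    }

    /** Renders one email as plain text, followed by a separator line. */
    static String render(Mail mail) {
        return "Subject: " + mail.subject + "\n\n" + mail.body + "\n-----\n";
    }

    /** Mode 1: a single document containing every email in the result set. */
    static String exportSingleFile(List<Mail> results) {
        StringBuilder sb = new StringBuilder();
        for (Mail mail : results) {
            sb.append(render(mail));
        }
        return sb.toString();
    }

    /** Mode 2: one document per thread, keyed by thread id, in first-seen order. */
    static Map<String, String> exportPerThread(List<Mail> results) {
        Map<String, String> byThread = new LinkedHashMap<>();
        for (Mail mail : results) {
            // merge() appends to the existing document for this thread, if any.
            byThread.merge(mail.threadId, render(mail), String::concat);
        }
        return byThread;
    }

    public static void main(String[] args) {
        List<Mail> results = List.of(
                new Mail("t1", "Design question", "First message"),
                new Mail("t2", "Release plan", "Other thread"),
                new Mail("t1", "Re: Design question", "Reply"));
        System.out.println(exportSingleFile(results));
        System.out.println(exportPerThread(results).keySet());
    }
}
```

A PDF exporter could reuse the same grouping and swap only the rendering step.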

The current indexer already has an export function, but it doesn't suffice because it does not account for specific queries. The new implementation does (I did try to follow the same structure, though).

wmeijer221 commented 2 years ago

@andrewlalis, I resolved everything you remarked on, except for the eager email fetching. As I said in the comment above, I wonder how much of an issue memory actually is: the number of exported emails is always limited by maxResultCount, which is indirectly capped between 1 and 10000 entries (that's the cap we set in the GUI).

If you think it's still necessary to do this, sure! In that case I might need a little assistance though, as I don't really know how the whole paging story works.

JSYK, the current restructure means the update is no longer backwards compatible.

andrewlalis commented 2 years ago

I'm just going to go ahead and merge it as-is, and then I'll see about updating it to support pagination if / when that is needed.
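For reference, a rough sketch of what paginated export could look like, as opposed to fetching every email eagerly. Everything here is an assumption for illustration: `exportInPages`, the `fetchPage(offset, limit)` callback, and the page size are hypothetical and not part of the EmailIndexer API. With Lucene, the fetch step could plausibly be built on `IndexSearcher.searchAfter`, but that wiring is not shown.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BiFunction;
import java.util.function.Consumer;

class PagedExportSketch {
    /**
     * Fetches results page by page and hands each page to the writer, so at most
     * pageSize items are held in memory at once.
     *
     * fetchPage is a hypothetical callback: given (offset, limit) it returns up to
     * limit items starting at offset, or an empty list when nothing is left.
     *
     * @return the number of items exported, never more than maxResultCount
     */
    static <T> int exportInPages(BiFunction<Integer, Integer, List<T>> fetchPage,
                                 Consumer<List<T>> writePage,
                                 int maxResultCount,
                                 int pageSize) {
        if (pageSize <= 0) {
            throw new IllegalArgumentException("pageSize must be positive");
        }
        int exported = 0;
        while (exported < maxResultCount) {
            // Never ask for more than the remaining budget.
            int limit = Math.min(pageSize, maxResultCount - exported);
            List<T> page = fetchPage.apply(exported, limit);
            if (page.isEmpty()) {
                break; // nothing left to export
            }
            writePage.accept(page);
            exported += page.size();
            if (page.size() < limit) {
                break; // short page: the source is exhausted
            }
        }
        return exported;
    }

    public static void main(String[] args) {
        // Simulated result set of 25 hits, standing in for a query against the index.
        List<String> hits = new ArrayList<>();
        for (int i = 0; i < 25; i++) {
            hits.add("mail-" + i);
        }
        int exported = PagedExportSketch.<String>exportInPages(
                (offset, limit) -> hits.subList(
                        Math.min(offset, hits.size()),
                        Math.min(offset + limit, hits.size())),
                page -> System.out.println("writing " + page.size() + " emails"),
                23, 10);
        System.out.println("exported " + exported);
    }
}
```

With maxResultCount capped at 10000, the eager approach is likely fine in practice; the paged variant mainly matters if that cap is ever raised or emails are large.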

andrewlalis commented 2 years ago

Also note, @wmeijer221, that I am still working on integrating the schema changes into the EmailDatasetBrowser. At the moment, the current version of the EmailIndexer (with its improved schema with separated Tag entities) is incompatible with the old schema, so I need to devise a way to upgrade an older dataset before I make a release of the browser app.