ePADD / epadd

ePADD is a software package developed by Stanford University's Special Collections & University Archives that supports archival processes around the appraisal, ingest, processing, discovery, and delivery of email archives.
https://www.epaddproject.org

Exception fetching/indexing emails on importing #461

Open sshipley64 opened 5 months ago

sshipley64 commented 5 months ago


After about four imports I get the first error below. Trying to import again produces the second set of messages. I can provide the mbox files if needed.


18 Mar 08:11:10 Util ERROR - Exception fetching/indexing emails
java.lang.NullPointerException: null
    at org.apache.jsp.ajax.async.doFetchAndIndex_jsp.doFetchAndIndex(doFetchAndIndex_jsp.java:80) ~[?:?]
    at org.apache.jsp.ajax.async.doFetchAndIndex_jsp$1.onStart(doFetchAndIndex_jsp.java:341) ~[?:?]
    at edu.stanford.epadd.util.OperationInfo.lambda$run$0(OperationInfo.java:61) ~[classes/:?]
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515) ~[?:?]
    at java.util.concurrent.FutureTask.run(FutureTask.java:264) ~[?:?]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) ~[?:?]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) ~[?:?]
    at java.lang.Thread.run(Thread.java:829) ~[?:?]

If I try to import the same mbox again, I get:

18 Mar 09:02:01 Indexer WARN - !!!!!!! Index reader has 61423 doc(s) of which 909 are deleted) !!!!!!!!!!
18 Mar 09:02:02 Indexer WARN - Number of content docs: 61423, number deleted: 909
18 Mar 09:02:02 Indexer WARN - Number of attachment docs: 24080, number deleted: 0
18 Mar 09:02:03 EmailDocument WARN - Alert!: froms.length > 1: 2
18 Mar 09:02:03 EmailDocument WARN - Sawant
18 Mar 09:02:03 EmailDocument WARN - Kshama
18 Mar 09:02:03 EmailDocument WARN - WARNING!: Multiple from addresses in message (2): Sawant, Kshama Message: /mnt/epadd/sawantPSTS/BITRECOVER_12-03-2024 10-55/Kshama.Sawant@seattle.gov.pst/Top of Personal Folders/Kshama.Sawant@seattle.gov (MainArchive)/Recoverable Items/DiscoveryHolds/DiscoveryHolds.mbox Msg#47c19d9d0054772d4c64ceecc9a93d9dcfca01552d958c67ba080ec6dd3b790d (Subject:UDP discussions) guessed date Jan 1, 1960
18 Mar 09:02:03 EmailDocument WARN - Sawant
18 Mar 09:02:03 EmailDocument WARN - Kshama
18 Mar 09:02:04 EmailDocument WARN - WARNING!: Multiple from addresses in message (2): Sawant, Kshama Message: /mnt/epadd/sawantPSTS/BITRECOVER_12-03-2024 10-55/Kshama.Sawant@seattle.gov.pst/Top of Personal Folders/Kshama.Sawant@seattle.gov (MainArchive)/Recoverable Items/DiscoveryHolds/DiscoveryHolds.mbox Msg#47c19d9d0054772d4c64ceecc9a93d9dcfca01552d958c67ba080ec6dd3b790d (Subject:UDP discussions) guessed date Jan 1, 1960
18 Mar 09:02:04 EmailDocument WARN - Sawant
18 Mar 09:02:04 EmailDocument WARN - Kshama

jfarwer commented 5 months ago

Yes please, the mbox files would be helpful.

sshipley64 commented 5 months ago

https://drive.google.com/file/d/1GIaVXf1fIZlmHKayv2aJD5_OEyjbQVX-/view?usp=drive_link and I did give you permissions this time.

jfarwer commented 5 months ago

I was able to import those files. Could you please send the files epadd.log and epadd.warnings.log from the epadd-settings folder?

sshipley64 commented 5 months ago

https://drive.google.com/file/d/1jmYbpGy3fYz2NzlIG6wWVpn4heWzXrKe/view?usp=sharing I think this is the right one. I've upped my RAM and am trying again with a different large set of mboxes. How much RAM does your computer have, and how much of it are you giving to Java, if you don't mind my asking?

jfarwer commented 5 months ago

The computer has 16GB and I used 12GB but I don't know how much is needed. I don't think it was a memory issue, as that would result in an error message 'Out of memory' both popping up on the screen and also in the log files.

I imported all mbox files in one go (pointing to the directory containing all the files on the import screen and then selecting all the folders on the next screen) and that worked. Did you try importing everything in one go or more step by step?

I can't find any relevant error messages in the log files. If it crashes again could you send me again the corresponding log files?
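For anyone checking their own logs before sending them, here is a minimal Python sketch that pulls out ERROR entries and any Java out-of-memory line. It assumes the line layout seen in the excerpts in this thread ("18 Mar 08:11:10 Util ERROR - message"). The function name `find_problems` is made up for this example and is not part of ePADD.

```python
import re

# Assumed log layout, inferred from the excerpts quoted in this thread:
#   "18 Mar 08:11:10 Util ERROR - Exception fetching/indexing emails"
LOG_LINE = re.compile(
    r"^(\d{1,2} \w{3} \d{2}:\d{2}:\d{2}) (\S+) (ERROR|WARN) - (.*)$"
)


def find_problems(lines, levels=("ERROR",)):
    """Return log lines at the given levels, plus any out-of-memory line.

    `lines` is any iterable of strings, for example an open log file.
    """
    hits = []
    for line in lines:
        line = line.rstrip("\n")
        match = LOG_LINE.match(line)
        if match and match.group(3) in levels:
            hits.append(line)
        elif "OutOfMemoryError" in line:
            # The JVM reports heap exhaustion as java.lang.OutOfMemoryError,
            # usually on a stack-trace line without a timestamp prefix.
            hits.append(line)
    return hits
```

To try it, pass an open file, for instance `find_problems(open("epadd.log", errors="replace"))` run from the epadd-settings folder. An empty result would be consistent with memory not being the cause, though it does not prove it.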

sshipley64 commented 5 months ago

I did it step by step, loading each batch as I finished converting it. I’ll try all in one go and see what it does.



sshipley64 commented 5 months ago

Did it show BITRECOVER_12-03-2024 10-55 as an empty folder? It's showing it empty when I try to import all at once, but it isn't.

sshipley64 commented 5 months ago

I've loaded it by itself and then it shows folders. I'm trying that one by itself to see if that batch is the issue.

[Screenshots: 10-55 folder by itself; 10-55 folder in group]

jfarwer commented 5 months ago

Yes, it shows several folders for BITRECOVER_12-03-2024 10-55. On the import screen did you use the box 'Mbox files'? Sometimes I accidentally use the box underneath 'Non-Mbox email files' and then it shows 0 messages.
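To confirm independently of ePADD that a folder really does contain messages, a rough check can be done with Python's standard `mailbox` module. This is only a sketch: it assumes the files use a `.mbox` extension, as in the path shown in the log above, and the function name `count_mbox_messages` is invented for the example.

```python
import mailbox
import os


def count_mbox_messages(root, extension=".mbox"):
    """Map each mbox file found under `root` to its message count.

    Assumes files are identified by extension, which may not hold for
    every conversion tool.
    """
    counts = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if not name.lower().endswith(extension):
                continue
            path = os.path.join(dirpath, name)
            # create=False avoids creating a file if the path is wrong
            box = mailbox.mbox(path, create=False)
            try:
                counts[path] = len(box)
            finally:
                box.close()
    return counts
```

If this reports non-zero counts for a folder that the import screen shows as empty, the files themselves are probably fine and the problem lies in how the folder was selected.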

sshipley64 commented 5 months ago

No. I tried again to make sure, and all of the folders are under the 'Mbox files' box.

[Attachments: mboximports, ChooseFolders, epaddubuntu.zip]

jfarwer commented 5 months ago

Could you please try putting the four BITRECOVER folders into one folder and then pointing to that folder on the import screen rather than using the 'Add folder' button?

sshipley64 commented 4 months ago

Okay, it did do it this time. Should I always wait and have all the mboxes ready to go and do them all at once? I have some collections where I've had to download 40+ psts out of MS Purview and so I have a lot of folders of mboxes.

jfarwer commented 4 months ago

No, there is no need to handle everything at once. When I realised you used the 'Add Folder' button on the import screen, I tested it myself and discovered a bug that occasionally prevents folders from being read correctly (as you experienced when it failed to display any messages for BITRECOVER_12-03-2024 10-55). To work around this, please put all the folders you intend to import into a single directory. Then, on the import screen, point to this directory. The result should be the same as using the 'Add Folder' button, apart from the initial step of putting your folders into one location. You can return to the import screen at any time after the initial import to add folders. I hope this makes sense. Please let me know if you have any issues.
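For collections with many converted folders, the "put everything into a single directory" step can be scripted. The Python sketch below is one possible way to do it, not an ePADD feature. The function name `gather_folders` and the paths in the usage note are placeholders. It copies by default so the originals stay untouched, and it refuses to overwrite a folder that already exists in the target.

```python
import shutil
from pathlib import Path


def gather_folders(sources, target, move=False):
    """Place each source folder inside `target` and return the new paths.

    Copies by default; pass move=True to move instead (saves disk space
    but changes the original layout).
    """
    target = Path(target)
    target.mkdir(parents=True, exist_ok=True)
    placed = []
    for src in map(Path, sources):
        if not src.is_dir():
            raise NotADirectoryError(str(src))
        dest = target / src.name
        if dest.exists():
            # Avoid silently merging two folders that share a name
            raise FileExistsError(str(dest))
        if move:
            shutil.move(str(src), str(dest))
        else:
            shutil.copytree(src, dest)
        placed.append(dest)
    return placed
```

A call might look like `gather_folders(["/path/to/BITRECOVER_12-03-2024 10-55", "/path/to/another_batch"], "/path/to/import_root")`, after which `/path/to/import_root` is the directory to point to on the import screen.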