ePADD / epadd

ePADD is a software package developed by Stanford University's Special Collections & University Archives that supports archival processes around the appraisal, ingest, processing, discovery, and delivery of email archives.
https://www.epaddproject.org

Processing to delivery get illegal char on attachments that prevents delivery from opening #454

Open sshipley64 opened 6 months ago

sshipley64 commented 6 months ago

Describe the bug

When exporting from processing to delivery, in almost every collection I get attachments reported with an illegal char error. If I go back, mark the corresponding emails as "do not transfer", and export again, it runs.

05 Jan 14:45:31 Util ERROR - java.nio.file.InvalidPathException: Illegal char < > at index 82: data/blobs/4980.President Obama Announces National Fuel Efficiency PolicyU.S. EPA 05_19_2009 03_34 PM at sun.nio.fs.WindowsPathParser.normalize(WindowsPathParser.java:182) ~[?:?] at sun.nio.fs.WindowsPathParser.parse(WindowsPathParser.java:153) ~[?:?] at sun.nio.fs.WindowsPathParser.parse(WindowsPathParser.java:77) ~[?:?] at sun.nio.fs.WindowsPath.parse(WindowsPath.java:92) ~[?:?] at sun.nio.fs.WindowsFileSystem.getPath(WindowsFileSystem.java:232) ~[?:?] at java.nio.file.Path.resolve(Path.java:516) ~[?:?] at gov.loc.repository.bagit.reader.TagFileReader.createFileFromManifest(TagFileReader.java:56) ~[bagit-5.2.0.jar:?] at gov.loc.repository.bagit.reader.ManifestReader.readChecksumFileMap(ManifestReader.java:123) ~[bagit-5.2.0.jar:?] at gov.loc.repository.bagit.reader.ManifestReader.readManifest(ManifestReader.java:108) ~[bagit-5.2.0.jar:?] at gov.loc.repository.bagit.reader.ManifestReader.readAllManifests(ManifestReader.java:63) ~[bagit-5.2.0.jar:?] at gov.loc.repository.bagit.reader.BagReader.read(BagReader.java:61) ~[bagit-5.2.0.jar:?] at edu.stanford.muse.index.Archive.readArchiveBag(Archive.java:2558) ~[classes/:?] at edu.stanford.muse.index.ArchiveReaderWriter.readArchiveIfPresent(ArchiveReaderWriter.java:842) ~[classes/:?] at edu.stanford.muse.index.ArchiveReaderWriter.readArchiveIfPresent(ArchiveReaderWriter.java:828) ~[classes/:?] at org.apache.jsp.ajax.async.setExportableAssets_jsp._jspService(setExportableAssets_jsp.java:260) ~[?:?] at org.apache.jasper.runtime.HttpJspBase.service(HttpJspBase.java:70) ~[epadd-standalone.jar:?] at javax.servlet.http.HttpServlet.service(HttpServlet.java:764) ~[epadd-standalone.jar:?] at org.apache.jasper.servlet.JspServletWrapper.service(JspServletWrapper.java:466) ~[epadd-standalone.jar:?] at org.apache.jasper.servlet.JspServlet.serviceJspFile(JspServlet.java:379) ~[epadd-standalone.jar:?] 
at org.apache.jasper.servlet.JspServlet.service(JspServlet.java:327) ~[epadd-standalone.jar:?] at javax.servlet.http.HttpServlet.service(HttpServlet.java:764) ~[epadd-standalone.jar:?] at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:227) ~[epadd-standalone.jar:?] at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:162) ~[epadd-standalone.jar:?] at edu.stanford.muse.webapp.LoggingFilter.doFilter(LoggingFilter.java:26) ~[classes/:?] at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:189) ~[epadd-standalone.jar:?] at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:162) ~[epadd-standalone.jar:?] at org.apache.logging.log4j.web.Log4jServletFilter.doFilter(Log4jServletFilter.java:71) ~[log4j-web-2.18.0.jar:2.18.0] at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:189) ~[epadd-standalone.jar:?] at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:162) ~[epadd-standalone.jar:?] at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:197) ~[epadd-standalone.jar:?] at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:97) ~[epadd-standalone.jar:?] at org.apache.catalina.authenticator.AuthenticatorBase.invoke(AuthenticatorBase.java:541) ~[epadd-standalone.jar:?] at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:135) ~[epadd-standalone.jar:?] at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:92) ~[epadd-standalone.jar:?] at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:78) ~[epadd-standalone.jar:?] at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:360) ~[epadd-standalone.jar:?] 
at org.apache.coyote.http11.Http11Processor.service(Http11Processor.java:399) ~[epadd-standalone.jar:?] at org.apache.coyote.AbstractProcessorLight.process(AbstractProcessorLight.java:65) ~[epadd-standalone.jar:?] at org.apache.coyote.AbstractProtocol$ConnectionHandler.process(AbstractProtocol.java:890) ~[epadd-standalone.jar:?] at org.apache.tomcat.util.net.NioEndpoint$SocketProcessor.doRun(NioEndpoint.java:1789) ~[epadd-standalone.jar:?] at org.apache.tomcat.util.net.SocketProcessorBase.run(SocketProcessorBase.java:49) ~[epadd-standalone.jar:?] at org.apache.tomcat.util.threads.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1191) ~[epadd-standalone.jar:?] at org.apache.tomcat.util.threads.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:659) ~[epadd-standalone.jar:?] at org.apache.tomcat.util.threads.TaskThread$WrappingRunnable.run(TaskThread.java:61) ~[epadd-standalone.jar:?] at java.lang.Thread.run(Thread.java:1623) ~[?:?]

java.nio.file.InvalidPathException: Illegal char < > at index 82: data/blobs/4980.President Obama Announces National Fuel Efficiency PolicyU.S. EPA 05_19_2009 03_34 PM java.nio.file.InvalidPathException: Illegal char < > at index 82: data/blobs/4980.President Obama Announces National Fuel Efficiency PolicyU.S. EPA 05_19_2009 03_34 PM at java.base/sun.nio.fs.WindowsPathParser.normalize(WindowsPathParser.java:182) at java.base/sun.nio.fs.WindowsPathParser.parse(WindowsPathParser.java:153) at java.base/sun.nio.fs.WindowsPathParser.parse(WindowsPathParser.java:77) at java.base/sun.nio.fs.WindowsPath.parse(WindowsPath.java:92) at java.base/sun.nio.fs.WindowsFileSystem.getPath(WindowsFileSystem.java:232) at java.base/java.nio.file.Path.resolve(Path.java:516) at gov.loc.repository.bagit.reader.TagFileReader.createFileFromManifest(TagFileReader.java:56) at gov.loc.repository.bagit.reader.ManifestReader.readChecksumFileMap(ManifestReader.java:123) at gov.loc.repository.bagit.reader.ManifestReader.readManifest(ManifestReader.java:108) at gov.loc.repository.bagit.reader.ManifestReader.readAllManifests(ManifestReader.java:63) at gov.loc.repository.bagit.reader.BagReader.read(BagReader.java:61) at edu.stanford.muse.index.Archive.readArchiveBag(Archive.java:2558) at edu.stanford.muse.index.ArchiveReaderWriter.readArchiveIfPresent(ArchiveReaderWriter.java:842) at edu.stanford.muse.index.ArchiveReaderWriter.readArchiveIfPresent(ArchiveReaderWriter.java:828) at org.apache.jsp.ajax.async.setExportableAssets_jsp._jspService(setExportableAssets_jsp.java:260) at org.apache.jasper.runtime.HttpJspBase.service(HttpJspBase.java:70) at javax.servlet.http.HttpServlet.service(HttpServlet.java:764) at org.apache.jasper.servlet.JspServletWrapper.service(JspServletWrapper.java:466) at org.apache.jasper.servlet.JspServlet.serviceJspFile(JspServlet.java:379) at org.apache.jasper.servlet.JspServlet.service(JspServlet.java:327) at javax.servlet.http.HttpServlet.service(HttpServlet.java:764) at 
org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:227) at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:162) at edu.stanford.muse.webapp.LoggingFilter.doFilter(LoggingFilter.java:26) at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:189) at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:162) at org.apache.logging.log4j.web.Log4jServletFilter.doFilter(Log4jServletFilter.java:71) at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:189) at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:162) at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:197) at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:97) at org.apache.catalina.authenticator.AuthenticatorBase.invoke(AuthenticatorBase.java:541) at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:135) at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:92) at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:78) at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:360) at org.apache.coyote.http11.Http11Processor.service(Http11Processor.java:399) at org.apache.coyote.AbstractProcessorLight.process(AbstractProcessorLight.java:65) at org.apache.coyote.AbstractProtocol$ConnectionHandler.process(AbstractProtocol.java:890) at org.apache.tomcat.util.net.NioEndpoint$SocketProcessor.doRun(NioEndpoint.java:1789) at org.apache.tomcat.util.net.SocketProcessorBase.run(SocketProcessorBase.java:49) at org.apache.tomcat.util.threads.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1191) at org.apache.tomcat.util.threads.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:659) at 
org.apache.tomcat.util.threads.TaskThread$WrappingRunnable.run(TaskThread.java:61) at java.base/java.lang.Thread.run(Thread.java:1623)

To Reproduce

Steps to reproduce the behavior:

1. Export processing module
2. Go to collection
3. Click enter
4. Get the error below.

sshipley64 commented 6 months ago

Okay, so this is an issue when a collection is processed and exported in ePADD on Ubuntu and then loaded on a Windows machine. The same error doesn't occur when loading the delivery module on the same Ubuntu machine the collection was processed on.

jfarwer commented 6 months ago

Interesting, thanks for reporting that issue. Let me have a look into this.

tomhigginsuom commented 6 months ago

Here's the point where the exception is raised in the JDK - it's in the file path handling code for Windows, so I would guess the problem is that a file name which is allowed on Linux is disallowed on Windows: https://github.com/openjdk/jdk20/blob/master/src/java.base/windows/classes/sun/nio/fs/WindowsPathParser.java#L182

    // Reserved characters for window path name
    private static final String reservedChars = "<>:\"|?*";
    private static final boolean isInvalidPathChar(char ch) {
        return ch < '\u0020' || reservedChars.indexOf(ch) != -1;
    }

Perhaps there's a null/control character in the file name?
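
To test that guess on a concrete name, a small stand-alone check can reuse the condition quoted above and report where it trips. This is only a sketch: the class `WindowsNameCheck` and the method `describeInvalidChars` are made-up names for illustration and are not part of ePADD or the JDK.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/** Hypothetical diagnostic helper; not part of ePADD or the JDK. */
final class WindowsNameCheck {
    // Same reserved set as in the JDK snippet quoted above.
    private static final String RESERVED_CHARS = "<>:\"|?*";

    /** Mirrors the quoted check: control characters and reserved characters are rejected. */
    static boolean isInvalidPathChar(char ch) {
        return ch < 0x20 || RESERVED_CHARS.indexOf(ch) != -1;
    }

    /** Returns one "index N: U+XXXX" entry for every character the check rejects. */
    static List<String> describeInvalidChars(String name) {
        List<String> findings = new ArrayList<>();
        for (int i = 0; i < name.length(); i++) {
            char ch = name.charAt(i);
            if (isInvalidPathChar(ch)) {
                findings.add(String.format(Locale.ROOT, "index %d: U+%04X", i, (int) ch));
            }
        }
        return findings;
    }

    public static void main(String[] args) {
        // Made-up name containing a tab, which looks like a blank in a log line.
        System.out.println(describeInvalidChars("report\t2009.txt")); // prints [index 6: U+0009]
    }
}
```

Printing the code point rather than the character itself makes invisible characters such as tabs or NUL distinguishable from an ordinary space.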

jfarwer commented 6 months ago

Just for my understanding:

Are these the steps you are doing:

?

Many thanks

sshipley64 commented 6 months ago

Yes. That’s correct. It didn’t happen when processing and exporting to delivery on Windows.

jfarwer commented 6 months ago

Thanks. Are you able to share an email file or are they private?

sshipley64 commented 6 months ago

They are public record. Here’s an email that has an attachment that gives that error.

jfarwer commented 6 months ago

Sorry, I can't see any email. Could you please try again?

sshipley64 commented 6 months ago

epadd-export-docset-b4865538.txt

The upload doesn't accept mbox files, so I changed the extension to .txt.

jfarwer commented 6 months ago

Great, thanks.

jfarwer commented 6 months ago

Interesting. The attachment name has a blank as the last character. When importing this email in appraisal on Windows, the file name is not accepted and the attachment is not read. The label 'Error in attachment' is added to the email and a note that there was an error is added to the report.

On Ubuntu, however, a blank as the last character of a file name is no problem. No error occurs, and the attachment is read like any other attachment. An entry with the path and name of the attachment is added to the bag (manifest-sha256.txt).

When reading that bag file on Windows the blank at the end of the filename causes the exception you are seeing.

A blank at the end does not make a file name illegal as such, but it can cause various problems.

If you rename a file in Windows File Explorer and the new name contains, for example, an '*', Explorer complains that this is an illegal character. If the new name ends in a blank, Explorer just removes the blank.

We are thinking about what ePADD should do when encountering such file names. Maybe the best option is just to remove the blank (Do we have to record that change somewhere?).

Please let me know if you have any thoughts on this.
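
The option of removing the blank could look roughly like the sketch below. It is an illustration only, not ePADD's actual code: `TrailingBlankFix` and `stripTrailingBlanks` are hypothetical names, and the sketch assumes that only ordinary spaces at the very end of the name need to go.

```java
/** Hypothetical helper; ePADD's real handling may differ. */
final class TrailingBlankFix {

    /** Removes spaces at the end of a file name and leaves everything else untouched. */
    static String stripTrailingBlanks(String fileName) {
        int end = fileName.length();
        while (end > 0 && fileName.charAt(end - 1) == ' ') {
            end--;
        }
        return fileName.substring(0, end);
    }

    public static void main(String[] args) {
        String original = "meeting notes 03_34 PM "; // made-up attachment name
        String cleaned = stripTrailingBlanks(original);
        // Reporting the change is optional; it is shown here in case a record is wanted.
        if (!cleaned.equals(original)) {
            System.out.println("renamed '" + original + "' to '" + cleaned + "'");
        }
    }
}
```

Comparing the cleaned name with the original is a cheap way to decide whether anything needs to be written to a report.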

sshipley64 commented 5 months ago

Removing that blank space would work for me. I don't particularly see any need to record it. It would be nice to have that done both in the accession module and in the processing module as it exports.

jfarwer commented 5 months ago

Thanks. A new version of ePADD with some new features will be released soon. It will also contain the fix for the filenames. When importing emails in appraisal, it removes any blanks at the end of file names. So, the issue is fixed at the first step, and everything works as usual when going to processing and delivery. Fixing the filenames of attachments already imported would be a more complicated operation, so filenames in an existing archive will stay the same when using the new version of ePADD.

I hope that helps. Please let me know if you think anything else is needed.

sshipley64 commented 5 months ago

Thanks. Darn, though. I have a collection of over 700,000 emails that was already processed in the Linux version and that I'll need to deliver on a Windows machine. For some reason our Windows machine couldn't handle a collection that big. I guess that will have to be redone.

jfarwer commented 5 months ago

I see. I will think about whether it might be possible to keep working with the existing archive.

Do you know what the problem with importing the large collection was on Windows?

sshipley64 commented 5 months ago

It would crash after a while and then refuse to load any more imports without crashing each time. Same RAM as the Linux machine. Same memory given to Java to use.


sshipley64 commented 5 months ago

There might also be people who have this problem without knowing about it, and who will only discover it when they try to move their delivery folders to Windows.

sshipley64 commented 4 months ago

I just tried Release 11.0.1-alpha and I'm still getting an illegal char error

at java.lang.Thread.run(Thread.java:1623) ~[?:?]

java.nio.file.InvalidPathException: Illegal char < > at index 82: data/blobs/5846.Fwd Station Area Planning Community Workshop 5th Bell, 2nd Pike and 2nd Madison Stations java.nio.file.InvalidPathException: Illegal char < > at index 82: data/blobs/5846.Fwd Station Area Planning Community Workshop 5th Bell, 2nd Pike and 2nd Madison Stations

I get an error trying to download the mbox for the email so I can't attach it.

jfarwer commented 4 months ago

What error do you get when trying to attach the mbox file?

Is this again when trying to load an archive on Windows which has been created on Ubuntu?

Thanks

sshipley64 commented 4 months ago

No, this is on Ubuntu. I can't download the mbox for that message, or even download the attachment, without getting an error that it won't download. I have attached the attachment straight from the blob folder.
5846.Fwd Station Area Planning Community Workshop 5th Bell, 2nd Pike and 2nd_ Madison Stations.txt

sshipley64 commented 4 months ago

I added the txt file extension--it's a forwarded email that it couldn't recognize as an email.

jfarwer commented 4 months ago

Thanks. So the error occurs when you are trying to import that mbox file?

Can you send the files epadd.log and epadd.warnings.log from the epadd-settings folder?

sshipley64 commented 4 months ago

Sorry, I'm not explaining this well. The first error is the same illegal char error after moving from Linux to Windows. There is also a second error on Linux when trying to download the email as an mbox: it just won't do it. The attachment itself is not exactly an error, but it's an attached forwarded email that is not decoded and still appears to be in base64 for some reason (I get a fair number of these in older collections). Attached are the log and warnings log: epaddubuntu.zip

jfarwer commented 4 months ago

Thanks. There does indeed seem to be an encoding issue. Let me have a look into that.

sshipley64 commented 4 months ago

Let me know if you want the original mbox or pst--it's public record. It's gone through Novell Groupwise to Outlook before converting to mbox.

jfarwer commented 4 months ago

Yes please, the mbox would be great.


sshipley64 commented 4 months ago

https://drive.google.com/file/d/1S8fYH2HCGj_yZkP0tyGSSbWMFXcHOU7v/view?usp=drive_link It's zipped here.

jfarwer commented 4 months ago

Could you please give me permission to access that file?

sshipley64 commented 4 months ago

Sorry, you should be able to access it now.

jfarwer commented 4 months ago

Thanks. In the epadd.log.warnings file you sent it complains about encoding in 'Hexa' and '8bit+'. I can't find any emails with encoding 'Hexa' or '8bit+' in the attached set. Are these emails the ones used in the ePADD run with that warning file?

sshipley64 commented 4 months ago

Yes, it's the same collection.


jfarwer commented 4 months ago

The name of the attached email (Fwd: Station Area Planning Community Workshop: 5th & Bell, 2nd & Pike and 2nd& Madison Stations) contains a tab character instead of a space. This doesn't work as a file name, and on both Ubuntu and Windows ePADD doesn't show the attachment but adds the label 'Error in attachment'. However, creating the archive on Ubuntu and then reading it on Windows makes ePADD crash. I am not sure why, but I implemented a fix that replaces tabs with spaces. Could you please try it out (you can use either file; one is a jar, the other is a Windows executable):

https://livemanchesterac-my.sharepoint.com/:u:/g/personal/jochen_farwer_manchester_ac_uk/EZYt-azuhh5NtOlKhqA0-dABhmafSiPaM-KxHeNXCGNiPw?e=12Gf55

https://livemanchesterac-my.sharepoint.com/:u:/g/personal/jochen_farwer_manchester_ac_uk/ESInr_ieaEtNmFcc-AC-zAcBLCbHb940XjcB7mFdT5Q-WQ?e=uLHgbh
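
For readers following along, a replacement of this kind might look like the sketch below. It is not the code in the linked builds: `ControlCharReplacement` and `replaceControlChars` are hypothetical names, and covering every character below U+0020 (rather than tabs alone) is an assumption borrowed from the JDK check quoted earlier in the thread.

```java
/** Hypothetical sketch; the fix shipped in ePADD may be implemented differently. */
final class ControlCharReplacement {

    /** Replaces each tab or other control character (below U+0020) with a single space. */
    static String replaceControlChars(String name) {
        StringBuilder cleaned = new StringBuilder(name.length());
        for (int i = 0; i < name.length(); i++) {
            char ch = name.charAt(i);
            cleaned.append(ch < 0x20 ? ' ' : ch);
        }
        return cleaned.toString();
    }

    public static void main(String[] args) {
        // Made-up name with a tab between the words.
        System.out.println(replaceControlChars("2nd\tMadison Stations"));
    }
}
```

Replacing rather than deleting the character keeps the length of the name unchanged and keeps words separated.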

sshipley64 commented 4 months ago

Clicking on the link, I get: "Selected user account does not exist in tenant 'The University of Manchester' and cannot access the application '00000003-0000-0ff1-ce00-000000000000' in that tenant. The account needs to be added as an external user in the tenant first. Please use a different account."

jfarwer commented 4 months ago

Sorry, I have updated the links. Can you try again please?

sshipley64 commented 4 months ago

It worked now. I'll get back to you soon.

sshipley64 commented 4 months ago

Okay, I've tried it in all the modules and it doesn't seem to have any issues with that attachment anymore. The attachment is still encoded as base64, though.

jfarwer commented 3 months ago

Are you downloading the mbox file that includes the attached email or are you downloading the attachment itself on its own?

sshipley64 commented 3 months ago

On its own.

jfarwer commented 3 months ago

Yes, ePADD saves the attached emails as part of the message they are attached to. They will be included (untouched in base64) as part of the mbox or eml file when you download or 'export for preservation' the parent message. When you download the attachment on its own you get the encoded message which is a string of characters. As you don't have an app on your computer for viewing the decoded content, it will open in a text editor and show that string of characters. You can put that in a decoder like https://www.base64decode.org/ and get the decoded message.

I keep seeing emails attached as base64. We could decode them in ePADD and then allow the decoded message to be downloaded, which would be more user-friendly. We will think about that.
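
As an offline alternative to a web decoder, the decoding step can be sketched with the JDK's own `java.util.Base64`. The class name `AttachedEmailDecoder` is hypothetical, and treating the decoded bytes as UTF-8 is an assumption; a real message may declare a different charset in its headers.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

/** Hypothetical sketch; not part of ePADD. */
final class AttachedEmailDecoder {

    /** Decodes base64 text as found in a downloaded attachment. */
    static String decode(String base64Text) {
        // The MIME decoder skips line breaks, which encoded email bodies usually contain.
        byte[] raw = Base64.getMimeDecoder().decode(base64Text);
        // Assumption: the attached message is UTF-8 (or plain ASCII) text.
        return new String(raw, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Build a small made-up sample so the sketch runs without any input file.
        String sample = "Subject: Fwd: example\r\n\r\nHello";
        String encoded = Base64.getMimeEncoder()
                .encodeToString(sample.getBytes(StandardCharsets.UTF_8));
        System.out.println(decode(encoded));
    }
}
```

The MIME variant of the decoder is used because the basic decoder rejects the line breaks that wrapped base64 text contains.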

sshipley64 commented 3 months ago

Thanks. If I notice it, I can decode it. The concern is researchers looking through the collection, who may think it's just gibberish, or cases where we don't notice it in a large search download we are giving to researchers. I can put a note in our user guide, but almost no one actually reads that.

jfarwer commented 3 months ago

Not sure whether this is still helpful: For existing archives that don't open because of attachment file names with white spaces at the end, correcting the files manifest-md5.txt and manifest-sha256.txt is enough to make them work.

You will lose the attachments with the problematic characters, but that should not be many. I created a script that will correct those manifest files and tell you how many names were modified (the number of attachments you are losing):

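attachment-names-correction.jar: https://livemanchesterac-my.sharepoint.com/:u:/g/personal/jochen_farwer_manchester_ac_uk/EVt0vdcDR2dDtC-19Xml0WIBf1UblKB2wi2JOxLOEy4jsA?e=5ScVKc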

The script could also tell you what file names were modified if needed.

You have to place the file 'attachment-names-correction.jar' in the folder epadd-appraisal/user. If you double-click that file, it should modify manifest-md5.txt and manifest-sha256.txt and tell you how many names in each file were corrected. I expect the numbers to be equal as the files contain the same list of file names with different hashes.

The original manifest files will be saved with the current date and time attached to their names in case something goes wrong and also as we might want to know what they looked like later.

After running the script, you can start ePADD and see whether it loads the archive.
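
The steps described above (back up each manifest, correct the names, report the count) can be sketched as follows. This is not the source of attachment-names-correction.jar: the class `ManifestTrailingBlankFix`, its methods, and the backup naming pattern are all hypothetical, and the sketch assumes that each manifest line ends with the file name, so a trailing blank in a name is a trailing blank on the line, and that the manifests are UTF-8 text.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for the script described above, not its actual source. */
final class ManifestTrailingBlankFix {

    /** Drops blanks at the end of a manifest line, i.e. at the end of the file name. */
    static String correctLine(String line) {
        int end = line.length();
        while (end > 0 && line.charAt(end - 1) == ' ') {
            end--;
        }
        return line.substring(0, end);
    }

    /** Backs up the manifest, rewrites it, and returns how many lines were changed. */
    static int correctManifest(Path manifest) throws IOException {
        List<String> original = Files.readAllLines(manifest, StandardCharsets.UTF_8);
        List<String> corrected = new ArrayList<>();
        int modified = 0;
        for (String line : original) {
            String fixed = correctLine(line);
            if (!fixed.equals(line)) {
                modified++;
            }
            corrected.add(fixed);
        }
        // Keep the untouched manifest next to the new one; the suffix avoids ':' so it is valid on Windows.
        String stamp = LocalDateTime.now().format(DateTimeFormatter.ofPattern("yyyyMMdd-HHmmss"));
        Path backup = manifest.resolveSibling(manifest.getFileName() + "." + stamp);
        Files.copy(manifest, backup, StandardCopyOption.REPLACE_EXISTING);
        Files.write(manifest, corrected, StandardCharsets.UTF_8);
        return modified;
    }

    public static void main(String[] args) throws IOException {
        // Looks for the two manifests in the current directory.
        for (String name : new String[] {"manifest-md5.txt", "manifest-sha256.txt"}) {
            Path manifest = Path.of(name);
            if (Files.exists(manifest)) {
                System.out.println(name + ": " + correctManifest(manifest) + " names corrected");
            }
        }
    }
}
```

In this sketch only the manifest entries change; the blob files on disk are not renamed, so a corrected entry presumably no longer points at its file, which would be consistent with the remark that those attachments are lost.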

sshipley64 commented 2 months ago

Yes, this is helpful. I will give it a try. Thanks.
