Closed andreleblanc11 closed 3 days ago
This is the commit that introducts utf-8 encoding to FTP file transfers : https://github.com/MetPX/sarracenia/commit/42d9f09edad230839cefe9186ef7cfb9d73d9ccb
This is definitely a V2.24.08 bug. I've checked our archived logs and saw that these errors have started spawning on our DDSR cluster every time a new node got patched 😭, even before we got a failed delivery on August 26th
The commit above is to use utf-8 for filenames. This was probably done to allevate issues when file names have accented characters in them, and so eight bit chars show up... older systems would use iso-8859-1, but the charset would vary by regions, so it would have to be set per server. In the past ten years of so, utf-8 has taken over and become universal.
In this case, the complaint is about encoding a period character. Utf-8 for . is the same as iso8859 for . is the same as us-ascii for . It's unclear why using utf8 would break the presence of . in a file name...
OK, I understand... utf8 is involved... it is actually character B7... "middle dot" which is not a normal period. Those have the same character ordinal in the charset, but the encoding would be different. This is also readin from an ls_index file. I suspect the ls_index file is being written in iso8859, and then read in utf8 causing utf8 to barf on the iso8859 encoding for middle dot.
the actual fix should be something like writing all the ls_index files using utf-8 encoding.
This probably affects any file transfer with an eight-bit name... (accented chars, etc...)
weirdnesses... so it turns out that
This change is documented as being the result of https://datatracker.ietf.org/doc/html/rfc2640.html which changed the expected default.
latin-1 usually means 7bit ascii, so 8 bit characters should have been illegal before, and it should have complained...but just ignored the problem.
It looks like the remote server had a file with an invalid character (iso8859-1 encoded middle dot, rather than a normal period.) ...
checking that server, we see:
So I guess supplying an encoding option might help.
from https://docs.python.org/3/library/ftplib.html:
Changed in version 3.9: .. The encoding parameter was added, and the default was changed from Latin-1 to UTF-8 to follow [RFC 2640] (https://datatracker.ietf.org/doc/html/rfc2640.html).
Reading over the SFTP RFC. it looks like they really, really, really want you to use UTF-8. I have never encountered a non utf-8 sftp server, so for now will continue to just assume all SFTP's are utf-8.
ok, so:
so once PR is accepted, can close this.
Actually, once fixed in sr3, we can label it v2only wontfix... and leave it open, I guess.
What happened
(Only confirmed to be affecting V2.24.08. We also know it doesn't affect V2.21.04)*
*UPDATE: it actually DOES affect v2.21.04... but only when running on python >= 3.9 so ubuntu 22.04. the old version is safe only because it is running on python 3.6 on ubuntu 18.04.
A delivery failed when a sender tried to transfer a file.
On servers hosting V2.21.04, file transferring continued without any problem. However, on servers hosting V2.24.08 of v2, all following file transfers got appended to the retry queue.
The error message when debug is on
Position 44 refers to the
.
character in20240826T2236Z_MSC_Radar-Composite_MMHR_1km.tif
Corrective action
To prevent file transfers from further going to the retry queue with V2.24.08, you can remove the corrupted filename on the remote server and the file transferring should resume.
UPDATE : Removing the file did make the sender stop failing on every file transfer. However, new transfer attempts will fail whenever
nlst
returns a file with.
in its filename.I'm still not sure why this is a problem on v2.24.08
Permanent solution
Move the configuration to sr3