Open hannes-ucsc opened 3 years ago
Would be helpful if ~@theathorn~ someone with Excel on macOS could try to reproduce this given the problem description on the "Re: Exchange format consultation" email thread.
Windows Excel screenshot attached.
Copy cell of interest to notepad:
Hex dump of that cell (via notepad):
Specifically, this answer seems to indicate that adding a UTF-8 BOM fixes the issue at least on more recent versions of Excel.
I think we should ask someone on our team to install Office 360 — which @achave11 pointed out was available for UCSC employees — and try to reproduce the issue on Mac.
… and also the suspected fix.
On my work PC the file extension ".tsv" is configured to auto-open with Notepad (so I don't encounter this bug). On my home PC the file extension ".tsv" is configured to auto-open with Excel (so I do encounter this bug).
@achave11 spike to reproduce on Mac OS and confirm suspected fix.
The bug is reproducible in Mac v10.15.6 using Microsoft Excel for Mac v16.50.
The suspected fix (adding UTF-8 BOM) corrects the displayed format. But it's done at the expense of the column organization. When the BOM is introduce all the columns in the TSV get stuffed into the first column.
@hannes-ucsc : "@achave11 to provide hexdump of first two lines in TSV after BOM header insertion."
@hannes-ucsc: "The findings are Excel doesn't support UTF-8 encoded TSV, only UTF-8 encoded CSV. The longterm solution is to offer an alternative manifest format for the latter, but that requires more elaborate handling of commas and quotes. In the short term we need to advice our users to import the TSV via File - Import rather than double clicking the file, followed by selecting UTF-8 encoding in the import dialog."
Navigate to this project and click the download icon in the Metadata column. This will download a tsv file.
On Windows 10: If I open the file automatically with Excel by double-clicking the file from Windows Explorer then the utf-8 encoding of "micrometer" in the "cell_suspension.cell_morphology.cell_size_unit.text" column displays incorrectly as "μm". If I start Excel, Open File, select the "All Files" type I get the "Text Import Wizard", set "File Origin" to "65001: Unicode (utf-8)", "Finish", then the text in the "cell_suspension.cell_morphology.cell_size_unit.text" column displays correctly as "μm". So on Windows, having the file extension ".tsv" set to "always use Excel to open this type of file" will cause the bug.