Closed BillCM closed 3 months ago
@BillCM is it possible to attach the CSV file that causes the problem to this issue (or a stripped-down one, as long as it causes the same problem)? This can save me some time when building a test case.
Thanks, Jan
@janblom Correct. CSVs exported from Excel have the Byte Order Mark set. The only way to make RabbitInAHat read the file is to remove the BOM by changing the encoding. Perhaps this is worth a note in the docs?
Hi @BillCM , thank you for reporting this issue.
I have already prepared a fix which adds flexibility, so that RabbitInAHat can read CSVs with and without a BOM. This will be part of the upcoming 1.0 release. (Unfortunately, testing another aspect of that release is taking some time.)
Since this issue will be fixed, it is not necessary to update the docs. This issue will serve as the (temporary) documentation until the fix is released and the issue is closed. (The fix is in my employer's public repo until I have it approved and merged into the OHDSI repo.)
If possible, could you attach a CSV to this issue that I can use to reproduce the bug? While I am fairly confident that the upcoming fix will cover your case, there is nothing better than having the certainty :-)
Thanks, Jan
@janblom I think this very issue is causing the build to break. It appears that the embedded CSVs for CDM5.0 and CDM5.1 and their stem models are all being identified as Excel-encoded with BOM. This is causing the mvn build to fail for the main branch on my machine.
[ERROR] Failed to execute goal org.apache.maven.plugins:maven-resources-plugin:3.3.1:resources (default-resources) on project rabbitinahat: filtering /Users/bill/ext/WhiteRabbit/rabbitinahat/src/main/resources/org/ohdsi/rabbitInAHat/dataModel/StemTableV5.0.csv to /Users/bill/ext/WhiteRabbit/rabbitinahat/target/classes/org/ohdsi/rabbitInAHat/dataModel/StemTableV5.0.csv failed with MalformedInputException: Input length = 1 -> [Help 1]
org.apache.maven.lifecycle.LifecycleExecutionException: Failed to execute goal org.apache.maven.plugins:maven-resources-plugin:3.3.1:resources (default-resources) on project rabbitinahat: filtering /Users/bill/ext/WhiteRabbit/rabbitinahat/src/main/resources/org/ohdsi/rabbitInAHat/dataModel/StemTableV5.0.csv to /Users/bill/ext/WhiteRabbit/rabbitinahat/target/classes/org/ohdsi/rabbitInAHat/dataModel/StemTableV5.0.csv failed with MalformedInputException
Upon opening the code in IntelliJ, the 4 files in question are linked to the Excel icon and will not open for editing.
After converting the files to UTF-8, the build works.
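For anyone trying to reproduce this, a minimal sketch of a strict UTF-8 check may help narrow down which files are affected. It assumes the build decodes resources as UTF-8 (that depends on the project's configured encoding, which this thread does not show); the class and method names are hypothetical and not part of WhiteRabbit.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical diagnostic helper, not part of WhiteRabbit.
public class Utf8Check {

    // Returns true if the bytes decode as UTF-8 without any malformed input.
    public static boolean isValidUtf8(byte[] data) {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            decoder.decode(ByteBuffer.wrap(data));
            return true;
        } catch (CharacterCodingException e) {
            // MalformedInputException is a subclass of CharacterCodingException.
            return false;
        }
    }
}
```

Feeding it the bytes of each CSV (for example via `Files.readAllBytes`) shows which files a strict decoder would reject. Note that a UTF-8 BOM by itself decodes cleanly, so a failure here would point to a different encoding rather than to the BOM alone.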
I am unable to reproduce the last report, on either Linux or macOS. Could it be that the CSV files related to this were inadvertently changed? I suspect an encoding problem (a setting on your machine, such as the locale), but I am unable to verify that. Since this is very likely not related to the issue reported here first, I will not investigate this further in this context. If you do think this is a problem of the WhiteRabbit project, please report this in a separate issue.
It is in any case not related to the first problem reported in this issue (I was able to confirm that). The original problem is now fixed in the release-1.0.0 branch, including a second BOM-related issue in WhiteRabbit. It will be in the planned 1.0.0 release, hopefully soon.
A fix for the first issue reported in this thread is included with the second release candidate of version 1.0.0.
Resolved in WhiteRabbit version 1.0.0.
Describe the bug I created a custom model in Excel (XLSX) and exported it to CSV. This file failed to load and resulted in this error:
java.lang.IllegalArgumentException: Mapping for table not found, expected one of [table, field, required, type, schema, description]
    at org.apache.commons.csv.CSVRecord.get(CSVRecord.java:121)
    at org.ohdsi.rabbitInAHat.dataModel.Database.generateModelFromCSV(Database.java:117)
    at org.ohdsi.rabbitInAHat.RabbitInAHatMain.doSetTargetCustom(RabbitInAHatMain.java:465)
    at org.ohdsi.rabbitInAHat.RabbitInAHatMain.lambda$createMenuBar$9(RabbitInAHatMain.java:268)
The problem is that CSV parsing does not account for the Byte Order Mark (BOM).
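To illustrate the mechanism: when a UTF-8 file starts with a BOM (bytes EF BB BF), a plain reader decodes it as the invisible character U+FEFF, which presumably ends up glued to the first header name, so a lookup for "table" fails even though the header looks correct. A minimal sketch of BOM-tolerant reading follows. It uses only the JDK, the class and method names are hypothetical, and it is not necessarily how the actual fix in RabbitInAHat is implemented.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.PushbackInputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of RabbitInAHat.
public class BomTolerantCsv {

    // Wraps the stream and consumes a leading UTF-8 BOM (EF BB BF) if one is present.
    public static InputStream skipUtf8Bom(InputStream in) throws IOException {
        PushbackInputStream pushback = new PushbackInputStream(in, 3);
        byte[] head = new byte[3];
        int n = 0;
        while (n < 3) {
            int r = pushback.read(head, n, 3 - n);
            if (r < 0) break; // stream is shorter than three bytes
            n += r;
        }
        boolean hasBom = n == 3
                && (head[0] & 0xFF) == 0xEF
                && (head[1] & 0xFF) == 0xBB
                && (head[2] & 0xFF) == 0xBF;
        if (!hasBom && n > 0) {
            pushback.unread(head, 0, n); // no BOM: put the bytes back untouched
        }
        return pushback;
    }

    // Returns the first column name of the header line, with any BOM skipped.
    public static String firstHeader(InputStream in) {
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(skipUtf8Bom(in), StandardCharsets.UTF_8))) {
            String line = reader.readLine();
            return line == null ? "" : line.split(",")[0];
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

With this wrapper, a file with or without a BOM yields the same first header, so the parser sees "table" in both cases.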
To Reproduce Steps to reproduce the behavior:
Expected behavior CSV opens correctly.
Workaround Open the exported CSV in a text editor and change the encoding from "UTF-8 with BOM" to "UTF-8"
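If a text editor is not at hand, the same workaround can be scripted. The following is a minimal sketch, assuming the file is UTF-8 and small enough to load into memory; the class name `StripBom` is hypothetical and the program overwrites the file in place, so keep a backup.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;

// Hypothetical one-off utility, not part of WhiteRabbit.
public class StripBom {

    // Returns the data without a leading UTF-8 BOM (EF BB BF); unchanged if there is none.
    public static byte[] stripUtf8Bom(byte[] data) {
        boolean hasBom = data.length >= 3
                && (data[0] & 0xFF) == 0xEF
                && (data[1] & 0xFF) == 0xBB
                && (data[2] & 0xFF) == 0xBF;
        return hasBom ? Arrays.copyOfRange(data, 3, data.length) : data;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java StripBom.java <file.csv>");
            return;
        }
        Path path = Paths.get(args[0]);
        byte[] original = Files.readAllBytes(path);
        byte[] cleaned = stripUtf8Bom(original);
        if (cleaned.length != original.length) {
            Files.write(path, cleaned); // overwrite in place, BOM removed
            System.out.println("Removed BOM from " + path);
        } else {
            System.out.println("No BOM found in " + path);
        }
    }
}
```

On Java 11 or later this can be run directly as `java StripBom.java model.csv`, where `model.csv` stands for the exported file.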
Desktop (please complete the following information):
Additional context The issue