OHDSI / WhiteRabbit

WhiteRabbit is a small application that can be used to analyse the structure and contents of a database as preparation for designing an ETL. It comes with RabbitInAHat, an application for interactive design of an ETL to the OMOP Common Data Model with the help of the scan report generated by WhiteRabbit.
http://ohdsi.github.io/WhiteRabbit
Apache License 2.0

Invalid File Format #316

Closed SofiaMp closed 2 years ago

SofiaMp commented 2 years ago

When trying to open a newly generated scan report, RiaH gives the error Invalid File Format. However, opening older scan reports works.

Steps followed:

  1. Use White Rabbit to generate a scan report of a few Synthea tables, reading from csv files
  2. Open scan report generated with RiaH

Working with WhiteRabbit_v0.10.4 on MacOS 11.5.2. Both the old and the new scan reports have the extension .xlsx and don't appear to have any visual differences.

MaximMoinat commented 2 years ago

The issue also happens in v0.10.5.

The error happens when RiaH tries to get the first sheet from the scan report (line 150). It fails to get the overview sheet either by name or by position. It seems like the xlsx reader cannot retrieve any sheets from the workbook. https://github.com/OHDSI/WhiteRabbit/blob/be4dbd2e522a7c9b2b252c80e6d15f1a253c3561/rabbit-core/src/main/java/org/ohdsi/rabbitInAHat/dataModel/Database.java#L147-L151

ChanchalDixit-Cognizant commented 2 years ago

Hello,

Can you please let me know if this is resolved? I need to use this. Thanks.

PYDuquesnoy commented 2 years ago

As a workaround, you can open the ScanReport.xlsx in Excel and save it. After this operation, RabbitInAHat can process the file correctly.

ChanchalDixit-Cognizant commented 2 years ago

Wow, it worked. Thank you @PYDuquesnoy.

MaximMoinat commented 2 years ago

Thanks @PYDuquesnoy for providing this workaround. We are looking into the issue at the moment.

MaximMoinat commented 2 years ago

Initial inspection shows that scan reports produced with releases v0.10.4 and v0.10.5 cannot be read correctly by the Excel reader: the XML files inside the workbook are not found. Scan reports created with older releases of WR can still be read by the new release.

There seems to be an issue with the writing of Excel files with the new Apache POI dependencies.
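Since an .xlsx file is a zip container of XML parts, one way to compare a failing scan report with a working one is to list the entries of each file. The following is a minimal diagnostic sketch using only the Java standard library. It is not part of WhiteRabbit; the class name `XlsxPartLister` and its methods are hypothetical, and `xl/workbook.xml` is only the conventional location of the workbook part, so its absence is a hint rather than proof of a broken file.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

// Hypothetical helper for illustration; not part of WhiteRabbit or Apache POI.
public class XlsxPartLister {

    /** Returns the names of all entries in the zip container, in stored order. */
    public static List<String> listParts(InputStream in) throws IOException {
        List<String> names = new ArrayList<>();
        try (ZipInputStream zip = new ZipInputStream(in)) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                names.add(entry.getName());
            }
        }
        return names;
    }

    /** Checks for the workbook part at its conventional path (an assumption). */
    public static boolean hasWorkbookPart(List<String> names) {
        return names.contains("xl/workbook.xml");
    }

    public static void main(String[] args) throws IOException {
        // args[0] is the path to a scan report, e.g. ScanReport.xlsx
        try (InputStream in = Files.newInputStream(Paths.get(args[0]))) {
            List<String> names = listParts(in);
            names.forEach(System.out::println);
            System.out.println("has xl/workbook.xml: " + hasWorkbookPart(names));
        }
    }
}
```

Running it once on a newly generated scan report and once on the same file after re-saving it in Excel would show whether the entry names or layout differ between the two.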