ERDDAP / erddap

ERDDAP is a scientific data server that gives users a simple, consistent way to download subsets of gridded and tabular scientific datasets in common file formats and make graphs and maps. ERDDAP is a Free and Open Source (Apache and Apache-like) Java Servlet from NOAA NMFS SWFSC Environmental Research Division (ERD).
Creative Commons Zero v1.0 Universal

Fixed Value sourceNames are not excluded from duplicate column name detection #82

Closed: srstsavage closed this issue 1 year ago

srstsavage commented 2 years ago

Fixed Value sourceNames are not excluded from duplicate column name detection. A dataset containing multiple columns with the same fixed value sourceName, e.g.

<sourceName>=5.0</sourceName>

will result in a com.cohort.util.SimpleException: ERROR: Invalid Table: Duplicate column names error, e.g.

java.lang.RuntimeException: datasets.xml error on or before line #3634: ERROR: Invalid Table: Duplicate column names: [3] and [7] are both "x3d5x2e0".
 at gov.noaa.pfel.erddap.dataset.EDD.fromXml(EDD.java:471)
 at gov.noaa.pfel.erddap.LoadDatasets.run(LoadDatasets.java:359)
Caused by: com.cohort.util.SimpleException: ERROR: Invalid Table: Duplicate column names: [3] and [7] are both "x3d5x2e0".
 at com.cohort.array.PrimitiveArray.ensureNoDuplicates(PrimitiveArray.java:4349)
 at gov.noaa.pfel.coastwatch.pointdata.Table.ensureNoDuplicateColumnNames(Table.java:2124)
 at gov.noaa.pfel.coastwatch.pointdata.Table.ensureValid(Table.java:2114)
 at gov.noaa.pfel.coastwatch.pointdata.Table.dataToString(Table.java:1694)
 at gov.noaa.pfel.coastwatch.pointdata.Table.dataToString(Table.java:1683)
 at gov.noaa.pfel.coastwatch.pointdata.Table.dataToString(Table.java:1672)
 at gov.noaa.pfel.erddap.dataset.EDDTableFromFiles.makeMinMaxTable(EDDTableFromFiles.java:2428)
 at gov.noaa.pfel.erddap.dataset.EDDTableFromFiles.<init>(EDDTableFromFiles.java:1647)
 at gov.noaa.pfel.erddap.dataset.EDDTableFromAsciiFiles.<init>(EDDTableFromAsciiFiles.java:105)
 at gov.noaa.pfel.erddap.dataset.EDDTableFromFiles.fromXml(EDDTableFromFiles.java:359)
 at gov.noaa.pfel.erddap.dataset.EDD.fromXml(EDD.java:448)

Seems like any fixed or calculated sourceName (starting with =) should be excluded from duplicate detection.

srstsavage commented 2 years ago

Until fixed, a workaround is to append a JEXL comment with a unique value after the duplicate fixed value, e.g.

<sourceName>=5.0##1234unique</sourceName>
...
<sourceName>=5.0##5678unique</sourceName>

BobSimons commented 2 years ago

I will fix this.

Another similar (and nicer looking) workaround is to append a different number of trailing 0's to each fixed value sourceName, e.g. =5.0, =5.00, =5.000
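For anyone generating datasets.xml programmatically, here is a minimal sketch of both workarounds from this thread. It is not part of ERDDAP: the class and method names (FixedValueSourceNames, withTrailingZeros, withJexlComments) are hypothetical, and it assumes the fixed value is a plain decimal literal such as =5.0, so that extra trailing zeros do not change the value.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of ERDDAP: builds distinct sourceName strings
// that all evaluate to the same fixed value.
class FixedValueSourceNames {

    // Trailing-zero workaround: "=5.0", "=5.00", "=5.000", ...
    // Only accepts plain decimal literals so the appended zeros are harmless.
    static List<String> withTrailingZeros(String fixedValue, int count) {
        if (!fixedValue.matches("=-?\\d+\\.\\d+")) {
            throw new IllegalArgumentException(
                "Expected a plain decimal literal like =5.0, got: " + fixedValue);
        }
        List<String> names = new ArrayList<>();
        for (int i = 0; i < count; i++) {
            names.add(fixedValue + "0".repeat(i)); // String.repeat needs Java 11+
        }
        return names;
    }

    // JEXL-comment workaround: "=5.0##1", "=5.0##2", ...
    // The text after ## is a comment, so only the string itself differs.
    static List<String> withJexlComments(String fixedValue, int count) {
        if (!fixedValue.startsWith("=")) {
            throw new IllegalArgumentException(
                "Expected a sourceName starting with =, got: " + fixedValue);
        }
        List<String> names = new ArrayList<>();
        for (int i = 1; i <= count; i++) {
            names.add(fixedValue + "##" + i);
        }
        return names;
    }

    public static void main(String[] args) {
        System.out.println(withTrailingZeros("=5.0", 3));
        System.out.println(withJexlComments("=5.0", 2));
    }
}
```

Each generated string would then go into its own <sourceName> element, one per column.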

BobSimons commented 1 year ago

I will not be making this change. It turns out that the sourceNames, including =expression sourceNames, really do need to be unique. It has to do with the part of the code that selects the columns that are needed in the source file to fulfill the user's request: each sourceName has to uniquely link to a specific destinationName. So the long-term solution will have to be the workaround that I suggested above (or something similar).
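To illustrate the uniqueness requirement, here is a rough sketch of a lookup keyed by sourceName. This is not ERDDAP's actual implementation; the class SourceNameIndex and the destinationNames used (depth, sensor_height, offset) are made up for the example. The point is only that if the lookup from sourceName to destinationName is one-to-one, two columns sharing =5.0 cannot both be resolved.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch, not ERDDAP code: a one-to-one index from sourceName
// to destinationName.
class SourceNameIndex {
    private final Map<String, String> sourceToDestination = new LinkedHashMap<>();

    // Registers a column; rejects a sourceName that is already linked
    // to another destinationName.
    void add(String sourceName, String destinationName) {
        String previous = sourceToDestination.putIfAbsent(sourceName, destinationName);
        if (previous != null) {
            throw new IllegalArgumentException(
                "Duplicate sourceName \"" + sourceName + "\" for \"" + previous
                    + "\" and \"" + destinationName + "\"");
        }
    }

    // Returns the destinationName linked to this sourceName, or null if none.
    String destinationFor(String sourceName) {
        return sourceToDestination.get(sourceName);
    }

    public static void main(String[] args) {
        SourceNameIndex index = new SourceNameIndex();
        index.add("=5.0", "depth");
        index.add("=5.00", "sensor_height"); // same value, distinct sourceName
        System.out.println(index.destinationFor("=5.00"));
        try {
            index.add("=5.0", "offset"); // collides with "depth"
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

With the trailing-zero variants, every sourceName string is distinct, so a lookup like this stays unambiguous even though the values are equal.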