USEPA / emf

Emissions Modeling Framework (EMF)

EMF error with compare datasets #145

Closed cseppan closed 4 months ago

cseppan commented 8 months ago

Christine reported the following error on Tuesday, January 9, 2024 4:04 PM

I have another occasional EMF error to report. This happens from time to time when comparing a post-CoST dataset with a pre-CoST dataset for QA. After I run Compare Datasets and attempt to view the results, I get this:

[image001: attached image, not preserved]

For an example, see the QA step under FF10 point dataset 2021hb_proj_railyards_2020NEI_POINT_20230128.

This isn’t new, but I hadn’t reported it before because I typically only run into this once or twice a year.

ddelvecchio commented 7 months ago

After researching this, I found that the QA result never ran successfully and the Status of "Success" was incorrect. The table name of one of the datasets underlying the QA step was longer than the allowed maximum of 63 characters.
I also found that the TableMetaData getColumns function accepts a table name of any length, and when a 64-character table name is passed in, no columns are found. To fix the issue, I truncated the table name to 63 characters. We might want to create another ticket to research where we are allowing table names longer than 63 characters.
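For illustration, the truncation described above could look roughly like the sketch below. This is not the actual EMF change: the class `TableNameLimit` and the method `clamp` are hypothetical names, and the sketch assumes ASCII table names, so that Java's character count matches PostgreSQL's byte-based identifier limit.

```java
// Hypothetical helper, not part of the EMF code base.
final class TableNameLimit {
    // PostgreSQL keeps at most 63 bytes of an identifier by default (NAMEDATALEN - 1).
    static final int MAX_IDENTIFIER_LENGTH = 63;

    // Returns the name unchanged if it fits, otherwise its first 63 characters.
    // Assumes ASCII names, so one char corresponds to one byte.
    static String clamp(String tableName) {
        if (tableName == null || tableName.length() <= MAX_IDENTIFIER_LENGTH) {
            return tableName;
        }
        return tableName.substring(0, MAX_IDENTIFIER_LENGTH);
    }

    public static void main(String[] args) {
        // Build a 64-character name, like the one that broke the QA step.
        String tooLong = new String(new char[64]).replace('\0', 'x');
        System.out.println(tooLong.length() + " -> " + clamp(tooLong).length());
    }
}
```

Clamping the name before the column lookup makes the lookup use the same name that Postgres actually stored for the table.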

cseppan commented 6 months ago

Various datasets generated by the EMF (e.g. control strategy outputs, module outputs, temporal allocation outputs) could end up with table names of exactly 64 characters. Existing code would truncate these names when actually creating the Postgres table or displaying/exporting the raw data, which masked the issue. But the entry in the internal_sources table would still be 64 characters which then would cause problems when building a QA step using that dataset.

The original code to generate the table name (shown below) has an off-by-one error: proposed table names longer than 46 characters get truncated to 45, but names exactly 46 characters long are left unmodified. This wouldn't be a problem except that the later timestamp code calls CustomDateFormat.format_YYYYMMDDHHMMSSSS(), which can return either a 16- or 17-character string, depending on whether the milliseconds value is less than 100. A 46-character name plus the underscore separator and a 17-character timestamp comes to 64 characters.

    private String createTableName(String name) {
        String table = name;
        // truncate if necessary so a unique timestamp can be added to ensure uniqueness
        if (table.length() > 46) { // postgresql table name max length is 63 (NOT 64)
            table = table.substring(0, 45);
        }

        for (int i = 0; i < table.length(); i++) {
            if (!Character.isLetterOrDigit(table.charAt(i))) {
                table = table.replace(table.charAt(i), '_');
            }
        }

        // add unique timestamp to ensure uniqueness (adds 16 characters)
        return table.trim().replaceAll(" ", "_") + "_" + CustomDateFormat.format_YYYYMMDDHHMMSSSS(new Date());
    }

Commit f137f06 fixes the table name length check in various places where this code is reused.
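As a rough sketch of one way to make the length check robust (this is not the code from commit f137f06, which is not shown in this thread), the prefix budget can be derived from the actual timestamp length instead of a hard-coded 46. The class `TableNameSketch` is a hypothetical name, and the timestamp is passed in as a string so the example does not depend on CustomDateFormat.

```java
// Hypothetical sketch, not the EMF implementation.
final class TableNameSketch {
    // PostgreSQL table name max length is 63 characters.
    static final int MAX_TABLE_NAME_LENGTH = 63;

    // Builds "<sanitized prefix>_<timestamp>" and guarantees at most 63 characters
    // by sizing the prefix from the real timestamp length (16 or 17 in the thread).
    static String createTableName(String name, String timestamp) {
        // Reserve room for the "_" separator and the timestamp suffix.
        int maxPrefix = Math.max(0, MAX_TABLE_NAME_LENGTH - 1 - timestamp.length());
        String prefix = name.length() > maxPrefix ? name.substring(0, maxPrefix) : name;

        // Replace every character that is not a letter or digit with '_'.
        StringBuilder sanitized = new StringBuilder(prefix.length());
        for (int i = 0; i < prefix.length(); i++) {
            char c = prefix.charAt(i);
            sanitized.append(Character.isLetterOrDigit(c) ? c : '_');
        }
        return sanitized + "_" + timestamp;
    }

    public static void main(String[] args) {
        String name46 = new String(new char[46]).replace('\0', 'a');
        // Illustrative 17-character timestamp string (made-up value).
        String result = createTableName(name46, "20240109160400123");
        System.out.println(result.length());
    }
}
```

With a 17-character timestamp the prefix is limited to 45 characters, and with a 16-character timestamp to 46, so the boundary case from the original code can no longer produce a 64-character name.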

cseppan commented 4 months ago

Released in EMF v4.3