NASA-PDS / harvest

Standalone Harvest client application providing the functionality for capturing and indexing product metadata into the PDS Registry system (https://github.com/nasa-pds/registry).
https://nasa-pds.github.io/registry
Other
4 stars 3 forks source link

Error "Missing ids" does not provide enough information for debugging #96

Closed jordanpadams closed 2 years ago

jordanpadams commented 2 years ago

πŸ› Describe the bug

Identified by ATM:

[SUMMARY] Output format: json 
[SUMMARY] Reading configuration from /home/itrejo/jnogrv_1001.cfg
[SUMMARY] Elasticsearch URL: https://search-atm-prod-mkvgzojag2ta65bnotqdpopzju.us-west-2.es.amazonaws.com:443, index: registry
[INFO] Reading registry schema from Elasticsearch
[INFO] Processing bundle directory /PDS/data/anonymous/PDS/data/jnogrv_1001
[INFO] Processing bundle /PDS/data/anonymous/PDS/data/jnogrv_1001/bundle_juno_grav.xml
[INFO] Processing collection /PDS/data/anonymous/PDS/data/jnogrv_1001/DOCUMENT/collection_document.xml
[ERROR] Missing ids

πŸ•΅οΈ Expected behavior

Error message includes much more useful information

πŸ“š Version of Software Used

v3.6.0

🩺 Test Data / Additional context

TBD download from ATM


πŸ¦„ Related requirements

βš™οΈ Engineering Details

Looks like issue is here: https://github.com/NASA-PDS/harvest/blob/main/src/main/java/gov/nasa/pds/harvest/dao/EsRequestBuilder.java#L58

For whatever reason, the software is unable to read the LIDs from the collection inventory.

Tightly coupled with https://github.com/NASA-PDS/harvest/issues/97

ramesh-maddegoda commented 2 years ago

@jordanpadams, I am wondering if they are using an older version of the harvest.

Because in the latest source code of harvest,

1) The createSearchIdsRequest(Collection ids, int pageSize) is called by searchIds(Collection ids)

2) The searchIds(Collection ids) is called by getNonExistingIds(Collection ids)

3) In getNonExistingIds(Collection ids), there is a check for null and empty IDs and it should not call searchIds(ids), if the IDs are missing.

 public Set<String> getNonExistingIds(Collection<String> ids) throws Exception
    {
        if(ids == null || ids.isEmpty()) return null;
        Response resp = searchIds(ids);

I think in the latest code, this Exception("Missing ids") cannot be thrown, when the IDs are empty or null.

jordanpadams commented 2 years ago

@ramesh-maddegoda copy thanks. Waiting onto he version info to come back from Irma. Stand by.

Thanks for looking into this!

ramesh-maddegoda commented 2 years ago

@tloubrieu-jpl, may be this is another related problem.

In the following code it returns null when (ids == null || ids.isEmpty()).

public Set<String> getNonExistingIds(Collection<String> ids) throws Exception
    {
        if(ids == null || ids.isEmpty()) return null;
        Response resp = searchIds(ids);

        NonExistingIdsResponse idsResp = new NonExistingIdsResponse(ids);
        parser.parseResponse(resp, idsResp);

        return idsResp.getIds();
    }

However, above method is called by idExists(String id) and it is expecting to have a non-null retIds object to call retIds.isEmpty();.

public boolean idExists(String id) throws Exception
    {
        List<String> ids = new ArrayList<>(1);
        ids.add(id);

        Collection<String> retIds = getNonExistingIds(ids);
        return retIds.isEmpty();
    }

I think the getNonExistingIds(Collection<String> ids) should return an empty Set<String> instead of null, when (ids == null || ids.isEmpty()).

What do you think?

jordanpadams commented 2 years ago

@ramesh-maddegoda making that change sounds good to me.

jordanpadams commented 2 years ago

@ramesh-maddegoda and maybe we can update that error message to say something like "Error reading bundle/collection references. Verify the bundle is valid prior to loading the data." or something like that?

ramesh-maddegoda commented 2 years ago

Created pull request https://github.com/NASA-PDS/harvest/pull/99 with the changes proposed above.

jordanpadams commented 2 years ago

Closed per https://github.com/NASA-PDS/harvest/pull/99

tloubrieu-jpl commented 2 years ago

@gxtchen this need to be tested from the latest snapshot version https://github.com/NASA-PDS/harvest/releases/tag/v3.7.0-SNAPSHOT

gxtchen commented 2 years ago

@jimmie Can you show me how to reproduce this issue? thx.

jimmie commented 2 years ago

@ramesh-maddegoda - do you have a test case for this or access to the originally failing ATM dataset?

ramesh-maddegoda commented 2 years ago

@jimmie and @gxtchen, There was another issue related to this https://github.com/NASA-PDS/harvest/issues/97#issuecomment-1184912455

We found that user was using an older version harvest.

@jordanpadams, verified this with the user and closed the issue as follows.

user verified they were using 3.6.0-SNAPSHOT, and once they upgraded to 3.6.0 this issue was fixed

This issue should be also closed with https://github.com/NASA-PDS/harvest/issues/97#issuecomment-1184912455

ramesh-maddegoda commented 2 years ago

@jimmie and @gxtchen , the data set they had this issue was https://atmos.nmsu.edu/PDS/data/jnogrv_1001/DOCUMENT/

jordanpadams commented 2 years ago

thanks @ramesh-maddegoda. agreed.

jordanpadams commented 2 years ago

added invalid label to ticket so this can be avoided in the future