JeffersonLab / ccdb

Jefferson Lab Calibration and Conditions Database (CCDB)

Sort directories query to ensure all parents appear before their subdirectories #93

Closed baltzell closed 1 year ago

baltzell commented 1 year ago

We hit a new issue: CLAS12's CCDB database contains rows in the directories table that, if the query is left unsorted, are returned before their parent directory. My understanding of SQL is that this is perfectly normal and unavoidable.

However, the corresponding query in the CCDB Java library isn't sorted, and the result is passed unsorted to the routine that generates the hierarchical directory structure, which assumes parent directories appear before their daughters.

This causes those subdirectories (and their contents) to be inaccessible via the Java CCDB library. Well, unless you strip off their parent directory in the request, e.g., what should be accessible as "/dog/cat/llama" is really accessible as "cat/llama", where "cat" appears before "dog" in the unsorted query.

Simply sorting the query addresses that issue, and from what I gather, it's just the right thing to do?

Note, the CCDB website and the ccdb python CLI do not appear to be affected similarly, and I didn't try very hard to understand why. But my quick look at the C++ code suggests it should be affected similarly to Java. Has GlueX ever seen this issue?

DraTeots commented 1 year ago

Sorry, this is a poor implementation in the Java library. The logic of CCDB is:

  1. Load all directories from the DB.
  2. Build the directory tree (this is not done in the DB). From the tree, full directory names are built and added to a dictionary.
  3. Use this dictionary in the API (the API first identifies the directory ID, then uses this ID in the SQL query).

To build the directory tree, all APIs use something like map<id, directory>. In C++ this map is filled completely while the directories are loaded from the DB, so when the next loop searches for each directory's parent, the parent is always found in the map. The Java code fills the map and searches for parents in the same pass, so order matters, and your fix addresses that.
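The order dependence described above can be illustrated with a minimal sketch. This is a simplified model, not the actual CCDB code: the `Dir` class, the `singlePassPaths` method, and the convention that parent ID 0 means "top level" are all assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class SinglePassOrderDemo {
    /** Hypothetical stand-in for one row of the directories table. */
    static final class Dir {
        final int id;
        final int parentId; // 0 = top level (assumption for this sketch)
        final String name;
        Dir parent;         // resolved while building the tree

        Dir(int id, int parentId, String name) {
            this.id = id;
            this.parentId = parentId;
            this.name = name;
        }
    }

    /** Walks up the resolved parent links to build a slash-separated path. */
    static String path(Dir d) {
        StringBuilder sb = new StringBuilder();
        for (Dir cur = d; cur != null; cur = cur.parent) {
            sb.insert(0, "/" + cur.name);
        }
        return sb.toString();
    }

    /** Fills the map and looks up parents in the SAME loop, so row order matters. */
    static List<String> singlePassPaths(List<Dir> rows) {
        Map<Integer, Dir> byId = new HashMap<>();
        for (Dir d : rows) {
            byId.put(d.id, d);
            d.parent = byId.get(d.parentId); // null if the parent row has not been seen yet
        }
        List<String> paths = new ArrayList<>();
        for (Dir d : rows) {
            paths.add(path(d));
        }
        return paths;
    }

    public static void main(String[] args) {
        // "cat" is returned before its parent "dog", as in an unsorted query.
        List<Dir> unsorted = List.of(
                new Dir(2, 1, "cat"), new Dir(3, 2, "llama"), new Dir(1, 0, "dog"));
        System.out.println(singlePassPaths(unsorted)); // [/cat, /cat/llama, /dog]

        // Same rows with parents first.
        List<Dir> sorted = List.of(
                new Dir(1, 0, "dog"), new Dir(2, 1, "cat"), new Dir(3, 2, "llama"));
        System.out.println(singlePassPaths(sorted));   // [/dog, /dog/cat, /dog/cat/llama]
    }
}
```

In the unsorted case "cat" never gets linked to "dog", so "llama" ends up under a parentless "cat", which matches the symptom reported in the issue.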

I believe it would be better if I had implemented it the way it is implemented in the other languages: a first loop that just fills directoriesById (HashMap<Int, Directory>()), followed by a second pass over it that builds the parent structure. Such a fix would ensure the algorithm always works without relying on a properly sorted query (the concrete DB class implements the sorting in its query).
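A two-pass build along those lines might look like the sketch below. It is only an illustration under stated assumptions, not the CCDB implementation: the `Directory` class, the `buildByPath` method, and the use of parent ID 0 for top-level directories are hypothetical, and only the `directoriesById` name is taken from the comment above. Cycles in the parent links are not handled.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class TwoPassDirectoryTree {
    /** Hypothetical stand-in for one row of the directories table. */
    static final class Directory {
        final int id;
        final int parentId; // 0 = top level (assumption for this sketch)
        final String name;
        Directory parent;
        final List<Directory> children = new ArrayList<>();

        Directory(int id, int parentId, String name) {
            this.id = id;
            this.parentId = parentId;
            this.name = name;
        }

        /** Full path built by following parent links; assumes no cycles. */
        String fullPath() {
            return (parent == null ? "" : parent.fullPath()) + "/" + name;
        }
    }

    /** Builds a path -> directory dictionary that does not depend on row order. */
    static Map<String, Directory> buildByPath(List<Directory> rows) {
        // Pass 1: only fill the id -> directory map.
        Map<Integer, Directory> directoriesById = new HashMap<>();
        for (Directory d : rows) {
            directoriesById.put(d.id, d);
        }
        // Pass 2: every row is now in the map, so parent lookups cannot miss
        // because of ordering (only because the parent row is truly absent).
        for (Directory d : rows) {
            Directory parent = directoriesById.get(d.parentId);
            if (parent != null) {
                d.parent = parent;
                parent.children.add(d);
            }
        }
        // Finally, index directories by their full path.
        Map<String, Directory> byPath = new HashMap<>();
        for (Directory d : rows) {
            byPath.put(d.fullPath(), d);
        }
        return byPath;
    }

    public static void main(String[] args) {
        // Rows deliberately ordered so children come before their parents.
        List<Directory> rows = List.of(
                new Directory(3, 2, "llama"),
                new Directory(2, 1, "cat"),
                new Directory(1, 0, "dog"));
        Map<String, Directory> byPath = buildByPath(rows);
        System.out.println(byPath.containsKey("/dog/cat/llama")); // true
    }
}
```

The extra loop costs one more pass over the list, which should be negligible next to the database round trip, and it removes the dependency on how the concrete query orders its rows.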