Open landreev opened 2 years ago
Once again, there is likely nothing wrong with JMS as the underlying queue implementation. But I seriously want to reconsider, and probably redesign, the part where we pack multiple file ids into the same JMS message. I.e., uploading N ingestable files results in just one JMS message/one queue entry. (It should probably be N individual entries if we want a manageable queue, want to be able to purge individual entries, etc.) (This implementation is as old as Dataverse 4.0; I personally did not remember that this was the case.)
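To make the "N individual entries" idea concrete, here is a minimal sketch in plain Java, with no JMS dependency. The class and field names (`PerFileIngestMessages`, `IngestEntry`) are hypothetical and are not existing Dataverse code; the only point is that a batch of N file ids expands into N single-file payloads, each of which could then be sent as its own message.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: these names are illustrative, not Dataverse classes.
final class PerFileIngestMessages {

    /** Payload for one queue entry; it carries exactly one file id. */
    static final class IngestEntry {
        final long datasetId;
        final long fileId;

        IngestEntry(long datasetId, long fileId) {
            this.datasetId = datasetId;
            this.fileId = fileId;
        }
    }

    /** Expands a batch of N file ids into N single-file entries, preserving upload order. */
    static List<IngestEntry> toEntries(long datasetId, List<Long> fileIds) {
        List<IngestEntry> entries = new ArrayList<>(fileIds.size());
        for (Long fileId : fileIds) {
            entries.add(new IngestEntry(datasetId, fileId));
        }
        return entries;
    }
}
```

With one entry per file, purging or inspecting a single ingest job would no longer require unpacking and rewriting a shared message.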
I don't normally like this "as a ..." notation, but it feels appropriate here. It's not just for an "admin" though. It would be useful to have this exposed to users (via API etc.) as well, i.e., for a user to be able to easily see the status of an ingest of their file(s): how many other jobs are ahead of them, and how much data that is in terms of size, possibly even with time estimates. And there should be a similarly straightforward way for the admin to remove a job from the queue. In this respect this issue is a duplicate of #6020. I'm choosing to open a new one because I'm willing to consider re-implementing the underlying queue mechanism, in addition to providing an interface/API for monitoring the queue.
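As a rough illustration of what such a status interface would have to compute, here is a small in-memory model in plain Java. Everything in it (`IngestQueueView`, `enqueue`, `ahead`, `remove`) is a hypothetical name chosen for this sketch, and it assumes one queue entry per file; it says nothing about how the real queue would be read.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical in-memory model of the ingest queue; not a Dataverse or JMS API.
final class IngestQueueView {

    // Insertion-ordered map: file id -> file size in bytes.
    private final Map<Long, Long> pending = new LinkedHashMap<>();

    /** Adds a file at the tail of the queue. */
    void enqueue(long fileId, long sizeBytes) {
        pending.put(fileId, sizeBytes);
    }

    /** Returns {jobsAhead, bytesAhead} for a queued file, or null if it is not queued. */
    long[] ahead(long fileId) {
        long jobs = 0;
        long bytes = 0;
        for (Map.Entry<Long, Long> entry : pending.entrySet()) {
            if (entry.getKey() == fileId) {
                return new long[] {jobs, bytes};
            }
            jobs++;
            bytes += entry.getValue();
        }
        return null;
    }

    /** Removes a single job (the admin use case); returns true if it was queued. */
    boolean remove(long fileId) {
        return pending.remove(fileId) != null;
    }
}
```

A time estimate could be layered on top of `bytesAhead` using an observed ingest throughput, but that figure would have to be measured and is not assumed here.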
The situation that started this was that somebody uploaded several hundred files that the ingest was chugging through. For whatever reason we decided to stop the ingests still in the queue, but realized we didn't know how. The AS command for purging the queue advertised in our guide didn't do it. And the queue appeared to be able to survive a Payara restart and/or redeployment of the application, combined with the purging of the usual directories under .../domain1 (?). Potential explanations would include a) some monumental sloppiness on the admin's part, b) the JMS queue storing its state in a filesystem location we are not aware of, or c) Payara now storing the queue in the main database. All of this should be googleable. Still, it is unsettling that after all these years of using this scheme we do not have a practical understanding of how it works.
There is a very good chance that this issue is purely about UI/API interfacing, i.e., that we just need to figure out and document better command lines and/or provide interfaces for access to the queue information. It should be possible to read and unpack the actual JMS messages containing the information about the files in the queue, and provide it to the end user. But I am willing to even consider re-implementing the queue completely using something other than JMS, if there is a significantly better solution.