NOAA-OWP / wres

Code and scripts for the Water Resources Evaluation Service

As an administrator, I want to be able to see a list of running, queued, and "remembered" jobs via a tool that monitors the persister #249

Open epag opened 3 weeks ago

epag commented 3 weeks ago

Author Name: Chris (Chris) Original Redmine Issue: 88805, https://vlab.noaa.gov/redmine/issues/88805 Original Date: 2021-03-02 Original Assignee: Hank


The original Description is below. Instead of dumping a list of the running jobs through an endpoint, we are proposing to properly allow an administrator to monitor the persister/Redis. And instead of targeting users with this capability via the WRES GUI, we decided that displaying the list isn't really what is needed. What is needed is enough workers to ensure no one waits and some way to track progress of the evaluation. A ticket for tracking progress already exists, iirc. This ticket will be for the monitoring tool.

Thanks,

Hank

=======================================

Periodically, I will need to query the COWRES to get a collection of information on running and queued jobs. I can turn around and use this data to present to a user what is going on with their work. Ideally, I'd be able to hit a view via nwcal-wres-prod.[host]/jobs and get:

{
    "jobs": [
        {
            "id": 64464646464888,
            "name": "Example 1",
            "status": "in_progress"
        },
        {
            "id": 984968684684,
            "name": null,
            "status": "in_progress"
        },
        {
            "id": 33348954684,
            "name": "Example 4",
            "status": "in_progress"
        },
        {
            "id": 8468452684465,
            "name": "Example 5",
            "status": "queued"
        }
    ]
}

The name field is a nice-to-have, but it is available within the project configurations.


Redmine related issue(s): 112668


epag commented 3 weeks ago

Original Redmine Comment Author Name: Chris (Chris) Original Date: 2021-03-02T17:02:05Z


This sort of feature was requested by several people in the HEFS WRES Q&A earlier today.

epag commented 3 weeks ago

Original Redmine Comment Author Name: Hank (Hank) Original Date: 2023-02-02T13:13:21Z


Scanning some old tickets...

I think resolving this one would require a new endpoint be added. When that endpoint is called, the tasker would go to the persister, request the list of evaluations, and create a response that lists the job ids and state of every job that is not "COMPLETED". Since we resolved issues in recent months with the persister becoming out of sync with the broker job state, that list should reflect reality.

Obtaining the list of job ids with state should be easy. The only "difficult" part will be identifying how to create JSON in Java. I don't think we return JSON with any other endpoint (I would have to confirm), so that would be new. Presumably, there is a straightforward built-in Java library to handle that.

I'm providing an estimated time of 8 hours, noting that a development ticket would need to be made first.

Hank

epag commented 3 weeks ago

Original Redmine Comment Author Name: Chris (Chris) Original Date: 2023-02-02T14:23:23Z


I think the NWIS reader uses (or used) a json library for reading. I think it was simplejson somewhere? I think we were using a mix of that and jackson for a minute. That same library should be able to write to a stream as well. It's not as simple or obvious as it is in JavaScript or Python, but it's not far off from an xml reader (just a tad easier in my opinion). It's just a matter of creating a tree of JSONObjects. It was easy enough to do in C++ and that didn't even have helpful autocomplete or in-IDE documentation.

If you don't want to build the tree, I imagine you can just create several new classes and use jackson to serialize.
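
(A minimal sketch of that second approach, with hypothetical class names, @jackson-databind@ assumed on the classpath, and imports omitted:)

    // Hypothetical DTOs (names illustrative); jackson-databind serializes public fields directly.
    public class JobListing
    {
        public List<Job> jobs;

        public static class Job
        {
            public String id;
            public String status;
        }
    }

    // Given a populated JobListing named listing (throws JsonProcessingException):
    String json = new ObjectMapper().writeValueAsString( listing );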

epag commented 3 weeks ago

Original Redmine Comment Author Name: James (James) Original Date: 2023-02-02T14:33:38Z


We have this on the cp of @wres-config@, for example:

https://github.com/imrafaelmerino/json-values

Seems pretty straightforward if you look at the examples there.

As Chris says, we also have jackson, which is already on the cp of the @wres-tasker@. That is the bigger dog, but harder to use. There are other libraries too, like gson.

epag commented 3 weeks ago

Original Redmine Comment Author Name: Hank (Hank) Original Date: 2023-02-02T14:42:52Z


Thanks, James and Chris. Figured I would Google it when the time came. :)

Creating something like what you are asking for here would be pretty easy, but would expose job ids to users, so it would need to be WRES admin token protected (something I added a few months ago to the @cleandatabase@ call to make sure not just anyone can run it). The harder work would be on the WRES GUI side figuring out what to show users and how to show it.

There are a couple related tickets to this and I plan to discuss it with James later today.

Hank

epag commented 3 weeks ago

Original Redmine Comment Author Name: Hank (Hank) Original Date: 2023-02-03T18:12:28Z


This endpoint, added to @WresJob.java@ along with a method added to @JobResults.java@ to return all of the job ids it knows about:

    @GET
    @Path( "/listing" )
    @Produces( "text/html; charset=utf-8" )
    public String getJobListing() throws IOException
    {
        Set<String> jobIds = JOB_RESULTS.getAvailableJobIds();

        JsonFactory factory = new JsonFactory();
        StringWriter jsonObjectWriter = new StringWriter();
        JsonGenerator generator = factory.createGenerator(jsonObjectWriter);
        generator.useDefaultPrettyPrinter(); // pretty print JSON

        generator.writeStartObject();
        generator.writeFieldName("jobs");
        generator.writeStartArray();

        for (String jobId : jobIds)
        {
            generator.writeStartObject();
            generator.writeFieldName( "id" );
            generator.writeString( jobId );
            generator.writeFieldName( "status" );
            generator.writeString( JOB_RESULTS.getJobState( jobId ).toString() );
            generator.writeEndObject();
        }

        generator.writeEndArray();
        generator.writeEndObject();
        generator.close(); // to close the generator

        String jsonString = jsonObjectWriter.toString();
        return jsonString;
    }

gives me this at the URL @.../job/listing@ (after posting quite a few smoke tests to force some @IN_PROGRESS@ and @IN_QUEUE@ states):

{ "jobs" : [ { "id" : "7699870392561872735", "status" : "COMPLETED_REPORTED_SUCCESS" }, { "id" : "8205038679122035219", "status" : "COMPLETED_REPORTED_SUCCESS" }, { "id" : "6044199484071214264", "status" : "COMPLETED_REPORTED_SUCCESS" }, { "id" : "452299440983061909", "status" : "COMPLETED_REPORTED_SUCCESS" }, { "id" : "9093785701244564339", "status" : "COMPLETED_REPORTED_SUCCESS" }, { "id" : "7488191570292895321", "status" : "IN_PROGRESS" }, { "id" : "6757262944806363847", "status" : "COMPLETED_REPORTED_SUCCESS" }, { "id" : "6429904123045526792", "status" : "COMPLETED_REPORTED_SUCCESS" }, { "id" : "8903647909095918212", "status" : "COMPLETED_REPORTED_SUCCESS" }, { "id" : "1970390896375372024", "status" : "COMPLETED_REPORTED_SUCCESS" }, { "id" : "2413110888756067650", "status" : "COMPLETED_REPORTED_SUCCESS" }, { "id" : "7365304625671622572", "status" : "COMPLETED_REPORTED_FAILURE" }, { "id" : "8997391419820897217", "status" : "COMPLETED_REPORTED_SUCCESS" }, { "id" : "1743741047490135547", "status" : "COMPLETED_REPORTED_SUCCESS" }, { "id" : "1949082861972556619", "status" : "COMPLETED_REPORTED_SUCCESS" }, { "id" : "4616710073550848062", "status" : "COMPLETED_REPORTED_SUCCESS" }, { "id" : "6616212526171487929", "status" : "COMPLETED_REPORTED_SUCCESS" }, { "id" : "6372459071225324193", "status" : "COMPLETED_REPORTED_SUCCESS" }, { "id" : "8994399922983903249", "status" : "COMPLETED_REPORTED_SUCCESS" }, { "id" : "7606038502850703410", "status" : "COMPLETED_REPORTED_FAILURE" }, { "id" : "8891595156043709025", "status" : "COMPLETED_REPORTED_SUCCESS" }, { "id" : "8620816507393251099", "status" : "COMPLETED_REPORTED_SUCCESS" }, { "id" : "566995593721941007", "status" : "COMPLETED_REPORTED_FAILURE" }, { "id" : "6582837314653381629", "status" : "COMPLETED_REPORTED_SUCCESS" }, { "id" : "6360062324217911350", "status" : "COMPLETED_REPORTED_SUCCESS" }, { "id" : "7715476058810588168", "status" : "COMPLETED_REPORTED_SUCCESS" }, { "id" : "3482840431922145658", "status" : "COMPLETED_REPORTED_FAILURE" }, { "id" : "9113809683020082982", "status" : "IN_QUEUE" }, { "id" : "539630756425555707", "status" : "COMPLETED_REPORTED_SUCCESS" }, { "id" : "5734729574968124955", "status" : "COMPLETED_REPORTED_SUCCESS" }, { "id" : "125936729678375237", "status" : "COMPLETED_REPORTED_FAILURE" }, { "id" : "1708963909976380164", "status" : "COMPLETED_REPORTED_SUCCESS" }, { "id" : "8093936845399387439", "status" : "COMPLETED_REPORTED_FAILURE" }, { "id" : "5001458330784911287", "status" : "COMPLETED_REPORTED_SUCCESS" }, { "id" : "5249883356991082715", "status" : "COMPLETED_REPORTED_SUCCESS" }, { "id" : "1188180176706354177", "status" : "COMPLETED_REPORTED_SUCCESS" }, { "id" : "6831577933562251033", "status" : "COMPLETED_REPORTED_SUCCESS" }, { "id" : "501154555273838537", "status" : "COMPLETED_REPORTED_SUCCESS" }, { "id" : "8054300763846559958", "status" : "COMPLETED_REPORTED_SUCCESS" }, { "id" : "4692874175755235946", "status" : "COMPLETED_REPORTED_SUCCESS" }, { "id" : "7683356194752367062", "status" : "COMPLETED_REPORTED_SUCCESS" }, { "id" : "5120739155941491160", "status" : "COMPLETED_REPORTED_SUCCESS" }, { "id" : "2404713746178854236", "status" : "COMPLETED_REPORTED_FAILURE" }, { "id" : "4249861121749232247", "status" : "COMPLETED_REPORTED_SUCCESS" }, { "id" : "6126017932666044909", "status" : "COMPLETED_REPORTED_FAILURE" }, { "id" : "1162850291240385414", "status" : "COMPLETED_REPORTED_SUCCESS" }, { "id" : "7594027679678076171", "status" : "IN_QUEUE" }, { "id" : "7506487929518952099", "status" : "COMPLETED_REPORTED_SUCCESS" 
}, { "id" : "5623485279296573103", "status" : "COMPLETED_REPORTED_SUCCESS" }, { "id" : "4601749753903971781", "status" : "COMPLETED_REPORTED_SUCCESS" }, { "id" : "2150568387230826830", "status" : "COMPLETED_REPORTED_SUCCESS" }, { "id" : "2007162770488699431", "status" : "COMPLETED_REPORTED_SUCCESS" }, { "id" : "133783763790707432", "status" : "COMPLETED_REPORTED_SUCCESS" }, { "id" : "8345904766810188516", "status" : "COMPLETED_REPORTED_SUCCESS" }, { "id" : "7233075587724715219", "status" : "COMPLETED_REPORTED_SUCCESS" }, { "id" : "4347474103605451801", "status" : "COMPLETED_REPORTED_SUCCESS" }, { "id" : "4245864417406157626", "status" : "COMPLETED_REPORTED_SUCCESS" }, { "id" : "369185362761722083", "status" : "COMPLETED_REPORTED_SUCCESS" }, { "id" : "7407729203944140936", "status" : "COMPLETED_REPORTED_SUCCESS" }, { "id" : "1072493279872233338", "status" : "IN_PROGRESS" }, { "id" : "1277640745589907481", "status" : "COMPLETED_REPORTED_SUCCESS" }, { "id" : "2833365979978957222", "status" : "COMPLETED_REPORTED_SUCCESS" }, { "id" : "2599323974866358946", "status" : "COMPLETED_REPORTED_SUCCESS" }, { "id" : "7087269418540005150", "status" : "COMPLETED_REPORTED_FAILURE" }, { "id" : "971497067916679693", "status" : "COMPLETED_REPORTED_SUCCESS" }, { "id" : "5348133927615240015", "status" : "COMPLETED_REPORTED_SUCCESS" }, { "id" : "2937088002668922341", "status" : "COMPLETED_REPORTED_SUCCESS" }, { "id" : "3603322945866614294", "status" : "COMPLETED_REPORTED_SUCCESS" }, { "id" : "8537607674672803305", "status" : "COMPLETED_REPORTED_SUCCESS" }, { "id" : "6680309949443148322", "status" : "COMPLETED_REPORTED_SUCCESS" }, { "id" : "5413236312102214791", "status" : "COMPLETED_REPORTED_SUCCESS" }, { "id" : "2701523302340727438", "status" : "COMPLETED_REPORTED_SUCCESS" }, { "id" : "8144705305160379963", "status" : "COMPLETED_REPORTED_SUCCESS" }, { "id" : "9128509926643017440", "status" : "COMPLETED_REPORTED_SUCCESS" }, { "id" : "8595011812659732714", "status" : "COMPLETED_REPORTED_FAILURE" }, { "id" : "7902113704691717050", "status" : "COMPLETED_REPORTED_SUCCESS" }, { "id" : "2752801314046829193", "status" : "COMPLETED_REPORTED_SUCCESS" }, { "id" : "5583898548637658512", "status" : "COMPLETED_REPORTED_SUCCESS" }, { "id" : "7834497166655569929", "status" : "COMPLETED_REPORTED_SUCCESS" }, { "id" : "2048961241349294350", "status" : "COMPLETED_REPORTED_SUCCESS" }, { "id" : "4047633280802463110", "status" : "COMPLETED_REPORTED_SUCCESS" }, { "id" : "5897465428697712000", "status" : "COMPLETED_REPORTED_SUCCESS" }, { "id" : "9170427910763687569", "status" : "COMPLETED_REPORTED_SUCCESS" }, { "id" : "8181585165984911505", "status" : "COMPLETED_REPORTED_SUCCESS" }, { "id" : "8332691818504517041", "status" : "COMPLETED_REPORTED_SUCCESS" }, { "id" : "8541019934800800588", "status" : "COMPLETED_REPORTED_SUCCESS" }, { "id" : "407524162350750712", "status" : "COMPLETED_REPORTED_SUCCESS" }, { "id" : "371647027544991592", "status" : "COMPLETED_REPORTED_SUCCESS" }, { "id" : "3319922032786977495", "status" : "COMPLETED_REPORTED_SUCCESS" }, { "id" : "8986107558493264187", "status" : "COMPLETED_REPORTED_SUCCESS" }, { "id" : "6941662452291915048", "status" : "COMPLETED_REPORTED_SUCCESS" }, { "id" : "6765905770319726516", "status" : "COMPLETED_REPORTED_SUCCESS" }, { "id" : "5470012251968416494", "status" : "COMPLETED_REPORTED_SUCCESS" }, { "id" : "8199002599636566704", "status" : "COMPLETED_REPORTED_SUCCESS" }, { "id" : "203973585446000333", "status" : "COMPLETED_REPORTED_SUCCESS" }, { "id" : "5176665968393073850", "status" : 
"COMPLETED_REPORTED_SUCCESS" }, { "id" : "3363931162132109901", "status" : "COMPLETED_REPORTED_SUCCESS" }, { "id" : "579326182827542318", "status" : "COMPLETED_REPORTED_SUCCESS" }, { "id" : "3005457530172849728", "status" : "COMPLETED_REPORTED_SUCCESS" }, { "id" : "3480513703745294522", "status" : "COMPLETED_REPORTED_FAILURE" }, { "id" : "4976487340628329578", "status" : "COMPLETED_REPORTED_SUCCESS" }, { "id" : "7887094504988163001", "status" : "COMPLETED_REPORTED_SUCCESS" }, { "id" : "10177545831921799", "status" : "IN_QUEUE" }, { "id" : "2682662113140236044", "status" : "COMPLETED_REPORTED_SUCCESS" }, { "id" : "3659772913147106688", "status" : "COMPLETED_REPORTED_SUCCESS" }, { "id" : "6052029691558505049", "status" : "COMPLETED_REPORTED_SUCCESS" }, { "id" : "7543754033757492301", "status" : "COMPLETED_REPORTED_SUCCESS" }, { "id" : "5703302035901126539", "status" : "COMPLETED_REPORTED_SUCCESS" }, { "id" : "1489961544420643884", "status" : "COMPLETED_REPORTED_SUCCESS" }, { "id" : "7017298289658346802", "status" : "COMPLETED_REPORTED_SUCCESS" }, { "id" : "7714367848701063525", "status" : "COMPLETED_REPORTED_SUCCESS" }, { "id" : "6583884193896814036", "status" : "COMPLETED_REPORTED_SUCCESS" }, { "id" : "9117156617127444628", "status" : "COMPLETED_REPORTED_SUCCESS" }, { "id" : "8363847171497350644", "status" : "COMPLETED_REPORTED_SUCCESS" }, { "id" : "480747272759330166", "status" : "COMPLETED_REPORTED_SUCCESS" }, { "id" : "5098570948818459545", "status" : "COMPLETED_REPORTED_SUCCESS" }, { "id" : "546239185868720591", "status" : "COMPLETED_REPORTED_SUCCESS" }, { "id" : "2675136334786764219", "status" : "COMPLETED_REPORTED_SUCCESS" }, { "id" : "5543385199008018188", "status" : "COMPLETED_REPORTED_SUCCESS" }, { "id" : "5869760770666739425", "status" : "COMPLETED_REPORTED_SUCCESS" }, { "id" : "995569802263229805", "status" : "COMPLETED_REPORTED_SUCCESS" }, { "id" : "1079415132640242944", "status" : "COMPLETED_REPORTED_SUCCESS" }, { "id" : "6207328580989779833", "status" : "COMPLETED_REPORTED_SUCCESS" }, { "id" : "6139923615277855958", "status" : "COMPLETED_REPORTED_SUCCESS" }, { "id" : "2381594363416989136", "status" : "COMPLETED_REPORTED_SUCCESS" }, { "id" : "8451662317471543359", "status" : "COMPLETED_REPORTED_SUCCESS" }, { "id" : "1205624016947690936", "status" : "COMPLETED_REPORTED_SUCCESS" }, { "id" : "3521264261367216913", "status" : "COMPLETED_REPORTED_SUCCESS" }, { "id" : "687695678852492717", "status" : "COMPLETED_REPORTED_SUCCESS" }, { "id" : "7025612545058919439", "status" : "COMPLETED_REPORTED_SUCCESS" }, { "id" : "8913278052128898476", "status" : "COMPLETED_REPORTED_SUCCESS" }, { "id" : "3845712596184233330", "status" : "COMPLETED_REPORTED_SUCCESS" }, { "id" : "1295298173100783868", "status" : "COMPLETED_REPORTED_SUCCESS" }, { "id" : "1912792939874183884", "status" : "COMPLETED_REPORTED_SUCCESS" }, { "id" : "6454473447660733711", "status" : "COMPLETED_REPORTED_FAILURE" }, { "id" : "4959299377855714889", "status" : "COMPLETED_REPORTED_SUCCESS" }, { "id" : "3434470276590831001", "status" : "COMPLETED_REPORTED_SUCCESS" }, { "id" : "6057330566487747091", "status" : "COMPLETED_REPORTED_SUCCESS" }, { "id" : "5753697633622597623", "status" : "COMPLETED_REPORTED_SUCCESS" }, { "id" : "6350629774657201466", "status" : "COMPLETED_REPORTED_SUCCESS" }, { "id" : "6830872363367457802", "status" : "COMPLETED_REPORTED_SUCCESS" }, { "id" : "1172978832238829257", "status" : "COMPLETED_REPORTED_SUCCESS" }, { "id" : "2442598689889947998", "status" : "COMPLETED_REPORTED_SUCCESS" }, { "id" : 
"5652852400788428813", "status" : "COMPLETED_REPORTED_SUCCESS" }, { "id" : "2349903805035029803", "status" : "COMPLETED_REPORTED_SUCCESS" }, { "id" : "1673796418108029907", "status" : "COMPLETED_REPORTED_FAILURE" }, { "id" : "2728641401559169302", "status" : "COMPLETED_REPORTED_SUCCESS" }, { "id" : "4104641916706793409", "status" : "COMPLETED_REPORTED_SUCCESS" }, { "id" : "7881760847362550861", "status" : "COMPLETED_REPORTED_SUCCESS" }, { "id" : "100063541395999659", "status" : "COMPLETED_REPORTED_SUCCESS" }, { "id" : "116715504221666189", "status" : "COMPLETED_REPORTED_SUCCESS" }, { "id" : "878497267216979114", "status" : "COMPLETED_REPORTED_SUCCESS" }, { "id" : "2454361266762181673", "status" : "COMPLETED_REPORTED_FAILURE" }, { "id" : "1894437843141145228", "status" : "COMPLETED_REPORTED_SUCCESS" }, { "id" : "5373066061673565721", "status" : "COMPLETED_REPORTED_SUCCESS" }, { "id" : "7617484088420734387", "status" : "COMPLETED_REPORTED_SUCCESS" }, { "id" : "4333809301958098617", "status" : "COMPLETED_REPORTED_FAILURE" }, { "id" : "8582274664519762985", "status" : "COMPLETED_REPORTED_SUCCESS" }, { "id" : "4224456466840460024", "status" : "COMPLETED_REPORTED_SUCCESS" }, { "id" : "7546517268142288580", "status" : "COMPLETED_REPORTED_SUCCESS" }, { "id" : "5369817181349747986", "status" : "COMPLETED_REPORTED_SUCCESS" }, { "id" : "8500230470911557659", "status" : "COMPLETED_REPORTED_SUCCESS" }, { "id" : "1830117596207081376", "status" : "COMPLETED_REPORTED_FAILURE" }, { "id" : "3811893329278209371", "status" : "COMPLETED_REPORTED_SUCCESS" }, { "id" : "7548783386305207200", "status" : "COMPLETED_REPORTED_FAILURE" }, { "id" : "2712890071095256380", "status" : "COMPLETED_REPORTED_SUCCESS" }, { "id" : "6239917432410271099", "status" : "COMPLETED_REPORTED_SUCCESS" }, { "id" : "2392877539847623238", "status" : "COMPLETED_REPORTED_FAILURE" }, { "id" : "8892760540412428925", "status" : "COMPLETED_REPORTED_FAILURE" }, { "id" : "5456753703126888031", "status" : "COMPLETED_REPORTED_SUCCESS" }, { "id" : "4808358158728453838", "status" : "COMPLETED_REPORTED_SUCCESS" }, { "id" : "304725846482406118", "status" : "COMPLETED_REPORTED_SUCCESS" }, { "id" : "3626905468254318973", "status" : "COMPLETED_REPORTED_SUCCESS" }, { "id" : "587929405639111175", "status" : "COMPLETED_REPORTED_SUCCESS" }, { "id" : "4619470220687264779", "status" : "COMPLETED_REPORTED_SUCCESS" }, { "id" : "5105793454830305109", "status" : "COMPLETED_REPORTED_SUCCESS" }, { "id" : "1461790979928002298", "status" : "COMPLETED_REPORTED_SUCCESS" }, { "id" : "6846933414284238393", "status" : "COMPLETED_REPORTED_SUCCESS" }, { "id" : "3501348336100607542", "status" : "COMPLETED_REPORTED_SUCCESS" }, { "id" : "2911249974581540316", "status" : "COMPLETED_REPORTED_SUCCESS" }, { "id" : "839531927308071905", "status" : "COMPLETED_REPORTED_SUCCESS" }, { "id" : "1840752484807969664", "status" : "COMPLETED_REPORTED_SUCCESS" }, { "id" : "6413331405813715058", "status" : "COMPLETED_REPORTED_SUCCESS" }, { "id" : "1409301078279924523", "status" : "COMPLETED_REPORTED_SUCCESS" }, { "id" : "4979020344101122582", "status" : "COMPLETED_REPORTED_SUCCESS" }, { "id" : "2913949912263296041", "status" : "COMPLETED_REPORTED_SUCCESS" }, { "id" : "6481834518982689907", "status" : "COMPLETED_REPORTED_SUCCESS" }, { "id" : "6102844754267508369", "status" : "COMPLETED_REPORTED_SUCCESS" }, { "id" : "314417963879600664", "status" : "COMPLETED_REPORTED_SUCCESS" }, { "id" : "6409035147791281852", "status" : "COMPLETED_REPORTED_SUCCESS" }, { "id" : "4419244334110583918", "status" : 
"COMPLETED_REPORTED_SUCCESS" }, { "id" : "1487070550700558381", "status" : "COMPLETED_REPORTED_SUCCESS" }, { "id" : "7056206259016616848", "status" : "COMPLETED_REPORTED_SUCCESS" }, { "id" : "1646209603388393776", "status" : "COMPLETED_REPORTED_SUCCESS" }, { "id" : "2317275266684727893", "status" : "COMPLETED_REPORTED_SUCCESS" }, { "id" : "8728903070892943880", "status" : "COMPLETED_REPORTED_SUCCESS" }, { "id" : "7985472256825305441", "status" : "IN_QUEUE" }, { "id" : "7369733797798597178", "status" : "COMPLETED_REPORTED_FAILURE" }, { "id" : "6719973594292175663", "status" : "COMPLETED_REPORTED_SUCCESS" }, { "id" : "8704828646780985697", "status" : "COMPLETED_REPORTED_FAILURE" } ] }

On the right track? There is probably a way to do the @for@ without using @for@, but I'm still living in the old, Java 1.8 days. :)
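
(For reference, a loop-free alternative is possible with Jackson's tree model rather than the streaming generator — a minimal sketch, assuming @jackson-databind@ is on the classpath, with imports omitted as elsewhere in this ticket:)

    ObjectMapper mapper = new ObjectMapper();
    ObjectNode root = mapper.createObjectNode();
    ArrayNode jobs = root.putArray( "jobs" );

    // Appends one object per job id; no explicit for loop required.
    jobIds.forEach( jobId -> jobs.addObject()
                                 .put( "id", jobId )
                                 .put( "status", JOB_RESULTS.getJobState( jobId ).toString() ) );

    String jsonString = mapper.writerWithDefaultPrettyPrinter()
                              .writeValueAsString( root );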

Hank

epag commented 3 weeks ago

Original Redmine Comment Author Name: James (James) Original Date: 2023-02-03T18:47:28Z


Looks good, just make sure you do any closing in a @finally@ block or use a @try-with-resources@, preferably, assuming the resource is @Closeable@ - also use the code formatting tools before pushing :-)

More importantly, why are there smoke test jobs in there with status @COMPLETED_REPORTED_FAILURE@...?

epag commented 3 weeks ago

Original Redmine Comment Author Name: Hank (Hank) Original Date: 2023-02-03T19:02:46Z


This was a complete listing of all jobs the persister knows about, so they go back many days. I've posted some failing evaluations in recent days while testing some features.

Thanks for the reminders. I'll make sure to clean up the code before pushing.

I attempted to @adminToken@ protect the endpoint, but, in so doing, broke it when I tried to have it return a @Response@ instead of just a @String@. I wanted to use a @Response@ in order to use the existing @adminToken@ code, which returns unauthorized responses. However, when I made the change, I saw what is provided below when visiting the listing endpoint, and I couldn't get the @adminToken@ to work in various attempts. I was hoping to just see an unauthorized error. Obviously, I have more to learn about formulating proper responses.

I'm now attempting a basic @String@ return with exceptions thrown if the @adminToken@ is wrong or not provided.

Hank

==============================================================================

HTTP ERROR 500 jakarta.servlet.ServletException: org.glassfish.jersey.servlet.ServletContainer-f1868c9==org.glassfish.jersey.servlet.ServletContainer@dce24ab7{jsp=null,order=-1,inst=true,async=true,src=EMBEDDED:null,STARTED}
URI:    /job/listing
STATUS: 500
MESSAGE:    jakarta.servlet.ServletException: org.glassfish.jersey.servlet.ServletContainer-f1868c9==org.glassfish.jersey.servlet.ServletContainer@dce24ab7{jsp=null,order=-1,inst=true,async=true,src=EMBEDDED:null,STARTED}
SERVLET:    org.glassfish.jersey.servlet.ServletContainer-f1868c9
CAUSED BY:  jakarta.servlet.ServletException: org.glassfish.jersey.servlet.ServletContainer-f1868c9==org.glassfish.jersey.servlet.ServletContainer@dce24ab7{jsp=null,order=-1,inst=true,async=true,src=EMBEDDED:null,STARTED}
CAUSED BY:  org.glassfish.jersey.server.model.ModelValidationException: Validation of the application resource model has failed during application initialization. [[HINT] Cannot create new registration for component type class org.glassfish.jersey.media.multipart.MultiPartFeature: Existing previous registration found for the type.; source='null', [FATAL] A HTTP GET method, public jakarta.ws.rs.core.Response wres.tasker.WresJob.getJobListing(java.lang.String) throws java.io.IOException, should not consume any form parameter.; source='ResourceMethod{httpMethod=GET, consumedTypes=[], producedTypes=[text/html;charset=utf-8], suspended=false, suspendTimeout=0, suspendTimeoutUnit=MILLISECONDS, invocable=Invocable{handler=ClassBasedMethodHandler{handlerClass=class wres.tasker.WresJob, handlerConstructors=[org.glassfish.jersey.server.model.HandlerConstructor@5d3d784f]}, definitionMethod=public jakarta.ws.rs.core.Response wres.tasker.WresJob.getJobListing(java.lang.String) throws java.io.IOException, parameters=[Parameter [type=class java.lang.String, source=adminToken, defaultValue=]], responseType=class jakarta.ws.rs.core.Response}, nameBindings=[]}']
Caused by:
jakarta.servlet.ServletException: org.glassfish.jersey.servlet.ServletContainer-f1868c9==org.glassfish.jersey.servlet.ServletContainer@dce24ab7{jsp=null,order=-1,inst=true,async=true,src=EMBEDDED:null,STARTED}
    at org.eclipse.jetty.servlet.ServletHolder.initServlet(ServletHolder.java:651)
    at org.eclipse.jetty.servlet.ServletHolder.getServlet(ServletHolder.java:486)
    at org.eclipse.jetty.servlet.ServletHolder.prepare(ServletHolder.java:731)
    at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:524)
    at org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:221)
    at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1380)
    at org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:176)
... SNIP ...
epag commented 3 weeks ago

Original Redmine Comment Author Name: Hank (Hank) Original Date: 2023-02-03T19:07:53Z


Oh, I see. A GET cannot accept form parameters. It needs to be a POST to do that.

Do I really need to make it a POST in order for it to accept an adminToken as a parameter? Perhaps I'm generally doing something wrong?

Here is the current method signature:

    @GET
    @Path( "/listing" )
    @Produces( "text/html; charset=utf-8" )
    public String getJobListing(@FormParam( "adminToken" ) @DefaultValue( "" ) String adminToken) throws IOException
    {

If I make that a @POST@, I can get it to return a @Response@, but that seems like overkill. Anyway, I'll give that a shot and see what happens,

Hank

epag commented 3 weeks ago

Original Redmine Comment Author Name: Hank (Hank) Original Date: 2023-02-03T19:14:08Z


I think I'm seeing that GET requests can use parameters, but maybe not form parameters. Honestly, I'm not sure what the difference is. I need to educate myself on this stuff.

I'm now attempting to implement it as a POST.
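
(For reference, the POST variant presumably just swaps the annotation while keeping the @FormParam@ — a minimal sketch, method body elided:)

    @POST
    @Path( "/listing" )
    @Produces( "text/html; charset=utf-8" )
    public Response getJobListing( @FormParam( "adminToken" ) @DefaultValue( "" ) String adminToken )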

Hank

epag commented 3 weeks ago

Original Redmine Comment Author Name: Hank (Hank) Original Date: 2023-02-03T19:17:34Z


When I implement it as a POST, I can run this command to get the response I expect:

@curl -v -i -s --cacert dod_root_ca_3_expires_2029-12.pem --data "adminToken=[removed]" https://nwcal-wres-ti.[domain]/job/listing@

However, when I browse to the endpoint, I get "Not Found". Is a browser not capable of POSTing a request?

Anyway, if it works with @curl@, it's doing what I want it to do. Let me research GETs a bit and see if I can change it back, because this doesn't really feel like a POSTing situation.

Hank

epag commented 3 weeks ago

Original Redmine Comment Author Name: Hank (Hank) Original Date: 2023-02-03T19:29:17Z


Got it!

I implemented it as a GET, but changed the parameter to a @QueryParam@. The method returns a @Response@, and I see what I expect:

.../job/listing -- Returns a bad request, because I did not provide the admin token.
.../job/listing?adminToken=rightValue -- Returns the listing of ids and status. Yay!
.../job/listing?adminToken=wrongValue -- Returns an unauthorized.

Looking good.

For some reason, the browser does not recognize the returned JSON as being JSON, so it doesn't reformat it the way the WRDS service does. It's kind of ugly. I may try to figure that out on Monday. I'll also clean up the code.

Thanks,

Hank

epag commented 3 weeks ago

Original Redmine Comment Author Name: Hank (Hank) Original Date: 2023-02-03T19:36:36Z


James:

Is this what you mean by a try-with-resources?

        JsonFactory factory = new JsonFactory();
        try (
              StringWriter jsonObjectWriter = new StringWriter();
              JsonGenerator generator = factory.createGenerator( jsonObjectWriter ); )
        {
            generator.useDefaultPrettyPrinter(); // pretty print JSON

            generator.writeStartObject();
            generator.writeFieldName( "jobs" );
            generator.writeStartArray();

            for ( String jobId : jobIds )
            {
                generator.writeStartObject();
                generator.writeFieldName( "id" );
                generator.writeString( jobId );
                generator.writeFieldName( "status" );
                generator.writeString( JOB_RESULTS.getJobState( jobId ).toString() );
                generator.writeEndObject();
            }

            generator.writeEndArray();
            generator.writeEndObject();
            generator.close(); // to close the generator

            String jsonString = jsonObjectWriter.toString();
            return Response.ok( jsonString ).build();
        }

I confirmed both objects in the try are closeable.

Hank

epag commented 3 weeks ago

Original Redmine Comment Author Name: James (James) Original Date: 2023-02-03T19:47:37Z


Yes, but you don't need to @generator.close()@ when doing that, it has the same effect as a @finally@ block.

epag commented 3 weeks ago

Original Redmine Comment Author Name: Hank (Hank) Original Date: 2023-02-03T20:12:16Z


Got it. Change made.

To get the browser to recognize it as JSON, I think I need this content type:

@Content-Type: application/json@

I'm doing a blind test of that to see what happens,

Hank

epag commented 3 weeks ago

Original Redmine Comment Author Name: Hank (Hank) Original Date: 2023-02-03T20:18:00Z


Sweet. All I needed to do was modify the annotations on the method:

    @GET
    @Path( "/listing" )
    @Produces( "application/json; charset=utf-8" )
    public Response getJobListing( @QueryParam( "adminToken" ) @DefaultValue( "" ) String adminToken )

Now, when I visit this,

https://nwcal-wresp-ha-ti.[host]/job/listing?adminToken=[blah]

I see this in the browser:

{
  "jobs" : [ {
    "id" : "7699870392561872735",
    "status" : "COMPLETED_REPORTED_SUCCESS"
  }, {
    "id" : "6044199484071214264",
    "status" : "COMPLETED_REPORTED_SUCCESS"
  }, {
    "id" : "8205038679122035219",
    "status" : "COMPLETED_REPORTED_SUCCESS"
  }, 
...

It's not as clean as some other views I've seen, but it's good enough.

Is there any other information that could be useful for an admin to see and that I can pull from the current @JobResults@ metadata map? Let me see,

Hank

epag commented 3 weeks ago

Original Redmine Comment Author Name: Hank (Hank) Original Date: 2023-02-03T20:39:53Z


Here is everything in the @JobMetadata@ class:

    @RId
    private String id;

    private Integer exitCode;

    @RCascade( RCascadeType.ALL )
    private SortedSet<URI> outputs;

    @RCascade( RCascadeType.ALL )
    private ConcurrentMap<Integer,String> stdout;

    @RCascade( RCascadeType.ALL )
    private ConcurrentMap<Integer,String> stderr;

    /** Optional: only set when posting job input via tasker */
    private byte[] jobMessage;

    /** Inputs to be added to the above declaration when posting job input */
    @RCascade( RCascadeType.ALL )
    private List<URI> leftInputs;

    /** Inputs to be added to the above declaration when posting job input */
    @RCascade( RCascadeType.ALL )
    private List<URI> rightInputs;

    /** Inputs to be added to the above declaration when posting job input */
    @RCascade( RCascadeType.ALL )
    private List<URI> baselineInputs;

    //Must ensure this is not null.  Just set it to CREATED on construction.
    private JobState jobState = JobState.CREATED;

    private String databaseName;

    private String databaseHost;

    private String databasePort;

Everything above can be discerned from the log output of the job. There is no particular need to display it here. As for the @name@ field mentioned by Chris in the Description, I do not have easy access to it without parsing the declaration. I'd rather not code that in, and Chris said it was optional anyway.

So what exactly would this endpoint provide if we deploy the change? I mean, this was a good exercise for me to gain more experience with web services, and it only swallowed about 3 hours, so that alone perhaps makes it worth the time. But is anything else gained by implementing this?

Given the original need to track what is happening to a job and display it for the user, I think what the current endpoint provides (example in my previous comment) is enough. The WRES GUI could, in theory, find a job given its job id, which the GUI tracks, and then immediately tell the user its status. Then again, it could also do that before just by visiting the .../job/[job id]/status endpoint. Not sure that provides much.

What this will allow someone to do is check how many jobs are in-progress and how many are in-queue. Perhaps that could be displayed in the WRES GUI to give a user a feel for how busy the service is. Just note that that information is also available in the broker monitor, so this wouldn't help a sys admin. Also note that we will not be giving users access to this endpoint, because it would expose evaluation job ids to users who did not post those evaluations. We need to guard against that. Thus, if a user wants this information, it would need to be displayed in the WRES GUI somewhere.

This would also allow an admin user to quickly identify the jobs that are "remembered" by the COWRES in the persister and the status of those jobs as stored therein. Essentially, it provides easier access to information that otherwise requires some @redis-cli@ commands in the persister container. That seems like a benefit.

So, to sum it up... Do we want to deploy this to production? I'll leave it on the -ti01 for now until we answer that question. Thanks,

Hank

epag commented 3 weeks ago

Original Redmine Comment Author Name: Hank (Hank) Original Date: 2023-02-03T20:42:53Z


It's taken me about 3 hours to get to this point. So far, I'm well under the estimated time. Cool.

Also, I just realized that I treated this user support ticket as a development ticket. My bad. Let me just move it to development since no users were involved with creating this ticket or are observing it.

Hank

epag commented 3 weeks ago

Original Redmine Comment Author Name: James (James) Original Date: 2023-02-03T20:45:02Z


As long as it's intended for us and useful to monitoring the status of the service, I have no objections. Agree with the title "as an integrator". However, it's not a box we should open in the gui or for users in general because it isn't user-facing information - we don't want to expose job handles (remember, an evaluation can be deleted with a handle alone) to anyone other than the caller of a particular job.

epag commented 3 weeks ago

Original Redmine Comment Author Name: James (James) Original Date: 2023-02-03T20:47:57Z


As I noted elsewhere, the best situation for a user is no time spent enqueued (which implies more lanes when that begins to happen) and then a partial completion state for their evaluation job while it's in progress (e.g., % complete). That's the main thing a user wants w/r to job status, they don't want to have to extrapolate from info. about service status overall that is not really intended for them.

epag commented 3 weeks ago

Original Redmine Comment Author Name: Hank (Hank) Original Date: 2023-02-03T20:58:07Z


As a sys admin, the only advantage is the ability to see what is in the persister without using @redis-cli@. Then again, if I just visit the status endpoint, doesn't that tell me what I need to know? Hmmm... I may need to think on this.

For the users, agreed. Displaying an in-progress and in-queue count for users to see will tell them if the service is busy, but won't really tell them how long it will take for them to complete their evaluation if they submit now or how close it is to being done. It also won't tell them where in the queue they are located. The persister keeps its information in an unsorted map, so there is no way to identify the position in a queue without something that reaches into the broker.

As I said, this endpoint is of limited value beyond its ability to do a listing of what's in redis without using @redis-cli@, and even that is not all that valuable.

Anyway, we can decide on Monday. Have a great weekend!

Hank

epag commented 3 weeks ago

Original Redmine Comment Author Name: Hank (Hank) Original Date: 2023-02-03T21:08:53Z


Apparently, changing the output type to "application/json" results in the browser displaying a list that is cut off at the end:

  }, {
    "id" : "1646209603388393776",
    "status" : "COMPLETED_REPORTED_SUCCESS"
  }, {
    "id" : "2317275266684727893",
    "status" : "COMPLETED_REPORTED_SUCCESS"
  }, {
    "id" : "

I see it in both the browser and @curl@. I might need to do some other work to the @Response@ to get it to work properly with JSON output. I'll need to figure that out on Monday.

Hank

epag commented 3 weeks ago

Original Redmine Comment Author Name: James (James) Original Date: 2023-02-03T21:18:24Z


Unless this solves a problem, I would lean towards not doing it. We don't need to add stuff to the codebase that doesn't have a solid MO.

If the goal is to better monitor the persister, it probably isn't very hard to get integration with Prometheus for a pipe of metrics and then a Grafana dashboard to view it - probably just requires some additions to the persister. It's analogous to monitoring broker traffic, we don't add endpoints to the web service for that, we monitor it directly using broker client tools.

epag commented 3 weeks ago

Original Redmine Comment Author Name: James (James) Original Date: 2023-02-03T21:21:43Z


Quick search suggests yes:

https://dev.to/nelsoncode/how-to-monitor-redis-with-prometheus-and-grafana-docker-3lfb https://github.com/oliver006/redis_exporter

There are probably better things, but they are the first things I found.
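
(For reference, a minimal, untested sketch of how the second link's exporter might slot into the existing @docker-compose@ @.yml@, alongside the persister — image, flag, and wiring are assumptions:)

    redis-exporter:
        image: oliver006/redis_exporter:latest
        # Point the exporter at the persister's redis over the internal network;
        # a prometheus instance on the same network would then scrape the
        # exporter's default port, 9121, with no host port mapping needed.
        command: [ "--redis.addr", "redis://persister:6379" ]
        networks:
            wres_net: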

epag commented 3 weeks ago

Original Redmine Comment Author Name: Hank (Hank) Original Date: 2023-02-06T14:49:54Z


I think a redis monitor would be useful, particularly if it also allows for modifying contents in the queue so that we can remove problematic jobs from the history. For example, if a job is left IN_PROGRESS and results in queues being created that are never removed. I haven't seen that in a while with changes I've put in place, but it might be good to have the ability to clean up the queue. Let me do a bit of research.

Here is the code I developed to add the endpoint, since I think it might be informative later. First, I added a method to @JobResults.java@:

    /**
     * Obtain the correlation ids for which jobs are in the map.
     * @return A list of the job ids for which metadata are available.
     */
    Set<String> getAvailableJobIds()
    {
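        // Copy the key set so callers get a snapshot rather than a live view
        // of the redis-backed map.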
        return new HashSet<String>( jobMetadataById.keySet() );
    }

Then, in the file @WresJob.java@, I added this method to handle the endpoint (note that the admin token checking code should probably be turned into a method, since it's also called elsewhere in the class with slight differences):

    @GET
    @Path( "/listing" )
    @Produces( "application/json; charset=utf-8" )
    public Response getJobListing( @QueryParam( "adminToken" ) @DefaultValue( "" ) String adminToken )
            throws IOException
    {

        //An admin token is required for this command if a hash was created.
        //If not, then no admin token was provided at startup and it's open access.
        if ( adminTokenHash != null )
        {
            if ( adminToken == null || adminToken.isEmpty() )
            {
                String message = "A job listing requires the adminToken, which was not given or was blank.";
                LOGGER.warn( message );
                return WresJob.badRequest( message );
            }

            try
            {
                KeySpec spec = new PBEKeySpec( adminToken.toCharArray(), salt, 65536, 128 );
                SecretKeyFactory factory = SecretKeyFactory.getInstance( "[omitted]" );
                byte[] hash = factory.generateSecret( spec ).getEncoded();

                if ( !Arrays.equals( adminTokenHash, hash ) )
                {
                    String message = "The adminToken provided for the job listing did not match that required. "
                                     + "The operation is not authorized.";
                    LOGGER.warn( message );
                    return WresJob.unauthorized( message );
                }
                LOGGER.info( "For the job listing, the admin token matched expectations. Continuing." );
            }
            catch ( Exception e )
            {
                String message = "Error creating hash of adminToken; this worked before "
                                 + "it should work now. Job listing not authorized. "
                                 + " Contact user support.";
                LOGGER.warn( message, e );
                WresJob.unauthorized( message );
            }
        }

        Set<String> jobIds = JOB_RESULTS.getAvailableJobIds();

        JsonFactory factory = new JsonFactory();

        //The response is getting cut off mid-list.  What if jsonString is set in the try,
        //but the Response is built outside of the try?  Could it be encountering an exception?
        //Is it because of the application/json change? TBD.
        try (
              StringWriter jsonObjectWriter = new StringWriter();
              JsonGenerator generator = factory.createGenerator( jsonObjectWriter ); )
        {
            generator.useDefaultPrettyPrinter(); // pretty print JSON

            generator.writeStartObject();
            generator.writeFieldName( "jobs" );
            generator.writeStartArray();

            for ( String jobId : jobIds )
            {
                generator.writeStartObject();
                generator.writeFieldName( "id" );
                generator.writeString( jobId );
                generator.writeFieldName( "status" );
                generator.writeString( JOB_RESULTS.getJobState( jobId ).toString() );
                generator.writeEndObject();
            }

            generator.writeEndArray();
            generator.writeEndObject();

            String jsonString = jsonObjectWriter.toString();
            return Response.ok( jsonString ).build();
        }
    }

Note that the code above is broken, because the JSON response is cut off and is not displayed properly in a browser anyway. To see the full JSON output, you can modify @Produces@ to this:

@Produces( "text/plain; charset=utf-8" )

So, if we ever decide to resurrect this code, perhaps for a different purpose, then I would need to figure out the "application/json" issue.

Thanks,

Hank

EDIT: And be sure to add the methods in an IDE that will automatically figure out the @import@ statements. I did not include those above.

epag commented 3 weeks ago

Original Redmine Comment Author Name: Hank (Hank) Original Date: 2023-02-06T19:00:39Z


I investigated the use of RedisInsight, because it is a product created by the Redis developers to support monitoring of Redis, but I posted my comments in #112054. Too many tickets open at once. Anyway, here is the gist...

I found a @docker@ command that starts RedisInsight on the fly:

docker run -v redisinsight:/db -p 8001:8001 redislabs/redisinsight:latest

You can see that the image is obtained from redislabs. I will probably need to find a "trusted" source in order to use it without risk of it being taken away. When the image is started as above, it cannot connect to the @persister@ redis database.

I then found a means to configure its use in a @docker-compose@ @.yml@ file:

    redisinsight:
        ports:
         - "8001:8001"
        image: redislabs/redisinsight:latest
        volumes:
         - [a directory]/redisinsight:/db
        networks:
            wres_net:

I also added this to the persister @.yml@ configuration, but I'm not sure it's necessary given that I point RedisInsight to @wres_net@:

    persister:
        ports:
         - '6379:6379'

That will need further testing. With that @.yml@, RedisInsight was able to see the persister database and interact with it. It provided all of the capabilities I've tested so far, but I have not yet attempted to remove anything from the queue. I think that is something we might want, but I admit that direct editing of the queue is a bit dangerous. Anyway, I'll test that later.

To summarize what I still need to look into:

  1. Test if queue entries can be removed via RedisInsight.

  2. Can RedisInsight be pulled from a "trusted" source?

  3. What is the best way to configure the persister to let RedisInsight talk to it, but not expose it to harm otherwise? Do I need to expose port 6379 via a @ports@ "6379:6379" entry in the @.yml@? I note James's comment #112054-28.

  4. Identify how to secure all endpoints as needed. Certainly, RedisInsight will need to be secured (port 8001). I'll also need to secure the persister 6379 port if testing in Step 3 indicates it's needed. The @redis.conf@ should allow for that.

  5. Identify optimal configuration of RedisInsight. For example, can I configure it to have the persister redis db added at start up? Currently, I need to manually add it when I visit the RedisInsight URL.

Lastly, there are other products I should investigate, but, in my mind, the bar will be high for those products to be favored over RedisInsight, largely because, as I said, this is a product written by the Redis folks to monitor their own tool. That's gotta be worth quite a few points, I think. I'll try to do my due diligence to read about those products, but I'll only test those that I believe have a chance to overcome that high bar.

Thanks!

Hank

epag commented 3 weeks ago

Original Redmine Comment Author Name: Hank (Hank) Original Date: 2023-02-06T19:04:04Z


Putting this on hold due to a meeting. I may not get back to it until tomorrow.

Moving to 6.12, which is more realistic than 6.11.

Hank

epag commented 3 weeks ago

Original Redmine Comment Author Name: Hank (Hank) Original Date: 2023-02-06T20:17:06Z


Looking at the currently trusted sources according to the draft ITSG container doc:

Iron Bank/DCAR (trusted)
    An image must meet all of the following criteria for usage, all other images will require a software approval process.
    Image Status (one of the following):
        Approved
        Conditionally Approved
        Verified
        Overall Risk Assessment (ORA): 60%+
DOC, NOAA or NWS (trusted)
Red Hat Container Repository (untrusted)
Product vendor proprietary repository (untrusted)
Docker Hub Official Images (untrusted)
Docker Hub Verified Images (untrusted)

I noticed that our @wres-redis@ @Dockerfile@ includes this at the top,

FROM redis:6.2.8-alpine3.17

whereas the core WRES includes,

FROM registry.access.redhat.com/ubi8/ubi:8.7-1037

I think @registry.access.redhat.com@ refers to the "Red Hat Container Repository", which is marked as "untrusted" in the ITSG doc, yet is still listed in the Appendix as being a usable source of images. However, the redis image is coming from redis itself, right? A bare image name with no registry prefix, like @redis:6.2.8-alpine3.17@, is pulled from Docker Hub by default, so I don't think that will be trusted without some sort of approval. (I'm not really sure I'm understanding the @Dockerfile@ @FROM@ correctly.)

Hank

epag commented 3 weeks ago

Original Redmine Comment Author Name: Hank (Hank) Original Date: 2023-02-06T20:17:49Z


The .yml entry I'm testing for RedisInsight refers to @redislabs@, so, again, probably not trusted/approved.

Hank

epag commented 3 weeks ago

Original Redmine Comment Author Name: Hank (Hank) Original Date: 2023-02-06T20:31:13Z


I looked at Iron Bank, which is the DoD source of "hardened" images, and redis is available through version 7.0.8 if I'm reading it correctly. For 6, I see that 6.2.10 is available conditionally approved and non-compliant, while 6.2.10-alpine is unverified-compliant. I'm not sure what the difference is between those two.

However, my focus is RedisInsight, and that is not available through the DoD Iron Bank. We might need to request special approval if we want to use it.

Looking at @wres-redis@, I note that Redis does require a configuration, which is included in the image built during the deployment process, whereas RedisInsight does not appear to require any special configuration, unless I need to do something to add the persister redis database by default. If I don't need to configure it, then do we just use the image, as is, without first pushing it to the registry? I'm assuming we'll need to pin the version (instead of using "latest"), but it's not clear to me when we should push to the registry and when we should just pull it from its source as is.

Hank

epag commented 3 weeks ago

Original Redmine Comment Author Name: Hank (Hank) Original Date: 2023-02-06T20:35:52Z


I just modified the .yml so that 6379 is not exposed. I then cycled the containers.

It appears as though RedisInsight "remembers" databases that were added previously. This is in the @redisinsight@ directory I had to point it to. So that's how it persists the database. Okay. I'll need to ensure that directory is created appropriately in @/mnt/wres_share@.

When I attempted to connect to the database, it failed:

Your Redis server is offline but you can still analyze a backup file. Go to Memory Analysis > Analyze Now

So exposing the port is necessary.

Is it possible to expose the port so that only applications running on localhost can see it? If I can do that, I don't think I'll need to secure the port, right? Again, exposing my ignorance of networking.

Hank

epag commented 3 weeks ago

Original Redmine Comment Author Name: Hank (Hank) Original Date: 2023-02-06T20:47:33Z


From this:

https://stackoverflow.com/questions/22100587/docker-expose-a-port-only-to-host

read this:

Also: Your host can also talk to each container normally over its IP. Use docker inspect $ID to get a json dump (beside other stuff) containing the network IP.

I ran @docker inspect [network id]@ and identified the internal IP address of the persister. When I set up the redis database in RedisInsight using that IP, it was able to connect. Is that IP address always the same for every container running on the docker network? I don't know, but maybe it can be made static through configuration if necessary. I'll need to look into that.
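
(For reference, the per-container address can presumably also be pulled directly with a format string, assuming the container is named @persister@: @docker inspect -f '{{range .NetworkSettings.Networks}}{{.IPAddress}}{{end}}' persister@)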

Regardless, it appears as though I won't need to expose 6379, so I won't need certs for redis. Whew. However, I'll presumably need certs for RedisInsight or some other means of authenticating the user. I'll look into the options RedisInsight provides tomorrow. On hold until then.

Thanks,

Hank

epag commented 3 weeks ago

Original Redmine Comment Author Name: James (James) Original Date: 2023-02-06T20:55:58Z


Localhost means the loopback network interface on the host machine. In terms of securing access, the problem isn't that this port is being exposed to users or the wider world or something like that, but to anyone with access to the host machine. Anyway, I'm not sure why we'd need to expose the redis stack on the host in order for redis insight to see it inside the container, something isn't adding up there, but perhaps you can find some info. online about that. Certainly, I can see why the redis stack may need to be exposed more generally, but not in our case. I can also see why a redis instance running on the host would need to be exposed to a redis insight instance running in a container. But in our case, both are running on @wres_net@ inside the container, so I don't get it.

epag commented 3 weeks ago

Original Redmine Comment Author Name: Hank (Hank) Original Date: 2023-02-06T21:03:00Z


Right. I mentioned in a later comment that the @ports@ entry is not needed. RedisInsight can communicate to the persister redis instance by providing it the IP address for the persister on the WRES docker network. So we're good to go there. The only possible bother is if that IP address changes, and that's just an annoyance.

Anyway, more tomorrow. Have a great evening!

Hank

epag commented 3 weeks ago

Original Redmine Comment Author Name: James (James) Original Date: 2023-02-06T21:12:51Z


OK, good, thanks.

But now I don't get the last comment. You should not need an IP address, you should be able to use 0.0.0.0, i.e., attention all network interfaces. Wherever you're configuring an IP address, I think you should try that instead.

epag commented 3 weeks ago

Original Redmine Comment Author Name: Hank (Hank) Original Date: 2023-02-07T11:40:01Z


0.0.0.0 does not appear to work. Perhaps there is some additional configuration required to allow RedisInsight to connect to a redis instance using 0.0.0.0.

Hank

epag commented 3 weeks ago

Original Redmine Comment Author Name: Hank (Hank) Original Date: 2023-02-07T11:57:49Z


From the documentation on Redis and RedisInsight:

https://docs.redis.com/latest/ri/faqs/

Specifically:

How can I secure my RedisInsight installation? 
We recommend:

Installing HTTPS
Using an allow list to restrict IP access to RedisInsight
Installing RedisInsight within your VPN/internal network.
Additionally, make sure you use a strong admin password for your database.

We need to password secure the Redis database and then install RedisInsight using HTTPS with an allow list to restrict IP access. Something like that.
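
(For the database password, that presumably amounts to a @requirepass [strong password]@ directive in the existing @redis.conf@.)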

I need to step away for a bit. When I return, I'll probably create the ticket to deploy 6.11. RedisInsight will need to be handled after that.

Hank

epag commented 3 weeks ago

Original Redmine Comment Author Name: James (James) Original Date: 2023-02-07T12:17:40Z


Hank wrote:

0.0.0.0 does not appear to work. Perhaps there is some additional configuration required to allow RedisInsight to connect to a redis instance using 0.0.0.0.

Hank

It is hard to debug this without seeing your full configuration. Where is this configuration and what precisely does it look like?

I think this:

by providing it the IP address for the persister on the WRES docker network

Should not mean an IP address, but rather something like this, because the persister container is on @wres_net@:

redis://persister:6379
epag commented 3 weeks ago

Original Redmine Comment Author Name: Hank (Hank) Original Date: 2023-02-07T12:50:08Z


That worked! I set the IP/URL to @persister@ with port 6379 and it found it. Cool. Thanks for that tip!

I'm going to need to decide if this is worth installing as a service next to the persister on the deployment machines. For HTTPS (see my previous comment), I'll need a new set of certs, and then we have to worry about ensuring no one but an admin user, like myself, can visit the persister database (which can be accomplished by password protecting the database, I believe). Preferably, we would lock RedisInsight itself to only admin users, but I'm not sure how to do that.

Is this extra configuration and maintenance worth the benefit provided by RedisInsight over using, say, @redis-cli@ directly? I wonder if the cost-benefit analysis is why Jesse never bothered to set up a Redis monitor before.

One more oddity of note (and being able to readily spot this might be some justification for RedisInsight)...

If I browse the keys for which Redis information is available, I see only 5 keys ending in *JobMetadata, one per smoke test evaluation I've run in the past couple of days. Looking at each key, I can see the metadata stored: stdout, jobState, exitCode, rightInputs (empty list), databaseHost, databaseName, leftInputs (empty list), baselineInputs (empty list), stderr. This is actually where RedisInsight shines: it's really easy to see that information. Though one could debate whether it's important to be able to see that information.

Getting back to the point... I see 433 keys ending in *JobMetadata:stdout. Looking at one of the keys, it refers to ASEN6HUD. I haven't run an HEFS evaluation in development in months, I think. The same is true for stderr and outputs: there are many of them.

What does this mean? While the @JobMetadata@ entries in the redis-backed map are getting purged (i.e., the WRES is "forgetting" about the old jobs), the associated stdout, stderr, and outputs are not getting purged. Or at least they aren't getting purged at the same time. This would seemingly imply that we could make the redis database smaller if we ensured that stdout, stderr, and outputs associated with a job were purged when the associated @JobMetadata@ was purged. However, I'm not sure what would be involved with that. I also don't think this is really impacting things in staging or production, except perhaps leading to an .aof file that is larger than it needs to be. Thus, if we were to try to address this, it would be relatively low priority.

I'm really good at identifying work that probably isn't necessary. :)

Hank

epag commented 3 weeks ago

Original Redmine Comment Author Name: James (James) Original Date: 2023-02-07T13:01:56Z


I don't think you're exposing the persister beyond what is exposed through the monitoring tool, so the only thing that should need to be secured is the monitoring tool. The database is internal to the persister, which is internal to the container composition. We are not exposing the database, right? I think you removed that port mapping...

epag commented 3 weeks ago

Original Redmine Comment Author Name: James (James) Original Date: 2023-02-07T13:06:06Z


And the way you secure the monitoring tool is to have mutual TLS and to provide a client cert only to those admins that should have one (plus any login credentials you want to impose on top of that). It's the same pattern as the brokers.

epag commented 3 weeks ago

Original Redmine Comment Author Name: Hank (Hank) Original Date: 2023-02-07T13:18:44Z


James wrote:

I don't think you're exposing the persister beyond what is exposed through the monitoring tool, so the only thing that should need to be secured is the monitoring tool. The database is internal to the persister, which is internal to the container composition. We are not exposing the database, right? I think you removed that port mapping...

Right on all points. The persister only needs username/password authentication if we are unable to ensure that only admin users use RedisInsight.

And the way you secure the monitoring tool is to have mutual TLS and to provide a client cert only to those admins that should have one (plus any login credentials you want to impose on top of that). It's the same pattern as the brokers.

And this is where I can't find a good example through web searching, but maybe the term "mutual TLS" is something I can add to my search.

From what I can tell, folks almost always install RedisInsight on the machine with Redis and browse to it via http://localhost:8001, so they don't have to worry about others going in and mucking around with their database. Unfortunately, our deployment platforms don't have browsers, so I need to open up RedisInsight to the outside (i.e., my laptop), and that introduces the risk. Since mutual authentication is unnecessary most of the time, it's hard to find examples of how to do it for RedisInsight online.

Or does mutual TLS setup depend on the tool? Can I apply what you did for the eventsbroker to other tools, like this one?

Hank

epag commented 3 weeks ago

Original Redmine Comment Author Name: James (James) Original Date: 2023-02-07T13:25:26Z


Ordinary or one-way TLS is where the client authenticates the server. Mutual or two-way TLS is where the server additionally authenticates the client.

In this context, the server is the monitoring tool server and the client is the incoming connection (e.g., from a browser) to the monitoring tool server. The monitoring tool itself speaks to the persister, and there should be no need to secure that because that traffic is internal to the container network. So what you'd be adding with two-way TLS is the ability of the server (the persister monitoring tool) to authenticate the client (the browser connection or whatever), as well as the connecting client (the browser) authenticating that the monitoring server is who it says it is.
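
To make that concrete, here is a minimal, generic Java sketch of the server side of two-way TLS (not specific to RedisInsight or the broker; the keystore/truststore paths and passwords are placeholders):

```java
// Hypothetical sketch of two-way (mutual) TLS on the server side in plain
// Java. The keystore holds the server's own cert; the truststore holds the
// CA used to verify client certs. Paths and passwords are placeholders.
import java.io.FileInputStream;
import java.security.KeyStore;
import javax.net.ssl.KeyManagerFactory;
import javax.net.ssl.SSLContext;
import javax.net.ssl.SSLServerSocket;
import javax.net.ssl.TrustManagerFactory;

public class MutualTlsServer
{
    public static void main( String[] args ) throws Exception
    {
        char[] password = "changeit".toCharArray();

        // The server's own identity, used for ordinary one-way TLS.
        KeyStore keyStore = KeyStore.getInstance( "PKCS12" );
        keyStore.load( new FileInputStream( "server-keystore.p12" ), password );
        KeyManagerFactory kmf = KeyManagerFactory.getInstance( "SunX509" );
        kmf.init( keyStore, password );

        // The CA that signed the admin client certs, used to verify clients.
        KeyStore trustStore = KeyStore.getInstance( "PKCS12" );
        trustStore.load( new FileInputStream( "ca-truststore.p12" ), password );
        TrustManagerFactory tmf = TrustManagerFactory.getInstance( "SunX509" );
        tmf.init( trustStore );

        SSLContext context = SSLContext.getInstance( "TLS" );
        context.init( kmf.getKeyManagers(), tmf.getTrustManagers(), null );

        SSLServerSocket server = (SSLServerSocket) context
                .getServerSocketFactory()
                .createServerSocket( 8443 );

        // This is the mutual part: the server rejects clients that cannot
        // present a certificate signed by a CA in the truststore.
        server.setNeedClientAuth( true );

        System.out.println( "Listening with client auth required on port "
                            + server.getLocalPort() );
    }
}
```

Any tool that supports mutual TLS is doing the equivalent of @setNeedClientAuth( true )@ somewhere in its configuration; the question is just where that knob lives for a given app.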

epag commented 3 weeks ago

Original Redmine Comment Author Name: James (James) Original Date: 2023-02-07T13:32:09Z


Remember how you needed to import the eventsbroker monitor client cert into your browser, as well as the CA cert, in order to connect to the eventsbroker? It's basically the same idea... the client cert is needed for the eventsbroker monitor tool to authenticate your browser connection as a legitimate eventsbroker admin connection.

epag commented 3 weeks ago

Original Redmine Comment Author Name: Hank (Hank) Original Date: 2023-02-07T14:00:01Z


Yeah, I understand that. What I'm trying to figure out is how you made it so that the broker monitor requires mutual TLS. Is that in the configuration for the broker monitor/events broker? I'm looking at the wres-eventsbroker now to figure out how you set it up,

Hank

epag commented 3 weeks ago

Original Redmine Comment Author Name: Hank (Hank) Original Date: 2023-02-07T14:02:22Z


Looks like it's configured in @bootstrap.xml@ and @login.config@. Let me see if I can find instructions for setting up mutual TLS for access to RedisInsight. So far, I've had no luck,

Hank

epag commented 3 weeks ago

Original Redmine Comment Author Name: James (James) Original Date: 2023-02-07T14:04:22Z


Oh, that will vary with every app.

You'll need to look into how to do this for RedisInsight, but there should be some TLS configuration, which will include things like paths to certs, TLS versions supported, and cipher suites.

epag commented 3 weeks ago

Original Redmine Comment Author Name: Hank (Hank) Original Date: 2023-02-07T14:16:07Z


Redis does allow for setting up TLS, but RedisInsight does not appear to support it. Sigh. Here are all the configuration options:

https://docs.redis.com/latest/ri/installing/configurations/

Anyway, this capability was always going to be hard to justify; limited options for TLS will make it harder.

Hank

epag commented 3 weeks ago

Original Redmine Comment Author Name: James (James) Original Date: 2023-02-07T14:27:24Z


Probably just needs some more reading.

If you need to secure the Redis instance itself, that may have the same effect, since you wouldn't be able to access any information about the Redis instance (I assume) until that connection has been secured, but it might mean that you can see the GUI without any services etc., or something like that. You'd need to experiment. Still, it would be a bit weird to have access granted to a GUI with (presumably) no feeds; I don't know what that would look like.

epag commented 3 weeks ago

Original Redmine Comment Author Name: Hank (Hank) Original Date: 2023-02-07T14:35:13Z


Right. I can definitely secure the database. It would add a bit of configuration overhead, but that is doable.

I just don't want anyone but WRES administrators visiting RedisInsight. In theory, if we host it on the -prod02, for example, other folks could access it and point it to other Redis instances in OWP even if they don't have access to ours. Our hosted RedisInsight would need to be able to "see" those instances, but, in theory, it would be possible.

Hank