AgileVentures / MetPlus_tracker

Git Repository for the Waffle issue in MetPlus project

Can't match Job Seekers using the cruncher - it's not active #728

Open tansaku opened 5 years ago

tansaku commented 5 years ago

Chet said:

Can't match Job Seekers using the cruncher - it's not active

Joao asked:

Are you saying that it is not active? What are you trying to do?

Chet replied:

If I try to match an individual job seeker, nothing happens. When I try to match a job against all job seekers, a blue screen and error message appear.

Joao asked:

I think I am able to reproduce this behavior in staging; we will take a look. Nevertheless, I have a question: do you know for sure that the Job Seeker you are trying to match will actually be matched to any job?

Chet replied:

Yes I know some of the JS will be a match to the job posting. If you want I could share my screen with you while going through the process.

Joao asked:

Can you let us know the name of the Job Seeker and the Job Name that you expect to match?

Chet replied:

Lindsey Webb - Family / Pediatric Practice Nurse Practitioner

tansaku commented 5 years ago

note that I get the blue screen and error when trying to create a new account on staging - could this be related to some connection between the rails and cruncher apps?

tansaku commented 5 years ago

on staging an attempt to match job seekers generates the following error:

tansaku commented 5 years ago

looking at the code on line 291:

  def match_job_seekers
    authorize @job
    Pusher.trigger('pusher_control', # <-- line 291
                   'spinner_start',
                   user_id: pets_user.user.id,
                   target: '.table.table-bordered')

    # Get job match scores for all job Seekers
    result = ResumeCruncher.match_resumes(@job.id)

it looks like this pusher operation is failing before we even get to contacting the cruncher
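
If the Pusher notification really is what fails first, one option would be to treat it as best-effort so the cruncher call still runs. A minimal sketch, assuming that is acceptable for the UI; `notify_safely` is a hypothetical helper, not something that exists in the app:

```ruby
require 'logger'

LOGGER = Logger.new($stderr)

# Runs the given block and reports whether it succeeded. Any StandardError
# is logged and swallowed so that the caller can carry on.
def notify_safely(description)
  yield
  true
rescue StandardError => e
  LOGGER.warn("#{description} failed: #{e.class}: #{e.message}")
  false
end

# Hypothetical use inside match_job_seekers (names taken from the excerpt above):
#   notify_safely('spinner_start') do
#     Pusher.trigger('pusher_control', 'spinner_start',
#                    user_id: pets_user.user.id,
#                    target: '.table.table-bordered')
#   end
#   result = ResumeCruncher.match_resumes(@job.id)
```

This only hides the symptom: the spinner would not show, but the match request would at least reach the cruncher, which helps separate a Pusher problem from a cruncher problem.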

tansaku commented 5 years ago

working with Chet, the issue was occurring when clicking "match all job seekers" on an individual job - it's now getting further along and we are seeing things like this:

tansaku commented 5 years ago

Chet says this should return multiple matches that we can click through on - but this "not available" is weird - it does appear to be working for "match your job seekers against job" when individual job seekers are selected

tansaku commented 5 years ago

seeing this behaviour locally:

similar to that on production - is this a zero match?

tansaku commented 5 years ago

locally we get this back from the cruncher:

{"resultCode"=>"SUCCESS", "message"=>"Success", "stars"=>{"NaiveBayes"=>0.0, "ExpressionCruncher"=>0.0}}

tansaku commented 5 years ago

added some debugging:

joaopapereira commented 5 years ago

@tansaku it looks like in your last test you are matching a specific Job Seeker against a specific Job, and the result is that the Job Seeker is a 0 star match for the job

tansaku commented 5 years ago

thanks @joaopapereira - so as we discussed, at the moment everyone seems to get zero star matches as far as we can tell, and when we looked at it together you were saying that the problem was with the presence or absence of particular job categories.

We looked at the staging and production crunchers and saw that the settings look okay from:

db.settings.find()

we also looked at the jobs

db.job.find()

and saw data like the following:

{ "_id" : ObjectId("5bfd88c246e0fb000b646d37"), "_class" : "org.metplus.curriculum.database.domain.Job", "title" : "Family / Pediatric Practice Nurse Practitioner ", "jobId" : "3", "description" : "to provide primary health care services, including health promotion, disease prevention, and interdisciplinary collaboration.   Seeking licensed, professional expertise in history taking, physical examinations, immunizations, non-invasive diagnostic tests, and pharmacology, within a behavioral health agency.  May include home visitation. ", "titleMetaData" : { "_id" : null, "metaData" : { "NaiveBayes" : { "_id" : null, "bestMatchCategory" : "secreatary", "totalProbability" : 0.000552928599063307, "fields" : { "sales manager" : { "data" : 0.000125838938402012 }, "administrative" : { "data" : 0.0000038133011912577786 }, "line cook" : { "data" : 0.00022651006293017417 }, "administrative assistent" : { "data" : 0.0001715985417831689 }, "software developer" : { "data" : 0.0004474273300729692 }, "secreatary" : { "data" : 0.000552928599063307 } }, "_class" : "org.metplus.curriculum.cruncher.naivebayes.NaiveBayesMetaData" }, "ExpressionCruncher" : { "_id" : null, "mostReferedExpression" : "practice", "fields" : { "practice" : { "data" : 1 }, "practitioner" : { "data" : 1 }, "nurse" : { "data" : 1 }, "family" : { "data" : 1 }, "/" : { "data" : 1 }, "pediatric" : { "data" : 1 } }, "_class" : "org.metplus.curriculum.cruncher.expressionCruncher.ExpressionCruncherMetaData" } } }, "descriptionMetaData" : { "_id" : null, "metaData" : { "NaiveBayes" : { "_id" : null, "bestMatchCategory" : "sales manager", "totalProbability" : 7.810418738521035e-19, "fields" : { "sales manager" : { "data" : 7.810418738521035e-19 }, "administrative" : { "data" : 1.3730280890768393e-30 }, "line cook" : { "data" : 5.847479126716874e-28 }, "administrative assistent" : { "data" : 2.2606622988101377e-22 }, "software developer" : { "data" : 4.0588718682211377e-19 }, "secreatary" : { "data" : 1.7454550937515912e-27 } }, "_class" : "org.metplus.curriculum.cruncher.naivebayes.NaiveBayesMetaData" }, "ExpressionCruncher" : { "_id" : null, "mostReferedExpression" : "health", "fields" : { "taking," : { "data" : 1 }, "visitation" : { "data" : 1 }, "expertise" : { "data" : 1 }, "professional" : { "data" : 1 }, "non-invasive" : { "data" : 1 }, "behavioral" : { "data" : 1 }, "prevention," : { "data" : 1 }, "diagnostic" : { "data" : 1 }, "physical" : { "data" : 1 }, "tests," : { "data" : 1 }, "immunizations," : { "data" : 1 }, "include" : { "data" : 1 }, "services," : { "data" : 1 }, "including" : { "data" : 1 }, "disease" : { "data" : 1 }, "agency" : { "data" : 1 }, "may" : { "data" : 1 }, "in" : { "data" : 1 }, "within" : { "data" : 1 }, "pharmacology," : { "data" : 1 }, "health" : { "data" : 3 }, "licensed," : { "data" : 1 }, "promotion," : { "data" : 1 }, "history" : { "data" : 1 }, "examinations," : { "data" : 1 }, "seeking" : { "data" : 1 }, "interdisciplinary" : { "data" : 1 }, "home" : { "data" : 1 }, "provide" : { "data" : 1 }, "collaboration" : { "data" : 1 }, "to" : { "data" : 1 }, "primary" : { "data" : 1 }, "care" : { "data" : 1 } }, "_class" : "org.metplus.curriculum.cruncher.expressionCruncher.ExpressionCruncherMetaData" } } } }

where the categories of jobs are things like:

"secreatary"
"sales manager"
"administrative"
"line cook"
"software developer" 

and that resumes are put into categories - so if there isn't a good category for a resume then it won't get matched at all.

I was just looking at the resumes in the db. There's one that contains the term "pediatric", but it's categorised as follows:

{ "_id" : ObjectId("5beedae546e0fb000b646d1a"), "_class" : "org.metplus.curriculum.database.domain.Resume", "filename" : "Nursing resume NB.docx", "fileType" : "docx", "userId" : "15", "metaData" : { "NaiveBayes" : { "_id" : null, "bestMatchCategory" : "administrative assistent", "totalProbability" : 0, "fields" : {  }, "_class" : "org.metplus.curriculum.cruncher.naivebayes.NaiveBayesMetaData" }, "ExpressionCruncher" : { "_id" : null, "mostReferedExpression" : "·", "fields" : { "epic," : { "

i.e. an administrative assistent with zero probability so I guess it would never get any more than a zero match ...
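
A quick way to flag documents like this one is to look at the stored NaiveBayes metadata directly. A rough sketch; `naive_bayes_warnings` is a made-up helper, and the two checks are only my reading of the data above, not the cruncher's own logic:

```ruby
# Returns warnings for the NaiveBayes part of a crunched document's metaData
# hash (the structure shown in the job and resume documents above).
def naive_bayes_warnings(meta_data)
  naive_bayes = meta_data['NaiveBayes']
  return ['no NaiveBayes metadata at all'] if naive_bayes.nil?

  warnings = []
  warnings << 'fields is empty' if (naive_bayes['fields'] || {}).empty?
  warnings << 'totalProbability is 0' if naive_bayes['totalProbability'].to_f.zero?
  warnings
end

# Values copied from the "Nursing resume NB.docx" document above.
resume_meta = {
  'NaiveBayes' => {
    'bestMatchCategory' => 'administrative assistent',
    'totalProbability' => 0,
    'fields' => {}
  }
}
puts naive_bayes_warnings(resume_meta).inspect
```

For the nursing resume both warnings fire, whereas the job's title metadata (non-empty fields, non-zero totalProbability) would pass cleanly.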

would it be a good idea to have a back up matching that just matched common words rather than relying on categories?
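
To make the question concrete, here is a rough sketch of what such a fallback could look like: the fraction of a job's significant words that also appear in the resume. This is not how the cruncher works; `STOP_WORDS`, `significant_words` and `word_overlap` are invented for illustration, and the resume text is made up:

```ruby
require 'set'

# A tiny, arbitrary stop-word list; a real one would be much longer.
STOP_WORDS = Set.new(%w[a an and as at but in may of on or the to with within]).freeze

# Lower-cases the text and returns its distinct words minus stop words.
def significant_words(text)
  text.downcase.scan(/[a-z][a-z\-]*/).reject { |word| STOP_WORDS.include?(word) }.to_set
end

# Fraction (0.0 to 1.0) of the job's significant words found in the resume.
def word_overlap(job_text, resume_text)
  job_words = significant_words(job_text)
  return 0.0 if job_words.empty?

  shared = job_words & significant_words(resume_text)
  shared.size.to_f / job_words.size
end

job_title = 'Family / Pediatric Practice Nurse Practitioner'
resume_text = 'Registered nurse with pediatric clinic experience' # hypothetical
puts word_overlap(job_title, resume_text)
```

Here "nurse" and "pediatric" are shared out of five significant title words, so the overlap is 2 out of 5. Whether a plain word overlap like this says anything useful about a match is exactly the open question.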

joaopapereira commented 5 years ago

From what we just discussed on the hangout, I believe one of 3 problems could be happening:

  1. The cruncher brain information is missing
  2. The Resume or the Job was not crunched
  3. The categories do not match

How to check option 1:

How to check option 2:

How to check option 3:

Note: The fact that a resume or job does not match a category 100% does not mean that it has no probability of belonging to one of the categories. Let's imagine the case: a Nurse resume might have a higher probability of being "administrative" than of being "line cook", though the reverse is also possible. Meanwhile, a job description for a Nurse might have a higher probability of being matched with "sales manager". In that case the probability that the Resume matches the Job is much smaller, or even 0, depending on the probabilities.

Solve problem 1:

Solve problem 2:

Solve problem 3:

NOTE: The fact that one word or another exists in both the Resume and the Job information does not mean that they will be even a 1 ⭐️ match. This is all a statistical analysis based on the information that we first provided to the cruncher (the Brain).

tansaku commented 5 years ago

thanks @joaopapereira - that's very helpful

on staging and production we can see all the brain information.

I can't see it locally, and the technique you describe (option 1) is not causing it to be pulled in when we restart the app after deleting the settings. @sherspock and I were examining the code in NaiveBayesCruncher.java:

    private void load() throws CruncherSettingsNotFound {
        LOG.info("Loading settings");
        cruncherImpl.resetMemory();
        try {
            CruncherSettings settings;
            try {
                LOG.info("Get settings, I mean really!!!!");
                settings = repository.findAll().iterator().next().getCruncherSettings(CruncherImpl.CRUNCHER_NAME);
                LOG.info("Got settings");
            } catch(NoSuchElementException e) {
                LOG.warn("Could not find cruncher");
                settings = new CruncherSettings(CruncherImpl.CRUNCHER_NAME);
                Settings globalSettings = repository.findAll().iterator().next();
                settings.addSetting(new Setting<>(LEARN_DATABASE, learnDatabase));
                settings.addSetting(new Setting<>(CLEAN_EXPRESSIONS, cleanExpressions));
                globalSettings.addCruncherSettings(CruncherImpl.CRUNCHER_NAME, settings);
                repository.save(globalSettings);
                LOG.info("saved global settings");

            }
            LOG.info("Database settings: " + settings);
            LOG.info("Local settings learn database: " + learnDatabase);
            LOG.info("Local settings clean expressions: " + cleanExpressions);

and trying to work out how to force a reload of all the resume data, and came up with this (although it's not quite doing what we want as it deletes the entire cruncherSettings element):

> db.settings.update({ _id: ObjectId("5c2f64bbb5615fc80189651f") }, { $unset : { "cruncherSettings" : { "NaiveBayes" : 1} }})
> db.settings.find()
{ "_id" : ObjectId("5c2f64bbb5615fc80189651f"), "_class" : "settings", "CRUNCHER_SETTINGS_NAME" : "CRUNCHER_SETTINGS_NAME", "appSettings" : { "_id" : null, "settings" : { "test" : { "name" : "test", "data" : "haha" } }, "mandatory" : [ "test" ] } }

then when restarting we saw all the resume data being pulled in on the console:

2019-01-04 14:02:31.713  INFO 51716 --- [           main] o.m.c.c.naivebayes.NaiveBayesCruncher    : Get settings, I mean really!!!!
2019-01-04 14:02:31.726  INFO 51716 --- [           main] o.m.c.c.naivebayes.NaiveBayesCruncher    : Got settings
2019-01-04 14:02:31.726  INFO 51716 --- [           main] o.m.c.c.naivebayes.NaiveBayesCruncher    : Database settings: SettingsList: {settings: {LearnDatabase: org.metplus.curriculum.database.domain.Setting@7a4d582c,Name: org.metplus.curriculum.database.domain.Setting@5626d18c,}, mandatory: [Name,]
2019-01-04 14:02:31.728  INFO 51716 --- [           main] o.m.c.c.naivebayes.NaiveBayesCruncher    : Local settings learn database: {software developer=[Sr. Angular UI Developer                       Developer : Experience with streaming aps, experience with trading applications very helpful.Description:The Active Trader Client Applications team is responsible for the Active Trader StreetSmart family of products. As part of our continuous investment in the StreetSmart pla

but we still don't see it in the mongodb when we run db.settings.find, which persists in displaying the following:

[tansaku@Samuels-MBP:~/Documents/Github/AgileVentures/resumeCruncher (master)]$ 
→ mongo
MongoDB shell version: 3.2.9
connecting to: test
> use resumeCruncher
switched to db resumeCruncher
> db.settings.find()
{ "_id" : ObjectId("5c2f64bbb5615fc80189651f"), "_class" : "settings", "CRUNCHER_SETTINGS_NAME" : "CRUNCHER_SETTINGS_NAME", "cruncherSettings" : { "bamm" : { "_id" : null, "NAME_SETTING" : "Name", "settings" : { "Name" : { "name" : "Name", "data" : "New cruncher" } }, "mandatory" : [ "Name" ] }, "NaiveBayes" : { "_id" : null, "NAME_SETTING" : "Name", "settings" : { "LearnDatabase" : { "name" : "LearnDatabase" }, "Name" : { "name" : "Name", "data" : "NaiveBayes" } }, "mandatory" : [ "Name" ] }, "ExpressionCruncher" : { "_id" : null, "NAME_SETTING" : "Name", "settings" : { "CaseSensitive" : { "name" : "CaseSensitive", "data" : false }, "IgnoreList" : { "name" : "IgnoreList", "data" : [ "a", "or", "and", "then", "must", "least", "i", "am", "of", "but", "our", "mine", "very", "worked", "decided", "each", "an", "as", "at", "on" ] }, "IgnoreListWordSearch" : { "name" : "IgnoreListWordSearch", "data" : true }, "Name" : { "name" : "Name", "data" : "ExpressionCruncher" }, "MergeList" : { "name" : "MergeList", "data" : { "cook" : [ "cook", "line cook" ], "software@@@@@development" : [ "software development", "software development lifecycle" ] } } }, "mandatory" : [ "Name" ] } }, "appSettings" : { "_id" : null, "settings" : { "test" : { "name" : "test", "data" : "haha" } }, "mandatory" : [ "test" ] } }

so we're unclear whether our local development system is set up in a way that lets us accurately diagnose the bug.

In production at least we think the resume and job have been crunched in that they appear in the mongodb.

If we want to add additional categories how do we do that?

tansaku commented 5 years ago

just updating: if we want to remove precisely the NaiveBayes element from the settings we can do that with:

db.settings.update({ _id: ObjectId("5c2f64bbb5615fc80189651f") }, { $unset : { "cruncherSettings.NaiveBayes" : 1} })
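
The difference between the two commands comes down to how $unset reads its argument: it removes the field named by each key and ignores the value, so the nested { "NaiveBayes" : 1 } in the first attempt was never looked at, while a dotted key reaches into the sub-document. A toy Ruby model of that behaviour on a plain hash (`unset_path` and `sample_settings` are hypothetical helpers, not MongoDB APIs):

```ruby
# Removes the field addressed by a dotted path from a nested hash, mimicking
# the effect of { $unset: { path => <any value> } } on a single document.
def unset_path(doc, path)
  *parents, leaf = path.split('.')
  target = parents.reduce(doc) { |node, key| node.is_a?(Hash) ? node[key] : nil }
  target.delete(leaf) if target.is_a?(Hash)
  doc
end

# A cut-down stand-in for the settings document.
def sample_settings
  {
    'cruncherSettings' => { 'NaiveBayes' => {}, 'ExpressionCruncher' => {} },
    'appSettings' => {}
  }
end

# First attempt: the key is "cruncherSettings", so the whole element goes.
puts unset_path(sample_settings, 'cruncherSettings').keys.inspect
# Dotted key: only the NaiveBayes sub-document goes.
puts unset_path(sample_settings, 'cruncherSettings.NaiveBayes')['cruncherSettings'].keys.inspect
```

The first call leaves only appSettings, matching what db.settings.find() showed after the first attempt; the second leaves ExpressionCruncher in place.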

however re-starting the cruncher after doing this did not lead to any resume being shown in the main output log, and also nothing in the mongodb, which was back to having NaiveBayes like this:

        "NaiveBayes" : {
            "_id" : null,
            "NAME_SETTING" : "Name",
            "settings" : {
                "LearnDatabase" : {
                    "name" : "LearnDatabase"
                },
                "Name" : {
                    "name" : "Name",
                    "data" : "NaiveBayes"
                }
            },
            "mandatory" : [
                "Name"
            ]
        },

although on restarting again the resumes were shown being loaded on the main console output, but again, nothing in the mongodb itself ...

joaopapereira commented 5 years ago

@tansaku that is strange... There was a bug, fixed maybe 4 weeks ago, where information was not being stored in or read from the mongo database. Do you have the latest commit, 219cc009a?

Nevertheless, the behavior is strange. I prefer to just remove the full settings because it ensures a clean slate, and the rest of the info has no way of being changed for now.

To add new categories we need a batch of Resumes and Jobs that match a specific category, and then add them to https://github.com/AgileVentures/MetPlus_resumeCruncher/blob/development/app/src/main/resources/application.yml#L45 - the files need to be converted into strings, with each new line converted into \n

the yaml file looks like this:

naive-bayes:
  learn-database:
    "new category that we want to add":
      - "This is the first resume \n as \n a string"
      - "This is a job description that we have\n for this new category"
.....
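
A small sketch of that conversion, assuming the text has already been extracted from the resume or job file (and redacted where needed). `learn_database_snippet` is a hypothetical helper; it relies on JSON string escaping also being a valid YAML double-quoted string, which holds for ordinary text:

```ruby
require 'json'
require 'yaml'

# Builds a naive-bayes learn-database snippet for one category.
# String#to_json turns real newlines into \n inside a double-quoted string.
def learn_database_snippet(category, texts)
  lines = ['naive-bayes:', '  learn-database:', "    #{category.to_json}:"]
  texts.each { |text| lines << "      - #{text.to_json}" }
  lines.join("\n") + "\n"
end

snippet = learn_database_snippet(
  'new category that we want to add',
  ["This is the first resume \n as \n a string"]
)
puts snippet
# Sanity check: the snippet parses back as YAML with the category as its key.
puts YAML.safe_load(snippet).dig('naive-bayes', 'learn-database').keys.inspect
```

The output would still need to be merged by hand into the existing learn-database section of application.yml rather than pasted in as a second naive-bayes key.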

tansaku commented 5 years ago

@joaopapereira understood - but does that mean adding people's potentially private resumes to a public git repository?

and I have updated to the latest cruncher and the correct data is now showing up in the mongodb locally ...

What would be great is if the seed data had at least one user with a resume that matched at least one job, so we could see it working locally ... maybe one of the existing jobs does match tom seeker?

joaopapereira commented 5 years ago

@tansaku the ones we have there were picked up from examples on the web and some heavily redacted ones that Chet made available

tansaku commented 5 years ago

right @joaopapereira, but the new pediatric nurse ones in the production system are real ones that Chet is uploading, so if we wanted to add those into application.yml we'd have to get him to redact them, or approve our redacted versions of them, no?

joaopapereira commented 5 years ago

yes @tansaku. Eventually when we are in a world where we are more stable we can feed this information into the database and no longer use the application.yml