AdminAtAptly opened this issue 6 years ago.
@Traizen, after thinking it over a bit: before you begin any work on this one or on future projects, you might get in the habit of forking the repo on GitHub first, then keeping your working branches in your own forked repository. This does a few things.
When your code is ready to push into production, go back to your fork on GitHub and open a pull request back to dev. That sends a request to me or Joe to merge your changes into our production workflow, and we will either accept and merge your changes or suggest any changes needed before we can merge.
I was spot-checking JSON records last night and stumbled across a field on the job record called "source". I think it might be useful for this issue.
At first blush, it might be the key to distinguishing legitimate records for each company in aptly-ho.cfg. Every record I noticed as bad (e.g. "Mercantil Commerce Bank") had a "source" of either "Indeed" or some value other than the aptly-ho.cfg entry. If that holds, the issue could be resolved with a simple change to the Indeed API call in jobs.sh, instead of a new script and an additional pass.
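One way to test the "source" hypothesis against a sample of records is to split them by whether "source" matches the company's config entry. This is a minimal sketch, not existing project code: it assumes each record is a dict parsed from the archive JSON and that the company's aptly-ho.cfg entry is a plain string. The names `source_matches`, `partition_by_source`, and the entry "Commerce Bank" are hypothetical.

```python
def source_matches(record, company_entry):
    """Return True if the record's "source" equals the company's cfg entry.

    Assumption: legitimate records carry the configured company name in
    "source", while bad ones carry "Indeed" or some other value.
    """
    source = record.get("source")
    # A missing or non-string "source" is treated as not matching.
    if not isinstance(source, str):
        return False
    return source.strip() == company_entry.strip()


def partition_by_source(records, company_entry):
    """Split records into (matching, non_matching) lists for spot checks."""
    matching, non_matching = [], []
    for rec in records:
        target = matching if source_matches(rec, company_entry) else non_matching
        target.append(rec)
    return matching, non_matching
```

If every record in `non_matching` turns out to be a known-bad company, that supports fixing the query in jobs.sh rather than adding a separate pass.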
An additional pass has little impact right now, but at scale it could become expensive, so avoiding a separate pass is probably worthwhile.
I'm a little lost, but getting closer. I just don't understand the config entries. Where can I find the config files aptly-ho.cfg and aptly-st.cfg?
We are having an issue with a greedy search against the Indeed API. I suspect it applies to all extracts, but it's most glaring in the extracts created by "jenkins-jobs-by-company".
Although the search params include the specific params that their own internal apps use to deliver jobs to their company sites, other companies still seem to land in their results. It looks like they are using wildcards with a full-text search.
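To illustrate why a wildcard or full-text match would let other companies through, the sketch below contrasts a substring match with an exact match. It does not reproduce how Indeed actually searches; the query "Commerce Bank" is a hypothetical search term chosen only because both recurring failures contain it.

```python
def substring_match(query, company):
    """Greedy, full-text-style match: the query appears anywhere in the name."""
    return query.lower() in company.lower()


def exact_match(query, company):
    """Strict match: the whole name must equal the query (case-insensitive)."""
    return query.strip().lower() == company.strip().lower()


companies = ["Commerce Bank", "Mercantil Commerce Bank", "The Commerce Bank"]
query = "Commerce Bank"  # hypothetical search term

greedy = [c for c in companies if substring_match(query, c)]
strict = [c for c in companies if exact_match(query, c)]
print(greedy)  # ['Commerce Bank', 'Mercantil Commerce Bank', 'The Commerce Bank']
print(strict)  # ['Commerce Bank']
```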
There may be an easy hack to the API call that avoids this condition, but for now I think we just need a standalone job that can be appended to each of the Jenkins jobs. It should take a pass over all ./archive/*.json files and ensure that every job has a "company" value listed in aptly-ho.cfg and a "state" value listed in aptly-st.cfg. If a job passes both tests, keep it; if not, drop it.
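The keep/drop rule above is a two-part membership test. Here is a minimal Python sketch of it, assuming (not confirmed anywhere in this thread) that aptly-ho.cfg and aptly-st.cfg list one allowed value per line, with blank lines and "#" comments ignored, and that each job object has "company" and "state" keys. The function names are hypothetical.

```python
from pathlib import Path


def parse_allowed(text):
    """Parse one allowed value per line, skipping blanks and '#' comments.

    Assumption: the .cfg files are plain value lists, not key=value pairs.
    """
    values = set()
    for line in text.splitlines():
        line = line.strip()
        if line and not line.startswith("#"):
            values.add(line)
    return values


def load_allowed(cfg_path):
    """Read a .cfg file (e.g. aptly-ho.cfg) into a set of allowed values."""
    return parse_allowed(Path(cfg_path).read_text(encoding="utf-8"))


def keep_job(job, companies, states):
    """Keep a job only if both its company and its state are allowed."""
    company, state = job.get("company"), job.get("state")
    # Non-string values (missing, null, lists) fail the test outright.
    if not isinstance(company, str) or not isinstance(state, str):
        return False
    return company in companies and state in states


def filter_jobs(jobs, companies, states):
    """Return only the jobs that pass both membership tests."""
    return [job for job in jobs if keep_job(job, companies, states)]
```

Exact set membership is used on purpose: it rejects names like "Mercantil Commerce Bank" that merely contain an allowed company name.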
Two recurring failures have been found, "Mercantil Commerce Bank" and "The Commerce Bank"; each would fail both of these tests.
Worth noting: at the end of the run, every filename must be the same as it is now, since the UI is driven by the specific filename scheme. We also need all temp files cleaned up to avoid leaving cruft in production.
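Keeping filenames unchanged and leaving no temp files behind can both be handled by writing each filtered result to a temp file in the same directory and then atomically replacing the original. This is a sketch under assumptions: each archive file is taken to hold a JSON array of job objects, and `keep` is any predicate, such as one built from the company/state test. `rewrite_in_place` and `clean_archive` are hypothetical names.

```python
import glob
import json
import os
import shutil
import tempfile


def rewrite_in_place(path, keep):
    """Filter one JSON file, keeping its exact filename; return jobs dropped.

    Assumption: the file holds a JSON array of job objects.
    """
    with open(path, encoding="utf-8") as fh:
        jobs = json.load(fh)
    kept = [job for job in jobs if keep(job)]

    directory = os.path.dirname(os.path.abspath(path))
    # Temp file in the same directory, so os.replace is a same-filesystem
    # rename; the ".tmp" suffix keeps it out of any "*.json" glob.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as out:
            json.dump(kept, out)
        # mkstemp creates the file as owner-only; copy the original's
        # permission bits so whatever serves the UI can still read it.
        shutil.copymode(path, tmp_path)
        os.replace(tmp_path, path)  # same filename the UI expects
    finally:
        # If anything failed before the replace, remove the leftover temp file.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
    return len(jobs) - len(kept)


def clean_archive(pattern, keep):
    """Apply rewrite_in_place to every file matching pattern, e.g. ./archive/*.json."""
    return sum(rewrite_in_place(p, keep) for p in sorted(glob.glob(pattern)))
```

Because `os.replace` swaps in the new file in a single rename, a reader never sees a half-written file. The `finally` block ensures a failed run does not leave `.tmp` files behind.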
Feel free to commit back to the repo any test artifacts that could be used for future unit testing, and I will wire them up to our testing tools, but unit testing is not required here.