cesquib closed this issue 5 years ago.
Is it rational to have a separate script for "ChangeUser" and another for "Newuser"? After viewing the website, I realized there are two different tabs for these two activities, so I'm thinking of having a separate CSV for each request.
I always like to keep it in a single script. It's easier to manage and you're not maintaining two or more scripts.
But then again, you mentioned the script running as a task, so how are you pulling the data, and how can you differentiate a new user from a change user? Is it in the file as a column, or is it part of the file name?
About how I'll pull the data: if I can get this generated into a CSV and saved at a specified location, say "C:\", I can let Task Scheduler run the appropriate script at a specific time interval. This is what I have in mind:
```powershell
# Run C:\CreateUser.ps1 repeatedly, starting at the given date/time
$trigger = New-JobTrigger -Once -At "02/05/2019 15:35" `
    -RepetitionInterval (New-TimeSpan -Minutes 59) -RepetitionDuration ([TimeSpan]::MaxValue)
Register-ScheduledJob -Name 'CSV-check' -FilePath 'C:\CreateUser.ps1' -Trigger $trigger
```
You're checking for a new "CreateUser.ps1" file every hour, is that correct? What happens if the website (or whatever service) sends you two new user files before the next task run? Say one file is delivered at 1:03pm and another at 1:30pm - would the original file be overwritten? Would there now be two files? If the latter, how would you check for the other file? If the former, you're now missing data.
Maybe the FileSystemWatcher class (System.IO.FileSystemWatcher) would be good for this use case -> https://mcpmag.com/articles/2015/09/24/changes-to-a-folder-using-powershell.aspx
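A minimal sketch of the FileSystemWatcher idea, assuming the CSV files land in a folder the script can watch. `Start-CsvWatcher`, the `'CsvDropped'` source identifier and the `C:\Drop` path are hypothetical names; the .NET class and the `Register-ObjectEvent`/`Wait-Event` cmdlets are real. Each new file raises its own event, so two files arriving between runs are both seen, as long as they have different names.

```powershell
# Hypothetical helper: queue an event whenever a new *.csv appears in $DropFolder.
function Start-CsvWatcher {
    param(
        [Parameter(Mandatory)][string]$DropFolder,
        [string]$SourceIdentifier = 'CsvDropped'
    )
    $watcher = New-Object System.IO.FileSystemWatcher
    $watcher.Path   = $DropFolder
    $watcher.Filter = '*.csv'
    # No -Action: events are queued in the session and read with Wait-Event/Get-Event
    Register-ObjectEvent -InputObject $watcher -EventName Created -SourceIdentifier $SourceIdentifier | Out-Null
    $watcher.EnableRaisingEvents = $true
    return $watcher
}

# Example loop (C:\Drop is a placeholder path):
# $watcher = Start-CsvWatcher -DropFolder 'C:\Drop'
# while ($true) {
#     $evt  = Wait-Event -SourceIdentifier 'CsvDropped'
#     $path = $evt.SourceEventArgs.FullPath
#     Remove-Event -EventIdentifier $evt.EventIdentifier
#     # ...process $path...
# }
```

The watcher only works while the PowerShell session that registered it stays open, and the `Created` event can fire before the sender has finished writing the file, so a short delay or retry before `Import-Csv` is prudent.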
You still need to define how you're going to differentiate between a "change user" and "new user" request.
That's a very interesting point you've raised. I discussed this with my boss and he says there'll be a "start-date" in the CSV, and every user will be created 48 hours in advance of the start date entered. So if the script runs every hour and encounters two new files delivered within the hour, the difference will be the start date. If, coincidentally, the two start dates are the same, do you think it'll be a problem for the script? What are your thoughts on this? Also, about differentiating between change user and new user: perhaps the headers in the CSV or the path where the CSV will be stored? I've asked for a sample of the automatically generated CSV but still haven't gotten one, hence the difficulty in being specific about that.
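The 48-hour rule itself is a simple date comparison. A minimal sketch, assuming the column is called `startdate` and holds dates as `MM/dd/yyyy` (neither is confirmed at this point, since there's no sample CSV yet); `Test-UserCreationDue` is a hypothetical name.

```powershell
# Hypothetical helper: has the creation window for this start date opened?
# Assumes startdate is formatted MM/dd/yyyy (unconfirmed; no sample CSV yet).
function Test-UserCreationDue {
    param(
        [Parameter(Mandatory)][string]$StartDate,
        [datetime]$Now = (Get-Date),
        [int]$LeadHours = 48
    )
    $start = [datetime]::ParseExact($StartDate, 'MM/dd/yyyy',
        [System.Globalization.CultureInfo]::InvariantCulture)
    # Due once we are within $LeadHours of the start date, or past it
    return $Now -ge $start.AddHours(-$LeadHours)
}

# Example: Import-Csv .\request.csv | Where-Object { Test-UserCreationDue -StartDate $_.startdate }
```

Two rows with the same start date simply both pass this check; the date doesn't distinguish the requests themselves.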
I discussed this with my boss and he says there'll be a "start-date" in the csv, and every user will be created 48 hours in advance to the start date entered, so if the script runs every hour and encounters the two new files delivered within the hour, the difference will be the start date
I don't think you're understanding the issue properly; let me try to explain it better. I'm assuming the website delivers a CSV file every time a request is made on the site, either for a change user request or for a new user request. I'm also going under the assumption that these file names will be static and delivered to some FTP server or SMB share where your script can process them. If I'm correct and someone submits a 'new user' form at 1:01pm, the website will deliver 'new-user.csv'. Then, if someone requests yet another (different) user at 1:05pm, you will get a new 'new-user.csv' file overwriting the one delivered at 1:01pm. When your script runs at the next hourly interval it'll only pick up the latest file that was delivered, and you'll miss previously delivered data because of the overwrites. I'm making assumptions here, but you should definitely verify what naming convention will be used when the website delivers these CSV files, or you will be missing requests.
there'll be a "start-date" in the csv, and every user will be created 48 hours in advance to the start date entered, so if the script runs every hour and encounters the two new files delivered within the hour, the difference will be the start date and if co-incidentally the two start dates are the same, do you think it'll be a problem for the script ?
This goes to 'collision rules', and you'll need to map this out and really think it through. It doesn't matter what the 'start date' shows; what really matters is the name and the request type. At this moment your new website is becoming the 'system of record' for all new users, and you have no way (as I see it) to differentiate between an actual new user and a possible duplicate request. If MyCo requests a new user "John Doe" at 1:05pm and your script processes it at 2:00pm (every hour, like you want), how is the site confirming that a new request from MyCo at 2:05pm for another "John Doe" is a legitimate new user? Maybe there are two people at MyCo who are responsible for requesting new users and they aren't properly communicating who has done what. Your script can create 'collision rules' for this type of thing (if John Doe exists, then create John Doe 2); however, you still have no way to verify that the request from MyCo isn't for the same John Doe (John Doe 2 doesn't actually exist; someone just requested the same user twice).
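The "John Doe 2" collision rule could look roughly like this. `Get-UniqueName` is hypothetical, and `$ExistingNames` stands in for whatever you actually query in AD (display names, sAMAccountNames, etc.). As noted above, this guarantees a unique name but cannot tell a genuine second John Doe from a duplicate request.

```powershell
# Hypothetical collision rule: if a name is taken, append 2, 3, ... until it is free.
# $ExistingNames is a stand-in for a lookup against AD.
# -contains / -notcontains are case-insensitive, like AD name comparisons.
function Get-UniqueName {
    param(
        [Parameter(Mandatory)][string]$Name,
        [string[]]$ExistingNames = @()
    )
    if ($ExistingNames -notcontains $Name) { return $Name }
    $n = 2
    while ($ExistingNames -contains "$Name $n") { $n++ }
    return "$Name $n"
}
```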
Also about differentiating between change user and new user, perhaps the headers in the csv or the path where the csv will be stored?
It doesn't matter - there are many ways to accomplish this and you just need to figure out what the site is going to provide.
I understand what you mean. I'm still thinking of alternatives to cover all those eventualities; if you have any ideas I'll really be glad to implement them, because at the moment everything I try comes up with one error or another. My boss has asked me to focus for now on just the "NewUser" script with a start date. Once that is set, we'll find an alternative for the change user; maybe that will be done manually by the helpdesk.
I would honestly put my focus on the website: define the possible issues with that site being the 'system of record' and how to accomplish what you need within the site. The system of record should be the one validating requests and verifying the data, not the receiving party (your script/AD).
I've recently gone through the same thing with a client who wanted a new HR system to be the system of record, not AD, and we had to make sure the new HR platform validated the data prior to sending it to me for consumption into AD. You can script all day long, but in the end, if you're not validating the data, it'll cause more harm than good and the project would likely create more work than manually entering these items.
In our case, the solution to the collision rules was to have the HR system send us the system-assigned serial number of the user. That serial was included in the CSV files the HR system sent to the script, and we updated the "employeeNumber" attribute in AD with that information. That way, modifications on the HR system were done at the ID level, and when a 'change user' request came through I knew which user to search for because I had a unique 'primary key', in this case the serial number/employeeNumber. I didn't have to worry about there possibly being more than one "John Smith" user object in AD.
If at all possible, you should ask to work with the site designers to figure this out, as they may have some input as well. In a perfect world, this is what I would request...
When a 'new user' request is submitted via the website the site stores the information in a DB table and creates a unique ID ('primary key') for that user in the DB. The site then makes a CSV file of the data (as it exists in the DB) and sends it off for processing. The CSV file would include all necessary information for the user (name, email, whatever) and the 'primary key'.
In the 'new user' form, type-ahead functionality for names should do a lookup against the 'users' table in the DB (the one mentioned above), and if a full match is found, ask the requestor whether this is really a new user or a change request. If it is a new user (possibly someone with the same name), the requestor must select a checkbox confirming this is a new user; otherwise they are sent to the "change user" form with the information pre-populated.
When a 'change user' request is made via the website, the site should have type-ahead functionality for user names that are already stored, so if someone types "John Doe" and multiple exist, the requestor is able to select the proper user they want to modify. Once the modification is submitted, the site takes the information from the DB (as updated by the change request) and sends it to your script for processing. The CSV file would contain all necessary updated information and the 'primary key'.
The above process would alleviate many of the errors in your script, as well as general processing errors due to collisions or the multitude of other 'fail' scenarios that currently exist.
If the above solution is not possible then you can only design your script to add new users if one doesn't already match or change a user only if a single result is returned when querying AD for which user to modify. Else, you will have lots of problems in the future with duplicate user accounts or modifying the wrong user. And if you want to continue in this manner the only thing we need to know is how to differentiate between a new user and a change user request. It's a simple ask to the website devs to understand what you will be receiving when these actions are taken on the site. Once we have this information we can move forward with the script.
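The fallback rules above (create only when nothing matches, change only when exactly one object matches) boil down to a small decision function. A sketch with hypothetical names: `Get-RequestAction` and the `$Lookup` scriptblock are illustrations, and the `Get-ADUser` filter in the comment is just one possible lookup, keyed on `employeeNumber` as in the HR example.

```powershell
# Hypothetical decision helper for when the site can't guarantee uniqueness.
# $Lookup returns the matching user objects for a key; in production it might be
#   { param($key) Get-ADUser -Filter "employeeNumber -eq '$key'" }
function Get-RequestAction {
    param(
        [Parameter(Mandatory)][ValidateSet('NewUser', 'ChangeUser')][string]$RequestType,
        [Parameter(Mandatory)][string]$Key,
        [Parameter(Mandatory)][scriptblock]$Lookup
    )
    # @() so that "no output" counts as 0 and a single object counts as 1
    $found = @(& $Lookup $Key)
    switch ($RequestType) {
        'NewUser'    { if ($found.Count -eq 0) { 'Create' } else { 'ManualReview' } }
        'ChangeUser' { if ($found.Count -eq 1) { 'Update' } else { 'ManualReview' } }
    }
}
```

Anything that comes back as `ManualReview` is exactly the duplicate-account or wrong-user case described above, so it goes to a person instead of being guessed at.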
This is very informative; in fact, it's pretty much the kind of project we are working on. I copied and pasted your message to my boss, and he's going to talk to the website guys so we end up with a system similar to what you did with the HR platform. Our website is scheduled to go online in the next two weeks, so it's a race against the clock now. Using a primary key is indeed the best unique identifier to ensure that we don't have collisions/duplicates. Regarding how to differentiate a 'New User' (NU) request from a 'Change User' (CU) request, the website developers say the name of the CSV will be either NU (New User request) or CU (Change User request), so based on that, the script should know what action to take.
Great - sounds like you're making progress...
...the website developers say the name of the CSV will either be NU(NewUser Request) or CU(change User Request)
So will they be the same name every time it's delivered (cu.csv and nu.csv, respectively)? If so, you'll want to rethink your hourly task scheduling or, since you're already making requests, have the site send over a file with a creation timestamp included in the filename (e.g. cu_05092019102320, where 102320 is hours:minutes:seconds). That way you also have a unique data file and don't risk overwriting the file (cu.csv) when someone makes yet another request before your script is scheduled to run. You may also want to think about adding milliseconds to the filename, depending on your usage metrics for the site. How plausible is it that two users could press 'submit' on your site at the same time for the same type of request?
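For illustration only, since the website (not your script) would generate these names: a sketch of the `cu_05092019102320` pattern, assuming the date part is `MMddyyyy` and adding a `.csv` extension. `New-RequestFileName` is hypothetical; the optional `fff` suffix is the milliseconds idea.

```powershell
# Hypothetical helper that builds names like cu_05092019102320.csv.
# Assumes MMddyyyy for the date part; -IncludeMilliseconds appends fff.
function New-RequestFileName {
    param(
        [Parameter(Mandatory)][ValidateSet('nu', 'cu')][string]$Prefix,
        [datetime]$Timestamp = (Get-Date),
        [switch]$IncludeMilliseconds
    )
    $format = if ($IncludeMilliseconds) { 'MMddyyyyHHmmssfff' } else { 'MMddyyyyHHmmss' }
    $stamp  = $Timestamp.ToString($format, [System.Globalization.CultureInfo]::InvariantCulture)
    # ValidateSet is case-insensitive, so normalize the prefix to lower case
    return '{0}_{1}.csv' -f $Prefix.ToLower(), $stamp
}
```

If you have any say in the format, `yyyyMMddHHmmss` has the added benefit that the file names sort chronologically.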
In my client's script I requested basically the same thing. The HR system would deliver a CSV file via SFTP that included a timestamp in the filename, and my script would run and process the file. Once the file was processed it would be made read-only and I appended a .done extension. This setup allowed me to filter out CSV files that were already processed and not risk re-processing the same file. It also gave me two filters (the read-only attribute and the .done extension), so if one failed (the script couldn't set the read-only flag or couldn't rename the file for some reason) I had a backup filter for subsequent runs of the script. In the script I basically did something like this:
```powershell
Get-ChildItem -Path $dropFolder -Filter *.csv -File | Where-Object { -not $_.IsReadOnly }  # $dropFolder = delivery folder
```
(simplified, but you get the idea).
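A sketch of the "mark as processed" step described above, with both safeguards: set the read-only flag, then append `.done`. `Complete-RequestFile` is a hypothetical name, and each step is wrapped separately so that one failing doesn't stop the other.

```powershell
# Hypothetical helper: flag a processed CSV so later runs skip it.
# Either flag alone keeps the file out of a "*.csv and not read-only" filter.
function Complete-RequestFile {
    param([Parameter(Mandatory)][System.IO.FileInfo]$File)
    try { $File.IsReadOnly = $true }
    catch { Write-Warning "Could not set read-only on $($File.Name): $_" }
    try { Rename-Item -LiteralPath $File.FullName -NewName ($File.Name + '.done') -ErrorAction Stop }
    catch { Write-Warning "Could not rename $($File.Name): $_" }
}
```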
Do you know all the CSV headers now?
The CSV name will not always be the same; it will look something like "NU.johndoe.csv" or "CU.samsmith.csv". Adding a timestamp will definitely help solve the problem of simultaneously submitted requests conflicting, though the chances of that happening at the very same time are slim. All the client companies have unique login credentials, so if two separate client companies both submit a request for NU.maryjane.csv, the milliseconds should separate the two requests.
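Dispatching on that prefix could look like this. A sketch that assumes names follow `NU.<something>.csv` / `CU.<something>.csv` as described; `Get-RequestInfo` is hypothetical, and anything between the prefix and `.csv` (the user name, plus any timestamp the developers add) is passed through untouched.

```powershell
# Hypothetical parser for names like "NU.johndoe.csv" / "CU.samsmith.csv".
# -match is case-insensitive, so "nu." and "cu." are accepted too.
function Get-RequestInfo {
    param([Parameter(Mandatory)][string]$FileName)
    if ($FileName -match '^(?<type>NU|CU)\.(?<rest>.+)\.csv$') {
        [pscustomobject]@{
            RequestType = $Matches['type'].ToUpper()
            Detail      = $Matches['rest']
        }
    }
    else {
        throw "Unrecognized request file name: $FileName"
    }
}

# Example dispatch:
# switch ((Get-RequestInfo -FileName $file.Name).RequestType) {
#     'NU' { <# new-user logic #> }
#     'CU' { <# change-user logic #> }
# }
```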
I like the idea of adding the extensions to filter the already processed file from those still in process.
the csv headers for the NewUser are - firstname,lastname,copyuserLN,company,password,email,startdate
It will be the same thing for the change user with the exception of the 'copyuserLN' header.
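With the headers known, the script can reject a file that's missing one before touching AD. A sketch using the header lists above; `Get-MissingRequestHeader` is hypothetical, and the primary-key column isn't named yet, so it's left out. Extra columns are allowed; only missing ones are reported.

```powershell
# Header lists as given in this thread; CU is the same minus copyuserLN.
$ExpectedHeaders = @{
    NU = @('firstname', 'lastname', 'copyuserLN', 'company', 'password', 'email', 'startdate')
    CU = @('firstname', 'lastname', 'company', 'password', 'email', 'startdate')
}

# Hypothetical helper: returns the expected headers that are absent (empty = OK).
# -notcontains is case-insensitive, so 'CopyUserLN' and 'copyuserLN' are treated alike.
function Get-MissingRequestHeader {
    param(
        [Parameter(Mandatory)][ValidateSet('NU', 'CU')][string]$RequestType,
        [Parameter(Mandatory)][object[]]$Rows
    )
    $actual = @($Rows[0].PSObject.Properties.Name)
    $ExpectedHeaders[$RequestType] | Where-Object { $actual -notcontains $_ }
}

# Example:
# $rows = Import-Csv -Path $file.FullName
# $missing = @(Get-MissingRequestHeader -RequestType NU -Rows $rows)
# if ($missing.Count) { Write-Warning "Missing headers: $($missing -join ', ')" }
```

Note that `Import-Csv` returns nothing for a header-only file, so an empty file needs its own check before this one.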
I have a question: in the rare event that, say, a company realizes they made a mistake in the new user request with the spelling of the name or the start date, how do you resolve that? Do they just fill out a new user request or a change user request? And does your script overwrite the former request?
the csv headers for the NewUser are - firstname,lastname,copyuserLN,company,password,email,startdate
It will be the same thing for the change user with the exception of the 'copyuserLN' header.
Hopefully with the addition of 'Primary Key' (or the header that calls it out) as well :)
in the rare event that say a company realizes that they made a mistake in the newuser request with the spelling of the name of the start date, how do you resolve that
Great question and this reiterates the importance of a primary key. You'd definitely want them to use 'change user' to modify the original request. If they did a new user you'd now have two users and you'd have to rely on manual intervention to remove the 'incorrect' user from AD.
StartDate brings up another interesting topic that I'll create a new 'issue' for.
Yes, definitely with the addition of the primary key :)
Still need to figure out how to differentiate between a "changeuser" and a "newuser" request and script for that. Are you going to do it by filename? The script as-is will treat all CSV imports as a new user.