ashita03 / H1BVisa2014_Analysis

Explore the H1B Visa data from 2014 through a comprehensive multivariate analysis.

Using multiple models & grid search #1

Open ChrisD-7 opened 1 month ago

ChrisD-7 commented 1 month ago

Hey, I was impressed with your model. Would you consider trying multiple models with grid search to get a better comparison?

Second, could you predict the wait times as well?

ashita03 commented 1 month ago

Hello, thank you for reaching out :)

That's a great next step to try. I believe grid search would be a good way to compare multiple models and figure out the best one. For this project, I limited myself to just the 3 models I tried.

Predicting wait times is a great suggestion too. However, I'm not certain the wait time differs much between individuals in recent years, since more or less everybody receives results around the same time. Do correct me if I'm wrong.

Thank you



ChrisD-7 commented 1 month ago

Yes, my bad 😂 I confused H1B with green card wait times.

Lmk if you would like to collaborate on any projects.

PS: what about predicting the probability of H1B approval?

ashita03 commented 1 month ago

I'd love to! I just checked out the kind of projects you've worked on, and I must add that they're commendable!

There are different kinds of H1B/international visas, so maybe we could look at the probability of approval for each of these visa classes.
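A rough sketch of what per-class approval probabilities could look like in Python with scikit-learn. Everything about the data here is an assumption: the column names (`visa_class`, `wage_k`, `full_time`, `approved`), the class labels, and the randomly generated rows are placeholders, not the actual 2014 dataset.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic placeholder data (assumption): swap in the real H1B records.
rng = np.random.default_rng(0)
n = 400
df = pd.DataFrame({
    "visa_class": rng.choice(["H-1B", "H-1B1 Chile", "E-3 Australian"], size=n),
    "wage_k": rng.normal(70, 15, size=n),      # hypothetical wage in thousands
    "full_time": rng.integers(0, 2, size=n),   # hypothetical 0/1 flag
})
df["approved"] = (rng.random(n) < 0.8).astype(int)  # made-up label

# One-hot encode the visa class so the model can use it as a feature.
X = pd.get_dummies(df[["visa_class", "wage_k", "full_time"]],
                   columns=["visa_class"], dtype=float)
y = df["approved"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

clf = LogisticRegression(max_iter=1000)
clf.fit(X_train, y_train)

# Column 1 of predict_proba is the probability of class 1 (approved).
p_approved = clf.predict_proba(X_test)[:, 1]

# Average predicted approval probability per visa class on the test rows.
per_class = (
    df.loc[X_test.index, ["visa_class"]]
    .assign(p_approved=p_approved)
    .groupby("visa_class")["p_approved"]
    .mean()
)
print(per_class)
```

Since the labels above are random, the printed numbers mean nothing; the point is only the shape of the workflow (fit, `predict_proba`, group by class).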

ChrisD-7 commented 1 month ago

That too. And if we can get some of the recent data, we can perform EDA to get better stats for the grid search I was talking about.

Here is the file that has some of the grid search implemented:

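# Testing Multiple Models:
# Imports used below. X_train, X_test, y_train, y_test are assumed to come
# from an earlier train/test split that is not shown in this snippet.
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix
from sklearn.naive_bayes import BernoulliNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# List of models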
models = [
    DecisionTreeClassifier(),
    LogisticRegression(max_iter=1000),
    SVC(),
    RandomForestClassifier(),
    BernoulliNB(),
    KNeighborsClassifier()
]

model_names = ["Decision Tree", "Logistic Regression", "SVC", "Random Forest", "Naive Bayes", "K-Neighbors"]

# Initialize dictionary to store accuracies
model_accuracies = {}

# Training and Evaluating Models
for model, name in zip(models, model_names):
    model.fit(X_train, y_train)
    predictions = model.predict(X_test)
    accuracy = accuracy_score(y_test, predictions)
    model_accuracies[name] = accuracy  # Store the accuracy in the dictionary
    print(f"{name} Test Accuracy: {accuracy:.2f}")
    print("Confusion Matrix:\n", confusion_matrix(y_test, predictions))
    print("Classification Report:\n", classification_report(y_test, predictions))

# Identify the best performing model
best_model_name = max(model_accuracies, key=model_accuracies.get)
best_accuracy = model_accuracies[best_model_name]
print(f"The best performing model is: {best_model_name} with an accuracy of {best_accuracy:.2f}")

# Select the best model
best_model = models[model_names.index(best_model_name)]

We can then run grid search on the best model.
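A minimal sketch of that grid search step, assuming Random Forest came out on top. The synthetic data from `make_classification` and the values in `param_grid` are placeholders chosen for illustration, not tuned settings from the project.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic stand-in data (assumption): replace with the real features and labels.
X, y = make_classification(n_samples=300, n_features=8, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Stand-in for best_model from the snippet above (assumption: Random Forest won).
best_model = RandomForestClassifier(random_state=0)

# Hypothetical search space; the keys must be valid parameters of the estimator.
param_grid = {
    "n_estimators": [50, 100],
    "max_depth": [None, 5],
    "min_samples_leaf": [1, 3],
}

# 3-fold cross-validated search over every combination in param_grid.
search = GridSearchCV(best_model, param_grid, cv=3, scoring="accuracy")
search.fit(X_train, y_train)

# With the default refit=True, the search object predicts with the best estimator.
tuned_accuracy = accuracy_score(y_test, search.predict(X_test))
print("Best parameters:", search.best_params_)
print(f"Cross-validated accuracy: {search.best_score_:.2f}")
print(f"Test accuracy after tuning: {tuned_accuracy:.2f}")
```

If a different model wins the comparison, the grid needs that model's own parameter names (for example `C` and `kernel` for `SVC`).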

ashita03 commented 1 month ago

Woah! This is awesome. I have yet to try this in Python, since I based this entire project on R. Have you tried this yourself, or do you have template code I could experiment with?