fani-lab / OpeNTF

Neural machine learning methods for the Team Formation problem.

CL Experiments on OpeNTF #212

Open · rezaBarzgar opened this issue 11 months ago

rezaBarzgar commented 11 months ago

This issue is for logging the results of OpeNTF with two loss-based curriculum learning (CL) methods (Data Parameters and SuperLoss) on four datasets (IMDb, GitHub, DBLP, and USPT), using the two models that have state-of-the-art results (fnn_emb and bnn_emb).

In these experiments, the hyperparameters should remain unchanged.

rezaBarzgar commented 11 months ago

@hosseinfani Dr. Fani,

If you approve, I plan to assign this task to Marco. I will ask him to run it on Compute Canada and share the results with us. I believe it will be a good learning experience for Marco to become familiar with the process of running large-scale projects on cloud resources. I appreciate your thoughts on this.

Thank you

hosseinfani commented 11 months ago

@rezaBarzgar Perfect, thank you.

rezaBarzgar commented 11 months ago

@MarcoKurepa There is comprehensive documentation on using Compute Canada in MS Teams under SCS - HF Research Group / General / files / Library / ComputeCanada Guide.docx. Once you create an account, either Dr. Fani or I will send you an invitation link. Feel free to update the file with anything from your own experience that you think is worth adding.

These results are for a research paper that we aim to submit to ECIR 2024.

MarcoKurepa commented 11 months ago

On it!

MarcoKurepa commented 11 months ago

@hosseinfani Please confirm my account on Compute Canada.

hosseinfani commented 11 months ago

@MarcoKurepa already done!

MarcoKurepa commented 11 months ago

The response from Compute Canada support is already in: Gmail - Error 4 on Cluster Beluga.pdf

hosseinfani commented 11 months ago

@MarcoKurepa I asked Mohammad to tar his file. Hopefully, we can solve the issue soon.

MarcoKurepa commented 10 months ago

I fixed the script by removing #SBATCH --gpus-per-node=1, because it redundantly duplicated the GPU request already made by #SBATCH --gres=gpu:v100:1, and that conflict was causing an error.
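The fix boils down to requesting the GPU through one directive only. As a minimal sketch, assuming the job script is a plain text file with standard #SBATCH header lines, a small Python check like the hypothetical find_gpu_conflict below (not part of OpeNTF or Slurm) could flag a script that still requests GPUs both ways before it is submitted:

```python
import re

# Hypothetical pre-submission check (not part of OpeNTF or Slurm):
# detect a batch script that requests GPUs through BOTH --gres=gpu:...
# and --gpus-per-node, the redundant pair described in this thread.
GRES_GPU = re.compile(r"^#SBATCH\s+--gres=gpu\b")
GPUS_PER_NODE = re.compile(r"^#SBATCH\s+--gpus-per-node\b")


def find_gpu_conflict(script_text: str) -> list[str]:
    """Return the conflicting #SBATCH lines, or [] if only one style is used."""
    gres_lines, per_node_lines = [], []
    for raw in script_text.splitlines():
        line = raw.strip()
        if GRES_GPU.match(line):
            gres_lines.append(line)
        elif GPUS_PER_NODE.match(line):
            per_node_lines.append(line)
    # A conflict exists only when both request styles appear together.
    if gres_lines and per_node_lines:
        return gres_lines + per_node_lines
    return []


# Illustrative script header (hypothetical values), before and after the fix.
before = """#!/bin/bash
#SBATCH --gres=gpu:v100:1
#SBATCH --gpus-per-node=1
#SBATCH --time=24:00:00
"""
after = before.replace("#SBATCH --gpus-per-node=1\n", "")

print(find_gpu_conflict(before))  # ['#SBATCH --gres=gpu:v100:1', '#SBATCH --gpus-per-node=1']
print(find_gpu_conflict(after))   # []
```

The check only reads header lines and does not call Slurm, so it can run locally before sbatch is used.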

MarcoKurepa commented 10 months ago

The job is now pending; I will update you once it has completed.

hosseinfani commented 10 months ago

@MarcoKurepa I think you need to change the email notification address to your own. I've been receiving some notifications from Compute Canada :D

MarcoKurepa commented 10 months ago


I think you need to change the email notification to your own. I've been receiving some notification from computecanada :D

Apologies for that! I'll look into it 🙂

MarcoKurepa commented 10 months ago

Sorry if you got any more notifications today; I just realized they were coming from the Slurm script! You shouldn't receive any more emails now :)

hosseinfani commented 10 months ago

@MarcoKurepa No worries. Correct, I don't receive them anymore. Thanks.

rezaBarzgar commented 10 months ago

@MarcoKurepa Hey Marco, how are the experiments going? Also, have you started the write-up? I just wanted to gently remind you that these tasks are the priority. Please update me regularly, even if you have not made any progress or are busy with other things.

MarcoKurepa commented 10 months ago

Hey Reza, sorry for the lack of updates. I have mostly been working on the Compute Canada cluster, although to be frank I've had a lot of hiccups with it, especially in getting the correct library versions. At the moment I am still trying to squash errors related to running the script, but I have a .sh script ready to submit once I can run a sample dataset in the terminal.

I'll make sure to keep you posted moving forward.

rezaBarzgar commented 10 months ago

Results sheets:

MarcoKurepa commented 10 months ago

While I was working on the CL Visualization issue, I made an error that crashed VSCode, and I lost the progress of the CL training run on the GitHub dataset. However, in the ~48 hours that it did run, it only finished 2 epochs (still in the first fold), so I think it would be better to run it on Reza's computer regardless. I would also have needed to restart the run anyway to include the expert loss logging script, so no real progress was lost; it was just a mild annoyance in the end.
