farinamhz closed this issue 1 year ago
Hi @hosseinfani, I added the CTM baseline and added the percentages for the hide function in the evaluation section. However, there is a problem with this new model: its evaluation takes too much time. Their paper reports a much shorter time per epoch, but training takes ~16 minutes for us. At the end of the day we can handle the training, but the evaluation is unusually slow. For example, we evaluate 15% of 350 reviews, each with ~3 documents or sentences on average, and inference for each of these reviews takes almost 2 minutes. With 5 folds and 11 different evaluations for hiding 0, 10, 20, ..., 100 percent of the aspects, it takes almost 4 days in total to evaluate just the results before back-translation! I am running on a GPU, and if it takes this long, we will surely not have time to test different values for each parameter. I think something is wrong somewhere that is taking too much time, even though I followed their documentation. That is the whole problem, and I would appreciate it if you had time for a meeting to talk about it.
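For reference, the ~4-day figure follows directly from the numbers in this comment; a quick back-of-the-envelope check (counts taken from above, the rest is plain arithmetic):

```python
# Back-of-the-envelope check of the evaluation time quoted above.
reviews = 350
test_fraction = 0.15          # 15% of the reviews are evaluated
minutes_per_review = 2        # observed inference time per review
folds = 5
hide_settings = 11            # hiding 0, 10, ..., 100 percent of aspects

minutes_per_run = reviews * test_fraction * minutes_per_review  # one fold, one setting
total_minutes = minutes_per_run * folds * hide_settings
print(total_minutes / 60 / 24)  # ≈ 4 days
```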
@farinamhz We'll talk tomorrow.
Nonetheless, it's time to switch your experiments to Compute Canada. We have a doc in General > Files > Library > Compute Canada guide that can help you.
@smh997 did you convert that doc into https://github.com/fani-lab/Library/blob/main/ComputeCanada.md?
@hosseinfani It is still in progress and needs to be finalized. I am adding the GPU part and expect to finish it by tomorrow (at least a first draft). However, I can share my experience with @farinamhz before I update the repo.
Hi @hosseinfani,
Results and code for the CTM model and changes in the evaluation have been added.
Also, all the results with their aggregation have been added, and you can see them in ../tree/main/output/English/Semeval-2016/25
https://github.com/fani-lab/LADy/issues/31
@hosseinfani Result for CTM:
Fortunately, we have reasonable results like the other baselines for the first 5 selections, which is good news!
However, CTM's results are lower than those of the other baselines.
Hi @hosseinfani, these are the results for epoch = 10 and epoch = 100. Unfortunately, when increasing the epochs to 100, although the success values improve, the results after back-translation decrease! (10 means epoch=10 and 100 means epoch=100.)
@farinamhz Interesting! Can you do [10, 100, 200, 300, 400, 500, 1000] epochs and draw the same diagram?
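The requested sweep could be scripted roughly as follows; `train_and_eval` here is a hypothetical placeholder for LADy's actual train/evaluate entry point, not a real function in the repo:

```python
# Hypothetical sketch of the requested epoch sweep; train_and_eval is a
# placeholder standing in for LADy's actual training/evaluation call.
def train_and_eval(num_epochs):
    # ... run CTM with the given number of epochs and return its scores ...
    return {"epochs": num_epochs}

EPOCHS = [10, 100, 200, 300, 400, 500, 1000]
results = {e: train_and_eval(e) for e in EPOCHS}
# results can then be plotted as one diagram, one point (or curve) per epoch setting
```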
We found https://github.com/MIND-Lab/OCTIS that includes neural and non-neural topic modeling.
There is an issue installing scikit-learn==0.24.2 on Python 3.10. I downgraded to Python 3.7 and it installed with no issue.
```
      [end of output]
  note: This error originates from a subprocess, and is likely not a problem with pip.
ERROR: Failed building wheel for scikit-learn
Failed to build scikit-learn
ERROR: Could not build wheels for scikit-learn, which is required to install pyproject.toml-based projects
```
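For the record, the Python 3.7 workaround can be set up like this; a minimal sketch assuming a conda workflow (the environment name is arbitrary):

```shell
# Sketch: create a Python 3.7 environment where scikit-learn==0.24.2 builds;
# "octis-env" is an arbitrary name, any Python 3.7 environment should work.
conda create -n octis-env python=3.7 -y
conda activate octis-env
pip install "scikit-learn==0.24.2" octis
```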
@farinamhz we can close this issue. let me know otherwise.
Here we are going to add the CTM baseline to the pipeline as a neural topic model.