WilliamPham1602 / Document-Splitting

UvA Final Thesis

Meeting 03-06-2022 #9

Open WilliamPham1602 opened 2 years ago

WilliamPham1602 commented 2 years ago

Hi @maartenmarx,

Thanks for your time. These are a few points that I would like to discuss in our early meeting:

This is the report link

Regards, Sang.

maartenmarx commented 2 years ago

Hi @WilliamPham1602 ,

Thanks, here are a few tips:

  1. Find or hire someone to help you with your English writing. It is full of mistakes and, in this form, not acceptable for a master's thesis at UvA.
  2. Your subquestions are interesting and very varied. It is no problem if you change some of them. But be sure that all questions are answered, and that you do NOT give results for which there was no question.
  3. You seem to use 2 ways of referencing papers. Use 1, and make sure it is a hyperlink.
  4. A lot of your "related work" is well related, but not directly relevant to your research. Better remove it. Stick to things that are important.
  5. Is section 3.1 then your answer to the first sub research question?
  6. I recommend not putting such low-level code in your thesis, especially because it is not very elegant or efficient. Please ask yourself WHY this must be there. Which research question does it help answer?
  7. Just look at this np.concatenate( [[1]+[0]*(i-1) for i in original_indexes]) and remember, I am a political scientist!
  8. 3.3 is ok
  9. 3.4 must be updated. No specificity, sensitivity, or accuracy! Only precision, recall and F1, and also BCubed P, R and F1.
  10. I do not understand your train test split. Please also send your code/runs to Ruben, so he can test on the secret test set.
  11. Also compute simple non-learned baselines, e.g., always split on the mean document length.
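
To make items 7 and 9 concrete, here is a minimal sketch, not the thesis code. It assumes that `original_indexes` in item 7 holds the page count of each document (the thread does not define it), that every document has at least one page, and that a stream is encoded as a 0/1 list with a 1 on the first page of each document. The function names are hypothetical. BCubed is computed per page over the documents implied by the boundaries, and its F1 is taken as the harmonic mean of the averaged precision and recall, which is one common convention.

```python
def lengths_to_boundaries(lengths):
    """Readable equivalent of the one-liner in item 7 (lengths >= 1 assumed)."""
    boundaries = []
    for n in lengths:
        boundaries.extend([1] + [0] * (n - 1))  # 1 = first page of a document
    return boundaries


def f1_score(precision, recall):
    """Harmonic mean of precision and recall, 0.0 when both are zero."""
    total = precision + recall
    return 2 * precision * recall / total if total else 0.0


def boundary_prf(gold, pred):
    """Precision, recall and F1 over predicted document-start pages."""
    tp = sum(1 for g, p in zip(gold, pred) if g == 1 and p == 1)
    n_pred, n_gold = sum(pred), sum(gold)
    precision = tp / n_pred if n_pred else 0.0
    recall = tp / n_gold if n_gold else 0.0
    return precision, recall, f1_score(precision, recall)


def boundaries_to_doc_ids(boundaries):
    """Assign each page the index of the document it belongs to."""
    ids, current = [], -1
    for b in boundaries:
        if b == 1 or current < 0:  # a stream always starts a document
            current += 1
        ids.append(current)
    return ids


def bcubed_prf(gold, pred):
    """BCubed precision, recall and F1, averaged over pages."""
    g, p = boundaries_to_doc_ids(gold), boundaries_to_doc_ids(pred)
    n = len(g)
    if n == 0:
        return 0.0, 0.0, 0.0
    precision = recall = 0.0
    for i in range(n):
        same_gold = {j for j in range(n) if g[j] == g[i]}
        same_pred = {j for j in range(n) if p[j] == p[i]}
        overlap = len(same_gold & same_pred)
        precision += overlap / len(same_pred)
        recall += overlap / len(same_gold)
    precision, recall = precision / n, recall / n
    return precision, recall, f1_score(precision, recall)


gold = lengths_to_boundaries([2, 3, 1])  # [1, 0, 1, 0, 0, 1]
pred = [1, 0, 0, 1, 0, 1]                # made-up prediction for illustration
print(boundary_prf(gold, pred))
print(bcubed_prf(gold, pred))
```

The quadratic loop in `bcubed_prf` favours readability over speed; for long streams the overlaps could be counted per document pair instead.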

Good luck, Maarten

WilliamPham1602 commented 2 years ago

Hi @maartenmarx,

I appreciate your feedback.

For the baseline model, I used logistic regression. However, could you explain number 11 in more detail?

Regards, Sang.

maartenmarx commented 2 years ago

Hi @WilliamPham1602, by the mean doc length baseline I mean that you compute the mean doc length, say X, and then split up the stream into X-length documents. This is usually quite a strong baseline.
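
A minimal sketch of this baseline, under stated assumptions: the mean is computed on the document lengths of the training streams only, it is rounded to the nearest whole number of pages (at least 1), and the output is a 0/1 list with a 1 on each predicted first page. The function name and both parameters are hypothetical, not taken from the thesis code.

```python
def mean_length_baseline(train_lengths, stream_length):
    """Predict a new document every X pages, X being the mean training length."""
    mean_len = max(1, round(sum(train_lengths) / len(train_lengths)))
    # Page 0 always starts a document; after that, every mean_len-th page does.
    return [1 if page % mean_len == 0 else 0 for page in range(stream_length)]


# Training documents of 2, 4 and 3 pages give X = 3.
print(mean_length_baseline([2, 4, 3], stream_length=7))
```

The prediction can then be scored with the same metrics as the learned models, so all systems are compared on equal terms.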

WilliamPham1602 commented 2 years ago

Hi @maartenmarx,

I understand that we need a baseline model for the performance comparison. I have already created a simple logistic regression model. Can I use it as the baseline?

Regards, Sang.

maartenmarx commented 2 years ago

But then what do we compare that LR model with? Maybe it is more stupid than my mean baseline. Don't you want to know that? Or don't you think your examiner would like to know that?

Your neural model could still be better than your LR baseline but worse than my mean baseline. Not reporting that would be quite misleading.

Come on @WilliamPham1602, think!

WilliamPham1602 commented 2 years ago

Hi @maartenmarx,

Thanks for your advice. I am working on the baseline model now.

Regards, Sang.