andreaskuster / ner-and-pos-when-nothing-is-capitalized

ner and pos when nothing is capitalized - paper reproduction
GNU General Public License v3.0

Convert report to the final format style (i.e. LaTeX, sections, ...) #22

Closed andreaskuster closed 4 years ago

andreaskuster commented 4 years ago

Make sure to follow all the points from the email:

Hi everyone! At this point, we've read through everyone's project version 1s; individual group feedback should be posted on Canvas sometime by the end of tomorrow.

Aside from that, to help you make sure that you're including everything we're going to be looking for (and grading for) in the final project, Jesse wrote up a LaTeX template for the project writeup that you can use in Overleaf/any other LaTeX editor here:
https://www.overleaf.com/read/yrqhyqcjxfrf (to use it as a template, log in to overleaf, then visit that link, then from the upper-left hand menu, select "Copy Project"). We're not requiring you to use this template, but it would almost certainly be helpful for each team to at least look at; with that in mind, if some of you aren't TeX users, I've also attached the PDF version of the template to this announcement so that you can read through it.

Aside from putting together that template, Jesse also put together a set of high-level feedback after reading through all the project version 1s, which will also be helpful for you to read through. Here it is:

Good work on the reports so far! This mimics the research process -- it's often a good idea to start writing a draft which includes your hypotheses and a description of the experiments you will run. This will have placeholder tables and figures which we fill in as the experiments finish.
Some high-level feedback: If we put something in the description on the syllabus, or in the template, be sure to include it! We will be looking for these items when we're grading. For example, the syllabus says to include the total number of GPU hours used, so do so.

The syllabus describes 6 sections which need to be covered in your report -- your report should have these 6 sections (plus maybe an intro), numbered, in the order listed. While it isn't required, we strongly recommend using the EMNLP 2019 LaTeX style files. We've made a template for you to fill in. You can use Overleaf for free.
Be sure to record approximately how long you spent (human hours, not GPU hours) on each set of experiments or implementations. Each project is different, so this will help us grade.
For each experiment in the experimental results section, be sure to say which hypothesis that experiment is supporting.
For every hypothesis in the Contributions section, list which experiments support it.
Your hypotheses should be a list, not a paragraph. See the template for some examples.
If you think there are no hyperparameters, there probably still are. There are many small decisions we make when implementing algorithms (how to break ties, the size of the vocabulary, etc.); this is the place to list them.

Let us know if you have any questions!
andreaskuster commented 4 years ago

Have a look at the comments from version 1 too:

The biggest thing that I would say overall is that, from reading through your version 1 writeup, there are a *ton* of results in there that are still marked as to-do. Have some of those already been done, or are all of them still left to run? I would suggest narrowing the scope of your project to focus on only some of the original paper's experiments; that's fine! We'd much rather see a thorough investigation of one important set of original experiments from the paper than a rushed investigation of every experiment from the original paper. Be sure, too, that as you're running your experiments, you're collecting all the information about them that the project instructions (particularly the computational requirements section) ask you to report (https://docs.google.com/document/d/1Dd9_VQHXseiroirUI-1rBDS6mJEUHiDQ7ND321O29W8/edit#bookmark=id.g9my7okeo4i3), so that you'll have that information to plug into your writeup.

For the final writeup, please write up your own summary of what the original paper did instead of copying the paper's abstract into your writeup; that will help to introduce your report without making it sound like the experiments that you're actually replicating are some of your extensions.

The "hypothesis" section should also write out the *original paper's* hypotheses (in your own words) that your replicated experiments test, in such a way that it's possible to look at your hypotheses, look at your results without comparing them to the original paper's results, and give a "yes" or "no" answer to whether they hold up. (Including the original experiments' results in your table is fine, but a reader shouldn't *need* to look at those parts of the tables to determine whether the results of your experiments support your hypotheses.)

The OOV procedure that you describe is somewhat unconventional; the risk of having (potentially) every unique token (character, in this case) in your training set contribute to the OOV embedding is that the trained OOV embedding will largely reflect very common characters, which will not be the case when you're actually facing unknown characters at test time. A much more common procedure is therefore to select only the rarest tokens in your training set and map them all to OOV, so that the model's learned OOV representation reflects character rarity.
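As a rough sketch (not the procedure used in this repository), the rare-token-to-OOV replacement described above could look like the following in Python; the `<OOV>` symbol and the `min_count` threshold are assumptions to be adapted to the actual vocabulary handling:

```python
from collections import Counter

OOV = "<OOV>"  # hypothetical reserved symbol; use whatever the vocabulary actually defines


def replace_rare_with_oov(train_tokens, min_count=2):
    """Map every training token seen fewer than min_count times to the OOV symbol,
    so the OOV embedding is trained on genuinely rare tokens rather than common ones."""
    counts = Counter(train_tokens)
    return [tok if counts[tok] >= min_count else OOV for tok in train_tokens]


# Example with single-character tokens, as in a character-level model:
train = list("aaaabbbccxz")
print(replace_rare_with_oov(train))
# ['a', 'a', 'a', 'a', 'b', 'b', 'b', 'c', 'c', '<OOV>', '<OOV>']
```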

I'm excited to see what you find! Let me know if you have any questions.
andreaskuster commented 4 years ago

This is the structure we have to follow:

Your report will include the following. The amount of work put into each section below could be different for different reports. Generally, focus on what future researchers or practitioners would find useful for reproducing or building upon the paper you choose.

1. Contributions
a. A clear list of the scientific hypotheses evaluated in the original paper. Some papers don't make this super clear, so it can take a couple readings of the paper to understand.
b. A list of the hypotheses evaluated in your report. This will likely overlap with 1a.
c. A description of the experiments in the report, and how those experiments support the hypotheses in 1b.

2. Code
If writing your own code, make sure it is documented and easy to use (this project is about reproducibility!). Include a link to a GitHub repository which can be installed and run with a few lines in bash on department machines. Include a description of how difficult the algorithms were to implement.
If using public code from the original repository, more of your energy will go into running additional experiments, such as hyperparameter optimization, ablations, or evaluation on new datasets (see below).  However, note that it’s not always trivial to get a public code release working!

3. Experiment reproduction.
Model description (type of model, total number of parameters, etc.).
Dataset description (training / validation / test set sizes, label distribution, and other easily explained information a typical reader might want).
Hyperparameters: A clear description of the hyperparameters used in the experiments. While some hyperparameters will be specific to a particular model, there are many that are common (learning rate, dropout, size of each layer in the model, the total number of parameters, etc.). Lean towards reporting even uninteresting hyperparameters. You can see an example of how to do this in the appendix here.
For each experiment, a description of how it does or doesn't reproduce the claims in the original paper.

4. Experiments beyond the original paper.  The amount you do will depend on how smoothly the above parts of the project went. Examples include:
Hyperparameter search: you could assess the sensitivity of the findings to one or more hyperparameters, or measure the variance of the evaluation score due to randomness in initial parameters. If you do hyperparameter search, be sure to describe the method used (grid search, uniform sampling, Bayesian optimization, etc.). At least include the min, max, mean / median, and variance of the performance; further sensitivity analysis (e.g. plots) could be warranted. A minimal reporting sketch follows this list.
Varying amounts of data: often a paper will only include the performance of a model after training on the full training set. You could evaluate a model (on validation data, not test data) with varying amounts of training data. An example of an interesting conclusion here could be that the baselines from the original paper outperform the new model when trained on a small amount of data, but eventually the new model outperforms the baseline.
Evaluate on a new dataset: evaluate if the conclusion as to which model performs best (as reported in the original paper) holds on a different dataset. 
Ablations: some papers introduce many new ideas but don't evaluate the contribution of each individually. A valuable study could evaluate each component individually.
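As a minimal illustration of the reporting asked for in the hyperparameter-search item above (not code from this repository), one could collect validation scores across sampled trials and summarize them; `run_trial`, the learning-rate range, and the trial count are all assumptions:

```python
import random
import statistics


def run_trial(learning_rate):
    """Hypothetical stand-in: train the model with this learning rate and
    return its validation score. Replace with the project's real training call."""
    return random.random()  # placeholder score so the sketch runs end to end


# Uniform sampling over a log-scaled learning rate, one of the search methods listed above.
scores = [run_trial(10 ** random.uniform(-5, -2)) for _ in range(20)]  # 20 trials

# The summary statistics the report should include at minimum.
print("min:     ", min(scores))
print("max:     ", max(scores))
print("mean:    ", statistics.mean(scores))
print("median:  ", statistics.median(scores))
print("variance:", statistics.variance(scores))
```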

5. Computational requirements
All reports should include relevant information about computational requirements. The requirements for the original paper should be (roughly) estimated. For the experiments in the report, include at least the type of hardware used, the average runtime for each approach, the total number of trials, the total number of (GPU) hours used, number of training epochs, and any other relevant info.
Some authors will have had access to infrastructure that is way out of your budget; don’t choose such a paper!

6. Discussion and recommendations for reproducibility
A section which discusses the larger implications of the experimental results, whether the original paper was reproducible, and if it wasn’t, what factors made it irreproducible. 
A set of recommendations to the original authors or others who work in this area for improving reproducibility.
andreaskuster commented 4 years ago

To avoid over-expanding the README.md file in the root directory, I moved content (e.g. hyperparameter search, code usage, computational requirements and human effort, ...) to pos/README.md. Please make sure to include this in the final report too.

balbok0 commented 4 years ago

Sounds good; I just started a branch with the initial version of the final report.