Closed: StephAO closed this issue 1 month ago
Hi @StephAO. Thanks for your interest. We will provide you with specific instructions to run all experiments within 1-2 days. Please stay tuned.
@gao-g @ataymano .
Thank you for the update @dkmisra. A few additional notes I've found going through your repository:
"ARTICLE: {input}"
vs " Article:{input}"
Hi @StephAO, thank you for your interest! Please find the experiments folder with the running instructions here. Please let me know if you have more questions or need any further help.
Hi @StephAO,
I can comment on the other points. By the way, thanks for the very precise and helpful comments.
We'll standardize them and re-run the experiments. @gao-g and @ataymano (noting for the next version). I doubt this will cause a significant difference, but let's see what the new results say.
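One lightweight way to standardize the prompt strings could look like the sketch below. The constant and function names here are purely illustrative, not identifiers from the repo:

```python
# Hypothetical sketch: a single shared template so every call site builds
# the summarization prompt the same way (casing, spacing, label).
SUMMARY_TEMPLATE = "Article: {input}"

def build_prompt(article: str) -> str:
    """Render the shared template for a given article."""
    return SUMMARY_TEMPLATE.format(input=article)
```

With every prompt routed through one template, a casing or spacing change only needs to happen in one place.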
Would you happen to have an example of this? @ataymano
You are right that the code isn't normalizing the BERT embeddings. This needs to be fixed; we'll do that in the next version.
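For reference, normalizing embeddings before comparing them could look like this minimal NumPy sketch (function names are illustrative, not the repo's code):

```python
import numpy as np

def l2_normalize(vec: np.ndarray, eps: float = 1e-12) -> np.ndarray:
    """Scale a vector to unit L2 norm, guarding against zero vectors."""
    return vec / max(np.linalg.norm(vec), eps)

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """With unit-norm inputs, cosine similarity reduces to a dot product."""
    return float(np.dot(l2_normalize(a), l2_normalize(b)))
```

Normalizing makes the similarity scale-invariant, so two embeddings that point the same direction score 1.0 regardless of magnitude.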
We didn't compare against the CLS embedding, but averaging makes sense to me and has been used in the literature for text encoding (e.g., 1, 2). A CLS token might focus more on certain words, perhaps towards the boundaries of the text. In contrast, averaging gives weight to every token but can dilute important words. In hindsight, I feel we should have used something like BERTScore.
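As a concrete illustration of the averaging approach, a masked mean pool over per-token embeddings might look like this (a generic sketch, assuming a `(seq_len, hidden)` embedding matrix and a 0/1 attention mask, not the repo's actual code):

```python
import numpy as np

def mean_pool(token_embeddings: np.ndarray, attention_mask: np.ndarray) -> np.ndarray:
    """Average token embeddings, ignoring padded positions.

    token_embeddings: shape (seq_len, hidden)
    attention_mask:   shape (seq_len,), 1 for real tokens, 0 for padding
    """
    mask = attention_mask[:, None].astype(token_embeddings.dtype)
    summed = (token_embeddings * mask).sum(axis=0)      # sum of real tokens only
    count = max(float(mask.sum()), 1.0)                 # avoid division by zero
    return summed / count
```

The mask matters: without it, padding vectors would be averaged in and shift the sentence embedding.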
@dkmisra Thank you for your reply. As you mention, I suspect most of these won't have any significant impact on the takeaways of the work.
For 2., all the prompts are in this format (for example ). This is easy to fix by using a different notation (see below).
>>> output, preference = "output", "preference"
>>> edit_prompt = f"""Email: {output} \n
... Assume that you prefer {preference}.
... Please revise the above email to meet your style:"""
>>>
>>> edit_prompt
'Email: output \n\nAssume that you prefer preference.\nPlease revise the above email to meet your style:'
>>> edit_prompt2 = (f"Email: {output} \n"
... f"Assume that you prefer {preference}. "
... f"Please revise the above email to meet your style: ")
>>> edit_prompt2
'Email: output \nAssume that you prefer preference. Please revise the above email to meet your style: '
>>>
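For what it's worth, `textwrap.dedent` gives a third option that keeps the readable triple-quoted layout while stripping the indentation (placeholder values for `output` and `preference` below):

```python
import textwrap

output, preference = "output", "preference"  # placeholder values

# Keep the multi-line layout; dedent removes the common leading whitespace,
# and the trailing backslash after the opening quotes drops the first newline.
edit_prompt3 = textwrap.dedent(f"""\
    Email: {output}

    Assume that you prefer {preference}.
    Please revise the above email to meet your style:""")
```

This avoids both the stray `\n` escapes and the accidental leading spaces, at the cost of requiring consistent indentation inside the string.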
Another quick note: in the paper, you include "question answering style, direct, concise" as the preferences for Movie Review; however, in the code it is only "question answering style" https://github.com/gao-g/prelude/blob/8fe20a8f1332090ccdd2f9498b7d7c33d0de7c49/src/task/summarization.py#L15
Thanks so much for catching this mistake! We will correct the paper by changing the latent preference for movie review to "question answering style" to match the codebase. We did use "question answering style, direct, concise" in early experiments, and later found that "question answering style" is sufficient to guide the LLM toward reasonable behavior. Sorry about the confusion, and thanks again for raising this issue. We'd love to acknowledge you in the acknowledgements section of our next revision.
I found your work very interesting, so I'm happy to help.
It's not necessary, but if you would like to acknowledge me, my full name is Stéphane Aroca-Ouellette.
We would love to acknowledge your help, @StephAO. We will add your name to the acknowledgements section of the next revision of the paper, where we also plan to address most of your comments.
And please feel free to send us any papers you write in this space. Thanks again for the very useful feedback.
I am hoping to reproduce the results. When can I expect the configs/instructions to replicate the experiments in the paper to be updated?