embeddings-benchmark / mteb

MTEB: Massive Text Embedding Benchmark
https://arxiv.org/abs/2210.07316
Apache License 2.0

Paper Writing: An overview issue #896

Open KennethEnevoldsen opened 2 months ago

KennethEnevoldsen commented 2 months ago

This is an overview issue for paper writing. For the full discussion of what needs to be done, see #784. The intention of this issue is to make it easier for contributors to find sections to work on, and for us to guide them in the right direction and keep an overview.

How to discuss these segments:

Writing Sections:

Other concerns

gentaiscool commented 2 months ago

Hi @KennethEnevoldsen, thanks for the effort in organizing the paper overview. I'd like to assist in completing the related work section by incorporating recent papers to enhance its relevance. I agree that we need to paraphrase the initial segment and add more distinct aspects to set our work apart from existing research. Additionally, I am aware of several large-scale collaborative projects that could be referenced in our paper to make the related work section more comprehensive. I was also wondering how we determine contribution points for paper writing. In general, I am happy to help write any section if needed.

KennethEnevoldsen commented 2 months ago

Sounds wonderful. I would be very happy if you had the time to go over those sections. Feel free to ping me once you have done so.

Generally, we add points based on relative effort. Since most contributors have added datasets before, they already have a rough sense of the points-to-effort ratio. We have the writer suggest points, and then, of course, we can discuss afterward whether it makes sense.

This is, of course, not a perfect system (but it is always hard to quantify contributions).

gentaiscool commented 2 months ago

Thank you, @KennethEnevoldsen, for the explanation. I will review the entire paper and focus on the sections where I can contribute, particularly those that don't require waiting for experimental results.

isaac-chung commented 2 months ago

Not sure if we have discussed this before: would any of the language family groupings, e.g. in https://github.com/embeddings-benchmark/mteb/issues/366, have a place in the paper? Or would that require https://github.com/embeddings-benchmark/mteb/issues/837 to be completed first?

MariyaTikhonova commented 2 months ago

Hi @KennethEnevoldsen, thanks for the effort in organizing the paper overview.

My colleagues and I would like to help with the paper writing, if our help would be welcome.

1) We'd like to assist in completing the limitations and ethical considerations, if that is still relevant.

2) In addition, we could add basic information about the Russian-language datasets we contributed to MTEB, if needed. We could also provide the model evaluation we carried out recently.

3) In the final stages, we could also contribute to general proofreading of the paper (small typos, uniform model naming, etc.).

KennethEnevoldsen commented 2 months ago

@MariyaTikhonova

1) Sounds great

2) Can you go over section B? If you have created a dataset for the benchmarks, then please add that to B3. You might create a new appendix on Benchmark Creation and describe the curation rationale for the Russian benchmark. For now, results are not needed, but they might be added in the future.

3) Sounds lovely as well. I would go for 1 and 2 to start with.

gowitheflow-1998 commented 2 months ago

Hi @KennethEnevoldsen, let me know if you need me to add information about the RAR-b tasks to the paper, or if there is anything else I can help with in the paper writing in general!

KennethEnevoldsen commented 2 months ago

@gowitheflow-1998 can I ask you to add a section in appendix B4?

gowitheflow-1998 commented 2 months ago

@KennethEnevoldsen Sure. Will do today!

mariyahendriksen commented 1 month ago

Hi everyone, I am done with the introduction part of the paper. I will start going over the remaining parts sequentially. Please let me know if there is any section or aspect I should pay additional attention to!