embeddings-benchmark / mteb

MTEB: Massive Text Embedding Benchmark
https://arxiv.org/abs/2210.07316
Apache License 2.0

Paper Writing: Methods - The Open-source Effort [in need of review] #1006

Open KennethEnevoldsen opened 3 days ago

KennethEnevoldsen commented 3 days ago

@imenelydiaker already did good work here. I added a few additional things.

@imenelydiaker do you mind taking a look at this again?

dokato commented 1 day ago

I can help with the review and have also added a few changes, but here are some general comments.

Also, the overall goal of this section is a bit unclear to me. A lot of things are mentioned only superficially or are referred to only in the appendix. How detailed do we want it to be? Another question is whether the entire "Methods" section should really come before "Related Work", since we draw some inspiration from earlier community-driven initiatives discussed there; it might read more easily if it built on that.

KennethEnevoldsen commented 18 hours ago

> I can help with the review and have also added a few changes, but here are some general comments.

Would be lovely!

> Some sentences are difficult to read (I recommend using the Hemingway app to check for clarity); see my rewrite of the first sentence.

I agree, though we should probably do this at the end. Let's focus on the main narrative points to start with.

"Datasets Quality Assurance" should that be really part of this section? Apart from "asking contributors to fill in metadata fields" fragment there's little relevance to the healine above.

I believe it is related to the open-source effort. Do you have another suggestion on where to put it?

> I know the point-based system is detailed in the sec:contributions appendix, but it sounds important enough to move, or at least briefly outline, in the main section of the paper.

I would welcome a rewrite if you have a good one. One thing I am afraid of is reviewers getting too caught up in the specifics of a non-perfect point system (we know it isn't perfect - it can't be).

> "near-perfect scores" - what does that mean? Should we specify a threshold?

I agree we can probably be clearer. However, near-perfect scores vary from metric to metric, though I think most people would understand it as a score close to the maximum of what is possible for the task (often close to 1). Again, a reformulation is very welcome.
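
To make the idea concrete, here is a small sketch of what flagging near-perfect scores could look like. The 0.95 cut-off and the scores are purely illustrative, not values we have decided on:

```python
# Hypothetical sketch: flag tasks where the best-performing model already
# reaches a score close to the metric's maximum (here assumed to be 1.0).
NEAR_PERFECT_THRESHOLD = 0.95  # illustrative cut-off, not a decided value

best_scores = {
    "TaskA": 0.99,  # placeholder: best reported score on TaskA
    "TaskB": 0.71,
    "TaskC": 0.96,
}

near_perfect_tasks = [
    task for task, score in best_scores.items()
    if score >= NEAR_PERFECT_THRESHOLD
]
print(near_perfect_tasks)  # ['TaskA', 'TaskC']
```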

> I'm afraid the description of collecting metadata fields for a dataset may not be clear to someone who has never done it; maybe we should consider providing a brief example of the metadata fields as a listing or figure?

Again, you are definitely welcome to suggest a rewrite here.
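
If it helps, something like the sketch below could work as a listing. It is purely illustrative: the field names and values are placeholders and not the exact TaskMetadata schema:

```python
# Hypothetical sketch of the metadata a contributor might fill in when
# adding a dataset to MTEB; field names are illustrative, not the exact schema.
task_metadata = {
    "name": "ExampleClassification",          # unique task name
    "description": "Short description of what the dataset tests.",
    "reference": "https://example.org/dataset-paper",  # placeholder URL
    "dataset": {
        "path": "user/example-dataset",       # Hugging Face dataset id (placeholder)
        "revision": "<commit-hash>",          # pin a specific dataset revision
    },
    "type": "Classification",                 # task type
    "eval_splits": ["test"],
    "eval_langs": ["eng-Latn"],
    "main_score": "accuracy",
    "domains": ["News"],
    "license": "cc-by-4.0",
    "annotations_creators": "human-annotated",
}
```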