cjlee112 / spnet

selected papers network web engine
http://thinking.bioinformatics.ucla.edu/2011/07/02/open-peer-review-by-a-selected-papers-network/
GNU General Public License v2.0

Paper recommendation misassigned #80

Open sashakolpakov opened 11 years ago

sashakolpakov commented 11 years ago

I wrote a recommendation for paper arXiv:0910.4103 by Kellerhals and Perren, and mentioned the paper they referred to, arXiv:0906.1596, in the recommendation body [...]. However, now my recommendation is associated with arXiv:0906.1596, in error, despite the tag #recommend arXiv:0910.4103. Please see the screen snapshot attached.

I'm afraid this makes it hard to cite other arXiv papers in a recommendation, since the recommendation can get misassigned this way.

Please look into this and let me know how I can correct the mistake.

My apologies for the trouble.

Alexander Kolpakov

[Screenshot attachment]

sashakolpakov commented 11 years ago

A workaround that I used: I deleted the reference to arXiv:0906.1596 from my recommendation, and now it's assigned to arXiv:0910.4103, as desired. However, this tells me that the assignment is determined by the first arXiv reference found, independently of the #recommend tag. Again, I hope you will get in touch with me to clarify this issue. See ye, Alexander.

cjlee112 commented 11 years ago

Yes, we need to add support for a post to reference multiple papers, and set a rule for determining which paper should be treated as the "primary" paper for the post (presumably, the first paper ID after the #spnetwork tag). This should not be hard to do.
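A minimal sketch of the proposed rule, assuming a simplified arXiv ID pattern and a hypothetical helper named `find_papers` (neither is taken from the spnet code base). It treats the first paper ID after the #spnetwork tag as primary and every other ID in the post as a secondary reference:

```python
import re

# Hypothetical, simplified pattern: new-style arXiv IDs such as 0910.4103 only.
ARXIV_ID = re.compile(r'arxiv:\s*(\d{4}\.\d{4,5})', re.IGNORECASE)


def find_papers(text, tag='#spnetwork'):
    """Return (primary, secondary) arXiv IDs for a post.

    The primary paper is the first ID that appears after `tag`.
    All other IDs in the post are returned as secondary references,
    in order of first appearance.
    """
    ids = [(m.start(), m.group(1)) for m in ARXIV_ID.finditer(text)]
    tag_pos = text.find(tag)
    after_tag = [pid for pos, pid in ids if tag_pos >= 0 and pos > tag_pos]
    primary = after_tag[0] if after_tag else None

    seen = {primary}
    secondary = []
    for _, pid in ids:
        if pid not in seen:
            seen.add(pid)
            secondary.append(pid)
    return primary, secondary


post = ("This extends arXiv:0906.1596 in a nice way. "
        "#spnetwork #recommend arXiv:0910.4103")
primary, secondary = find_papers(post)
```

With this rule, a paper mentioned in the body before the tag no longer captures the recommendation; it is simply reported as a secondary reference.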

sashakolpakov commented 11 years ago

Thank you for your response! I actually thought that the system would identify the first paper ID after the #recommend tag. However, is it possible to create another tag, say #ref, to make a hyper-reference (a web link to arXiv or any other repository) to a paper? I mean something that acts like the LaTeX \ref{my_reference} command.

semorrison commented 11 years ago

It's possible, of course, to configure any parsing of posts we want. But we should try to remain as flexible and intuitive as possible; certainly no one should ever have to 'read the manual' to work out how to post to spnetwork. Hence I'm not keen on #ref.

That said, it's also important we follow the principle of least surprise, so the suggestion to take the first reference after #spnetwork as the primary reference is a good one.

How prevalent is the use of #recommend at this point?

sashakolpakov commented 11 years ago

All right, I agree with you on the principle of taking the first reference after #spnetwork as the primary reference, since the tag #recommend will come together with #mustread depending on the character of the recommendation. Hopefully, I have reached some understanding of the idea. Please tell me if I'm mistaken somewhere.

cjlee112 commented 10 years ago

We can add support for "secondary citations" as follows. In a given post or recommendation, one paper would be designated as primary, i.e. the paper being recommended or the main focus of the post. Additional papers could also be cited. The unique ID of the rec / post would be added to a list of secondary citations on each of those papers' records in the database. When showing one of those papers, that list would be displayed as a link of the form "other posts mentioning this paper". When displaying the post itself, its text would presumably contain links to the secondary citation papers.

This design would incur no additional db queries for displaying either the primary paper (the links are already included in the post text, which is stored in the primary paper record) or a secondary paper (it simply shows an "other posts mentioning this paper" link and doesn't need to retrieve those posts unless the user actually clicks that link).
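The storage layout described above could be sketched roughly as follows. This uses a plain dictionary as a stand-in for the database, and every field and function name here (`posts`, `secondary_post_ids`, `save_post`, `render_paper`) is hypothetical rather than spnet's actual schema:

```python
# In-memory stand-in for the paper collection; all field names are hypothetical.
papers = {}


def get_paper(paper_id):
    """Fetch a paper record, creating an empty one if needed."""
    return papers.setdefault(paper_id, {'posts': {}, 'secondary_post_ids': []})


def save_post(post_id, text, primary_id, secondary_ids):
    # The post text is stored inside the primary paper's record.
    get_paper(primary_id)['posts'][post_id] = text
    # Each secondary paper only records the ID of the post that mentions it.
    for paper_id in secondary_ids:
        refs = get_paper(paper_id)['secondary_post_ids']
        if post_id not in refs:
            refs.append(post_id)


def render_paper(paper_id):
    """Build display data for one paper from its own record alone."""
    record = get_paper(paper_id)
    return {
        'posts': list(record['posts'].values()),
        # Only a flag for the link is needed; the posts themselves
        # would be fetched when the user follows the link.
        'show_other_posts_link': bool(record['secondary_post_ids']),
    }


save_post('post1', 'Nice paper, builds on arXiv:0906.1596',
          primary_id='0910.4103', secondary_ids=['0906.1596'])
```

In this sketch, rendering either paper touches only that paper's own record, which is the "no additional queries" property the design is after.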

cjlee112 commented 10 years ago

@sashakolpakov @semorrison @johncarlosbaez @pkra We need to decide how to handle this "multiple citations" challenge, as I think it arises in each of the cases we are working on with you (e.g. MathOverflow trackbacks often map more than one arXiv ID to a single MO post; TWF and other blog posts are quite likely to have URLs for more than one paper per post, etc.).

To my mind, the simplest, most general solution is to have a flag for each paper citation that categorizes it as "recommended", "discussed" or "cited" (meaning the post recommends that paper, or discusses (actually says something about) that paper, vs. merely citing that paper in support of some point). This flag would just be a text field in the database, so we could later add further categories whenever that seems necessary.

For initially loading a lot of blog posts, I think we'd set "discussed" as the default category for paper citations. We could then let users adjust that for individual paper citations. For example, if a user viewed a post and found it merely cited the paper without saying anything meaningful about it, they could change its category to "cited".

What do you think?
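As a rough illustration of the proposal, a citation record with a free-text category might look like the sketch below. The class and function names (`Citation`, `bulk_load`, `recategorize`) and the `citation_type` field are assumptions made for this example, not names from the spnet code:

```python
from dataclasses import dataclass

# Default category proposed for bulk-loaded blog posts.
DEFAULT_CITATION_TYPE = 'discussed'


@dataclass
class Citation:
    post_id: str
    paper_id: str
    # Plain text rather than a fixed enum, so further categories
    # can be introduced later without changing the schema.
    citation_type: str = DEFAULT_CITATION_TYPE


def bulk_load(post_id, paper_ids):
    """Create one citation per paper, all with the default category."""
    return [Citation(post_id, paper_id) for paper_id in paper_ids]


def recategorize(citation, new_type):
    """Let a user adjust the category of a single citation."""
    citation.citation_type = new_type
    return citation


citations = bulk_load('blogpost1', ['arxiv:1234.5678', 'arxiv:9876.5432'])
recategorize(citations[1], 'cited')
```

Keeping the category as text trades validation for flexibility: nothing stops a typo from creating a new category, so a real implementation would probably want some check at the input boundary.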

semorrison commented 10 years ago

Looks good to me.

(The "cited" category suggests an ambitious plan for one day in the future: process all the arXiv source files for citations...)

johncarlosbaez commented 10 years ago

That sounds good to me too. Most blog articles that link to a paper say something about it, though perhaps not a lot.

sashakolpakov commented 10 years ago

Thank you for this suggestion! Looks pretty good to me, since "discussed" is a broad term that accommodates the wide variety of ways a post can refer to a given paper.

cjlee112 commented 10 years ago

FYI, the citationType proposal and multiple citations, plus a major refactoring (e.g. consolidating the previously separate Recommendation and Post classes into a single Post class), have been implemented; code is available on my multicite branch. We'll test it internally for a bit before deploying to the selectedpapers.net website. This work is discussed here.

ketch commented 10 years ago

My recent post was also misassigned: https://selectedpapers.net/posts/z12xvr0qokfvwtbvd22sjl5ogtjyjz2c4

It contains three paper DOIs. The first is immediately preceded by #recommend, the others by #discusses. Yet SPnet thinks it's a recommendation for the third DOI.

cjlee112 commented 10 years ago

@ketch hmm, what I see is:

Correcting the DOI regexp (to the post-2009 restricted character set) enables it to detect the DOIs and it assigns the correct order to the papers.
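The corrected regexp itself is not shown in this thread. As an illustration only, the sketch below uses a pattern modeled on the one Crossref has recommended for matching modern DOIs; the `find_dois` helper and the example DOIs are made up for this example:

```python
import re

# Assumed pattern, modeled on Crossref's recommended regexp for modern DOIs.
# It is NOT the expression actually used in spnet.
DOI = re.compile(r'\b10\.\d{4,9}/[-._;()/:A-Z0-9]+', re.IGNORECASE)


def find_dois(text):
    """Return DOIs in order of appearance.

    Trailing '.', ',' and ';' are trimmed because in running text they
    are far more likely to be sentence punctuation than part of the DOI.
    """
    return [m.group(0).rstrip('.,;') for m in DOI.finditer(text)]


# The DOIs below are invented placeholders, not real papers.
post = ("#recommend doi:10.1234/example.one "
        "#discusses doi:10.5555/example-two, and doi:10.5555/example_3.")
dois = find_dois(post)
```

Because matches are returned in order of appearance, the first DOI after #recommend stays first, which is what the ordering of papers in the post depends on.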

cjlee112 commented 10 years ago

@sashakolpakov @semorrison @johncarlosbaez @pkra We need feedback on a more detailed proposal for tagging paper references with a citation type, from @ketch

We clearly need something like this so people can specify a citation type for each reference. The challenge is the details, e.g.

How close must the tag and paper ID be to each other? What if a user typed something like:

arxiv:1234.5678 #recommend 
I am now going to write a long paragraph...
Actually, pages and pages of text...
In which I later say the following paper is total crap:
arxiv:9876.5432

The user would probably object if we interpreted this as a recommendation for arxiv:9876.5432...

Note we can't exactly require that there be nothing but whitespace between the tag and the paper ID, because Google+ (for example) will add HTML tags around the hashtag...

Also, what should we do about cases like

#spnetwork arxiv:1234.5678 #recommend arxiv:9876.5432

By our original rule, #recommend would bind to the first paper after #spnetwork, but by the new "pair rule" (which presumably would have higher precedence) it would bind to the paper after #recommend.

One simple alternative: we could say that such tag bindings only bind to papers in the same line or paragraph, i.e. something like

#spnetwork #recommend arxiv:1234.5678
#discuss shortDOI:abcde

The separator could be an explicit linebreak (<BR> in HTML) or paragraph break (<P> or </P>).
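One way the "same line or paragraph" rule could work is sketched below, under several assumptions: the tag names and their mapping to categories, the simplified paper ID pattern, and the `bind_tags` helper are all hypothetical, and untagged papers fall back to "discussed". HTML tags other than line and paragraph breaks are stripped first, so markup wrapped around a hashtag does not get in the way:

```python
import re

# All patterns and names here are illustrative assumptions.
PAPER_ID = re.compile(r'arxiv:\d{4}\.\d{4,5}|shortDOI:\w+', re.IGNORECASE)
TYPE_TAG = re.compile(r'#(recommend|discuss|cite)\w*', re.IGNORECASE)
BREAK = re.compile(r'<br\b[^>]*>|</?p\b[^>]*>|\n', re.IGNORECASE)
HTML_TAG = re.compile(r'<[^>]+>')

CATEGORY = {'recommend': 'recommended', 'discuss': 'discussed', 'cite': 'cited'}


def bind_tags(post, default='discussed'):
    """Bind each citation-type tag to the paper IDs in its own line or paragraph."""
    bindings = []
    for block in BREAK.split(post):
        # Drop remaining markup, e.g. links wrapped around a hashtag.
        block = HTML_TAG.sub(' ', block)
        tag = TYPE_TAG.search(block)
        category = CATEGORY[tag.group(1).lower()] if tag else default
        for match in PAPER_ID.finditer(block):
            bindings.append((match.group(0), category))
    return bindings


short_post = "#spnetwork #recommend arxiv:1234.5678<BR>#discuss shortDOI:abcde"
long_post = ("arxiv:1234.5678 #recommend\n"
             "I am now going to write a long paragraph...\n"
             "arxiv:9876.5432")
short_result = bind_tags(short_post)
long_result = bind_tags(long_post)
```

Under this rule the paper at the end of the long post is not treated as recommended, because the #recommend tag is confined to its own line. Whether users would actually put tags and paper IDs on the same line is exactly the open question raised here.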

These are just ideas. The $64,000 question is whether there will be any consistency to what users do and expect, such that we can devise a rule that will actually work right on (almost) all posts.