[suggestion] specific search in the citation only

anHALytics / anHALytics-frontend

Frontend interfaces suited for anHALytics API (in development)


[suggestion] specific search in the citation only #36

Open lfoppiano opened 8 years ago

lfoppiano commented 8 years ago

Hi, I'm not sure if it's already implemented, but I think it would be a nice feature for when you have a paper and want to find more recent papers on the subject.

I did not find such a feature in Google Scholar's search box:

You first need to find the article (which might not be in HAL):

..and then you will get the articles that cite the searched one:

Please note that the box 'search within cited articles' is normally unticked... it looks like a dummy (does nothing), or I haven't understood how to use it (in which case it is not user-friendly 🎯)

kermitt2 commented 8 years ago

Do you mean that for a document D you want to search in the set of documents {D'} that are citing D? (That's what Google Scholar does, no?)

This is not available right now because:

we are not resolving the bibliographical references (we are waiting for the entity matching module ;)
there would be too few citing documents right now in HAL to have something meaningful (not to say within Inria publications, even worse)

With the work in the ISTEX chantier d'usage, I think we might be able to do a more meaningful implementation.

lfoppiano commented 8 years ago

> Do you mean that for a document D you want to search in the set of documents {D'} that are citing D? (That's what Google Scholar does, no?)

Yes, but in Google Scholar you need to find the document first.

Since we don't have the same document set, what we could do is search regardless of whether or not the cited document {D} is in the database.

kermitt2 commented 8 years ago

But in terms of workflow, how do we arrive at D if this document is not in the repository? The citing documents only make sense if we can arrive at D in the search application, and this is what Google Scholar is doing.

What would be doable, I think, is to connect all the citing documents to D (even if D is not present) in a graph representation and exploit that for document recommendation, label propagation, etc. But we have excluded all the graph aspects, given that there's too much work on higher-priority features related to anhalytics (in particular aggregations on entity views, which are still absent).

lfoppiano commented 8 years ago

My idea was much simpler.

The assumption is that you (the user) already know some information about {D}. Given the coverage limitation compared with Google Scholar, {D} might or might not be present within the data (I think it's not mandatory as long as you have the title, for example).

The workflow would be, for each document arriving in anhalytics:

we extract the citations (I think this is done already using GROBID)
we index the citation fields (the title would be a good start, for example)
we tailor a search on the citation fields (title and/or author, etc.) when the user enters data from {D}
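
The three steps above can be sketched roughly as follows. This is only a toy in-memory illustration, not how anHALytics indexes anything: the `CitationIndex` class, the `min_overlap` threshold, and the `hal-000x` identifiers are all hypothetical, and the cited titles are assumed to have been extracted already (for example by GROBID).

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it on non-alphanumeric characters."""
    return [t for t in re.split(r"[^a-z0-9]+", text.lower()) if t]


class CitationIndex:
    """Toy inverted index over the titles of cited references (hypothetical)."""

    def __init__(self):
        # token -> set of (citing_doc_id, position of the citation in that doc)
        self.postings = defaultdict(set)

    def add_document(self, doc_id, cited_titles):
        """Index every cited title extracted from one citing document."""
        for pos, title in enumerate(cited_titles):
            for token in set(tokenize(title)):
                self.postings[token].add((doc_id, pos))

    def search(self, query, min_overlap=0.8):
        """Return ids of documents citing a reference whose title contains
        at least `min_overlap` of the distinct query tokens."""
        tokens = set(tokenize(query))
        if not tokens:
            return []
        hits = defaultdict(int)
        for token in tokens:
            for key in self.postings.get(token, ()):
                hits[key] += 1
        docs = {
            doc_id
            for (doc_id, _), count in hits.items()
            if count / len(tokens) >= min_overlap
        }
        return sorted(docs)


# Made-up sample data: {D} itself does not need to be in the index.
index = CitationIndex()
index.add_document("hal-0001", ["Latent Dirichlet Allocation",
                                "Attention Is All You Need"])
index.add_document("hal-0002", ["Latent Dirichlet Allocation"])
citing = index.search("latent dirichlet allocation")
```

Here `citing` holds the identifiers of the citing documents {D'}, whether or not {D} is present in the repository. A real implementation would more likely add a citation field to the existing search index than keep a separate structure.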

Of course, I see and agree with your point about the coverage. It is more an idea that could be interesting to explore than a mandatory task (indeed, not with high priority). It's based on my current need to find more recent citing papers {D'}, given a well-known foundational paper {D}.

kermitt2 commented 8 years ago

Ok! Unfortunately, even if we consider all documents cited by all the full-text documents present in HAL, it will be a very small number of papers compared to the whole available corpus. So I think no user would be motivated to enter the metadata of a paper for a 2-5% success rate in the subsequent citation search. We could imagine a better workflow where the metadata would come from an OpenURL link (so without the burden of user input), but that's moving the complexity rather than solving it.
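
To make the OpenURL variant concrete, here is a minimal sketch of pulling citation metadata out of such a link, assuming the link uses the OpenURL 1.0 key/encoded-value format for journal articles (`rft.atitle`, `rft.au`, `rft.date`, `rft.jtitle`). The function name and the resolver URL are hypothetical, and nothing like this exists in anHALytics.

```python
from urllib.parse import parse_qs, urlsplit


def metadata_from_openurl(url):
    """Extract a few citation fields from an OpenURL KEV query string.

    Missing keys are returned as None; only the first value of each key is kept.
    """
    params = parse_qs(urlsplit(url).query)

    def first(key):
        values = params.get(key)
        return values[0] if values else None

    return {
        "title": first("rft.atitle"),
        "author": first("rft.au"),
        "date": first("rft.date"),
        "journal": first("rft.jtitle"),
    }


# Hypothetical resolver link, used only for illustration.
link = (
    "https://resolver.example.org/openurl?url_ver=Z39.88-2004"
    "&rft.atitle=Latent+Dirichlet+Allocation&rft.au=Blei%2C+David&rft.date=2003"
)
metadata = metadata_from_openurl(link)
```

The resulting `metadata["title"]` could then feed a citation search without any typing by the user, although the low coverage of the citing set remains the same.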

The idea could be more relevant to ISTEX for instance which gathers a decent amount of papers and bibliographical references.

Regarding anHALytics, the focus is not search but analytics, so we are a bit far from this anyway (the search GUI was available so early because these components were already developed at the start of the project, but anHALytics is not a project on academic paper search).