danielnsilva / semanticscholar

Unofficial Python client library for Semantic Scholar APIs.
MIT License
289 stars, 38 forks

Retrieve papers from conference #89

Closed: PabloPeso closed this issue 2 months ago

PabloPeso commented 2 months ago

Question

This is a follow-up question to https://github.com/danielnsilva/semanticscholar/issues/58, because the query part is still not clear to me.

How can we get the most cited papers for a given conference, say NeurIPS 2023? Could I use this tool to retrieve the top 10 most cited papers?

This would be the URL to query: https://www.semanticscholar.org/venue?name=Neural%20Information%20Processing%20Systems&year%5B0%5D=2023&year%5B1%5D=2023&page=2&sort=influence, but I am not sure whether it is possible to retrieve this.
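For reference, the query string of that URL can be decoded with the standard library to see which filters the website applies (venue name, year range, page, and sort key). This is only a sketch of how to read the URL; it does not call Semantic Scholar, and it makes no claim about how these website parameters map to the API.

```python
from urllib.parse import urlsplit, parse_qs

# The venue page URL from the question, split across lines for readability.
url = (
    "https://www.semanticscholar.org/venue"
    "?name=Neural%20Information%20Processing%20Systems"
    "&year%5B0%5D=2023&year%5B1%5D=2023&page=2&sort=influence"
)

# parse_qs percent-decodes both keys and values, so %5B0%5D becomes [0].
params = parse_qs(urlsplit(url).query)

for key, values in params.items():
    print(f"{key} = {values[0]}")
```

The decoded keys are name, year[0], year[1], page, and sort.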

danielnsilva commented 2 months ago

You can now do this with the bulk search feature.

If you sort by the citationCount field, you'll get a faster response by using the search_paper() method with the sort parameter.

If you prefer to sort by the influentialCitationCount field, you need to retrieve all the papers first and then extract the top 10 from the results.

Top-10 by citationCount:

from semanticscholar import SemanticScholar

sch = SemanticScholar()

# Bulk search restricted to the venue and year, sorted by citationCount in descending order.
results = sch.search_paper(
    '*',
    venue=['Neural Information Processing Systems'],
    year='2023',
    bulk=True,
    sort='citationCount:desc')

# The results arrive already sorted, so the top 10 are the first 10 items.
print('Top-10 by citationCount:')
for i, item in enumerate(results.items[:10]):
    print(f'{i+1}. {item.year} - {item.citationCount} - {item.title}')

Output:

Top-10 by citationCount:
1. 2023 - 1636 - Judging LLM-as-a-judge with MT-Bench and Chatbot Arena
2. 2023 - 1634 - Visual Instruction Tuning
3. 2023 - 1154 - Direct Preference Optimization: Your Language Model is Secretly a Reward Model
4. 2023 - 1063 - QLoRA: Efficient Finetuning of Quantized LLMs
5. 2023 - 963 - Toolformer: Language Models Can Teach Themselves to Use Tools
6. 2023 - 919 - InstructBLIP: Towards General-purpose Vision-Language Models with Instruction Tuning
7. 2023 - 784 - Tree of Thoughts: Deliberate Problem Solving with Large Language Models
8. 2023 - 692 - Self-Refine: Iterative Refinement with Self-Feedback
9. 2023 - 540 - HuggingGPT: Solving AI Tasks with ChatGPT and its Friends in Hugging Face
10. 2023 - 455 - Reflexion: language agents with verbal reinforcement learning

Top-10 by influentialCitationCount:

from semanticscholar import SemanticScholar

sch = SemanticScholar()

# Bulk search restricted to the venue and year, without a sort parameter.
results = sch.search_paper(
    '*',
    venue=['Neural Information Processing Systems'],
    year='2023',
    bulk=True)
# Retrieve all the papers before ranking them locally.
all_results = list(results)

# Rank by influentialCitationCount (descending) and keep the top 10.
top_10 = sorted(
    all_results,
    key=lambda x: x.influentialCitationCount,
    reverse=True)[:10]

print('Top-10 by influentialCitationCount:')
for i, item in enumerate(top_10):
    print(f'{i+1}. {item.year} - {item.influentialCitationCount} - {item.title}')

Output:

Top-10 by influentialCitationCount:
1. 2023 - 471 - Visual Instruction Tuning
2. 2023 - 343 - Direct Preference Optimization: Your Language Model is Secretly a Reward Model
3. 2023 - 287 - Judging LLM-as-a-judge with MT-Bench and Chatbot Arena
4. 2023 - 234 - InstructBLIP: Towards General-purpose Vision-Language Models with Instruction Tuning
5. 2023 - 137 - QLoRA: Efficient Finetuning of Quantized LLMs
6. 2023 - 98 - ProlificDreamer: High-Fidelity and Diverse Text-to-3D Generation with Variational Score Distillation
7. 2023 - 91 - AlpacaFarm: A Simulation Framework for Methods that Learn from Human Feedback
8. 2023 - 83 - Tree of Thoughts: Deliberate Problem Solving with Large Language Models
9. 2023 - 72 - Toolformer: Language Models Can Teach Themselves to Use Tools
10. 2023 - 65 - Reflexion: language agents with verbal reinforcement learning
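
As a possible variation on the local ranking step, heapq.nlargest selects the top k items without sorting the whole list. The sketch below runs on stand-in records with made-up values rather than real API results, and the top_k helper is a hypothetical name. Treating a missing influentialCitationCount as 0 is a defensive assumption, not something the API is known to require.

```python
import heapq
from types import SimpleNamespace

# Stand-in records with made-up values; they only mimic the two attributes
# used by the ranking step and are not real API results.
papers = [
    SimpleNamespace(title="Paper A", influentialCitationCount=12),
    SimpleNamespace(title="Paper B", influentialCitationCount=None),
    SimpleNamespace(title="Paper C", influentialCitationCount=40),
    SimpleNamespace(title="Paper D", influentialCitationCount=7),
]


def top_k(items, k=10):
    """Return up to k items with the highest influentialCitationCount."""
    # Assumption: a missing count is treated as 0 so that None is never
    # compared with an int.
    return heapq.nlargest(
        k, items, key=lambda p: p.influentialCitationCount or 0)


for rank, paper in enumerate(top_k(papers, k=3), start=1):
    print(f"{rank}. {paper.influentialCitationCount} - {paper.title}")
```

With real data, the same helper could be called on the list of retrieved papers in place of the sorted(...)[:10] expression.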

Check it out on Replit: https://replit.com/@danielnsilva/semanticscholar-issue-89

PabloPeso commented 2 months ago

Thank you very much, that's exactly what I needed 😃. Very helpful tool.