Mahdisadjadi / arxivscraper

A python module to scrape arxiv.org for a date range and category
MIT License

Filter does not work #16

Open wanghaisheng opened 2 years ago

wanghaisheng commented 2 years ago
        print('k', k, from_day, until_day, filters)
        # Output: k ComputerScience 2022-04-25 2022-04-25 {'categories': ['cs', 'eess'], 'abstract': ['healthcare', 'medical', 'hospital']}
        scraper = arxivscraper.Scraper(category=k, date_from=from_day, date_until=until_day, filters=filters)
        tmp = scraper.scrape()
        print(tmp)
Mahdisadjadi commented 10 months ago

Not sure if I understood your code correctly, but this worked for me:

import arxivscraper.arxivscraper as ax
import pandas as pd

scraper = ax.Scraper(
    category="cs",
    date_from="2022-04-25",
    date_until="2022-04-26",
    t=10,
    filters={"abstract": ["healthcare", "medical", "hospital"]},
)
output = scraper.scrape()
cols = ("id", "title", "categories", "abstract", "doi", "created", "updated", "authors")
df = pd.DataFrame(output, columns=cols)
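
To check whether a filter actually had an effect, the returned records can be compared against the keywords by hand. The following is only a rough sketch: it assumes `scrape()` returns a list of dicts with an `abstract` key (as the column names above suggest), and `matches_any` and `split_by_filter` are hypothetical helpers, not part of arxivscraper. The sample records are made up so that the snippet runs without network access.

```python
def matches_any(record, field, keywords):
    """Return True if any keyword occurs, case-insensitively, in record[field]."""
    text = str(record.get(field, "")).lower()
    return any(kw.lower() in text for kw in keywords)


def split_by_filter(records, field, keywords):
    """Split records into those that match at least one keyword and those that do not."""
    hits = [r for r in records if matches_any(r, field, keywords)]
    misses = [r for r in records if not matches_any(r, field, keywords)]
    return hits, misses


# Made-up records standing in for the scraper output (assumption: list of dicts).
sample = [
    {"id": "a", "abstract": "a medical imaging benchmark"},
    {"id": "b", "abstract": "graph neural networks for routing"},
    {"id": "c", "abstract": "Hospital scheduling with reinforcement learning"},
]

hits, misses = split_by_filter(sample, "abstract", ["healthcare", "medical", "hospital"])
print([r["id"] for r in hits])    # records that contain at least one keyword
print([r["id"] for r in misses])  # records the filter should have dropped
```

If `misses` is non-empty when run on real filtered output, the filter was probably not applied as expected.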