CRISalid-esr / svp-harvester

Sovisu+ publications harvester as microservice

Missing publications from Hal harvester #585

Closed: jdp1ps closed this issue 1 month ago

jdp1ps commented 2 months ago

When searching with idHal_s arnaud-brioude, the HAL harvester returns only 172 publications, whereas the HAL API references 180: http://api.archives-ouvertes.fr/search/?q=authIdHal_s:"arnaud-brioude"&wt=xml

<result name="response" numFound="180" start="0" maxScore="5.193693" numFoundExact="true">
<doc>
<str name="docid">1130824</str>
<str name="label_s">
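To check the gap without reading the raw XML by hand, the hit count can be pulled from the numFound attribute of the Solr result element shown in the excerpt above. The sketch below is a hypothetical helper (solr_num_found is not part of the harvester), and the sample response is a trimmed-down stand-in modelled on that excerpt rather than a real API reply.

```python
import xml.etree.ElementTree as ET


def solr_num_found(xml_text: str) -> int:
    """Return numFound from the <result name="response"> element of a Solr XML reply.

    Hypothetical helper for illustration; not taken from the harvester code base.
    """
    root = ET.fromstring(xml_text)
    # In Solr's XML output the total hit count is an attribute of <result name="response">.
    for result in root.iter("result"):
        if result.get("name") == "response":
            num_found = result.get("numFound")
            if num_found is None:
                raise ValueError("result element has no numFound attribute")
            return int(num_found)
    raise ValueError('no <result name="response"> element found')


# Trimmed-down sample modelled on the excerpt above (not a real API response).
sample = """<response>
  <result name="response" numFound="180" start="0" numFoundExact="true">
    <doc><str name="docid">1130824</str></doc>
  </result>
</response>"""

api_count = solr_num_found(sample)
harvested_count = 172  # count reported by the harvester in this issue
print(api_count - harvested_count)  # → 8
```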
jdp1ps commented 2 months ago

This is caused by a difference in the query parameters used by the harvester:

https://api.archives-ouvertes.fr/search/?q=authIdHal_s:arnaud-brioude&sort=halId_s+asc&rows=1000&fq=docType_s:(ART+OR+OUV+OR+COUV+OR+COMM+OR+THESE+OR+HDR+OR+REPORT+OR+NOTICE+OR+PROCEEDINGS)&fl=docid,halId_s,*_title_s,*_subTitle_s,*_abstract_s,*_keyword_s,authIdForm_i,authFullNameFormIDPersonIDIDHal_fs,authQuality_s,docType_s,docSubType_s,publicationDate_tdate,producedDate_tdate,citationFull_s,citationRef_s,authIdHasStructure_fs,labStructId_i,jel_s,*Id_s,page_s,journalId_i,journalTitle_s,journalIssn_s,journalEissn_s,journalPublisher_s,issue_s,volume_s,bookTitle_s,publisher_s,isbn_s
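The part of that query that explains the gap is the fq (filter query) parameter: only documents whose docType_s is one of the listed types are returned, so anything of another type never reaches the harvester. A minimal sketch of how such a URL could be assembled, assuming a hypothetical build_hal_query helper (not the harvester's actual code) and leaving out the long fl field list for brevity:

```python
from urllib.parse import urlencode

HAL_SEARCH_URL = "https://api.archives-ouvertes.fr/search/"

# Document types listed in the fq filter of the query above.
FILTERED_DOC_TYPES = ["ART", "OUV", "COUV", "COMM", "THESE", "HDR", "REPORT", "NOTICE", "PROCEEDINGS"]


def build_hal_query(id_hal: str, doc_types: list[str], rows: int = 1000) -> str:
    """Build a HAL search URL restricted to the given document types (hypothetical helper)."""
    params = {
        "q": f"authIdHal_s:{id_hal}",
        "sort": "halId_s asc",
        "rows": rows,
        # Solr filter query: documents whose docType_s is not in this list
        # are excluded from the results, which is why some publications go missing.
        "fq": "docType_s:(" + " OR ".join(doc_types) + ")",
    }
    # urlencode percent-encodes ':' and '(' and turns spaces into '+'.
    return HAL_SEARCH_URL + "?" + urlencode(params)


url = build_hal_query("arnaud-brioude", FILTERED_DOC_TYPES)
print(url)
```

Any publication of a type missing from FILTERED_DOC_TYPES is dropped by the filter, which would account for the difference between the two counts.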

@HDubernard, please let us know if you think some of these filters should be changed.

HDubernard commented 1 month ago

Maybe remove all filters to get all document types?

E-Bara commented 1 month ago

The filter for the HAL harvester has been changed from:

DEFAULT_DOC_TYPES = [
    "ART",
    "OUV",
    "COUV",
    "COMM",
    "THESE",
    "HDR",
    "REPORT",
    "NOTICE",
    "PROCEEDINGS",
]

To:

DEFAULT_DOC_TYPES = ["*"]

This allows the harvester to retrieve all available documents, whatever their document type.
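The thread does not show how the harvester turns ["*"] into a query. One plausible approach, sketched below under that assumption, is to treat "*" as "no restriction" and omit the fq parameter altogether, so the request matches every document for the author. The doc_type_filter and build_hal_query names are hypothetical and only illustrate the idea.

```python
from typing import Optional
from urllib.parse import urlencode

HAL_SEARCH_URL = "https://api.archives-ouvertes.fr/search/"
DEFAULT_DOC_TYPES = ["*"]


def doc_type_filter(doc_types: list[str]) -> Optional[str]:
    """Return a Solr fq clause on docType_s, or None when all types are wanted (hypothetical)."""
    if not doc_types or "*" in doc_types:
        # "*" is read as "no restriction": the filter is simply left out.
        return None
    return "docType_s:(" + " OR ".join(doc_types) + ")"


def build_hal_query(id_hal: str, doc_types: list[str], rows: int = 1000) -> str:
    """Build a HAL search URL, adding fq only when a type restriction applies (hypothetical)."""
    params = {"q": f"authIdHal_s:{id_hal}", "sort": "halId_s asc", "rows": rows}
    fq = doc_type_filter(doc_types)
    if fq is not None:
        params["fq"] = fq
    return HAL_SEARCH_URL + "?" + urlencode(params)


print(build_hal_query("arnaud-brioude", DEFAULT_DOC_TYPES))
```

Dropping the clause, rather than sending a filter that matches everything, keeps the request identical to the unfiltered API query used in the original report.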