USPTO / PatentPublicData

Utility tools to help download and parse patent data made available to the public

Problem adding a source to BulkDownloader #79

Closed: mustberuss closed this issue 5 years ago

mustberuss commented 5 years ago

I tried to add the withdrawn patents file to sources.xml using the url tag, but it didn't work:

    <source>
        <name>USPTO</name>
        <type>WITHDRAWN_GRANTS</type>
        <download>
            <url>https://www.uspto.gov/sites/default/files/documents/withdrawn.zip</url>
            <count>1</count>
        </download>
    </source>

Output:

    2019-01-22 07:15:53,154 INFO  [       main] :: Download - --- Start ---
    2019-01-22 07:15:53,484 INFO  [       main] :: Download - Source: Source [name=uspto, docType=withdrawn_grants, download=DownloadConfig [downloadUrl=null, scrapeUrl=null, count=null, predicate=null]]
    2019-01-22 07:15:54,427 INFO  [       main] :: Download - URLS[0]: []
    2019-01-22 07:15:54,757 INFO  [       main] :: Download - --- Finished --- 0
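
Because the log reports downloadUrl=null and count=null, one quick check is whether the XML fragment itself carries those values. The sketch below is a minimal check using only the JDK's built-in DOM parser, not the project's own loader; the class SourceXmlCheck and its methods are hypothetical. If it prints both values, the fragment is probably fine and the problem more likely lies in how the annotations map the url element onto SourceDownload.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

public class SourceXmlCheck {

    // The <source> entry from this issue, with </download> re-indented.
    static final String SOURCE_XML =
          "<source>\n"
        + "    <name>USPTO</name>\n"
        + "    <type>WITHDRAWN_GRANTS</type>\n"
        + "    <download>\n"
        + "        <url>https://www.uspto.gov/sites/default/files/documents/withdrawn.zip</url>\n"
        + "        <count>1</count>\n"
        + "    </download>\n"
        + "</source>\n";

    // Trimmed text of the first element named `tag` below `parent`, or null if absent.
    static String firstText(Element parent, String tag) {
        NodeList nodes = parent.getElementsByTagName(tag);
        return nodes.getLength() == 0 ? null : nodes.item(0).getTextContent().trim();
    }

    // Parses a <source> fragment and returns {url, count} from its <download> child.
    static String[] readUrlAndCount(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
            .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        Element download = (Element) doc.getDocumentElement()
            .getElementsByTagName("download").item(0);
        return new String[] { firstText(download, "url"), firstText(download, "count") };
    }

    public static void main(String[] args) throws Exception {
        String[] values = readUrlAndCount(SOURCE_XML);
        System.out.println("url   = " + values[0]);
        System.out.println("count = " + values[1]);
    }
}
```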

It did work when I changed the XML tag to url2 and added another annotated setter to SourceDownload:

    @XmlElement(name = "url2")
    public void setDownloadUrl2(String downloadUrl) {
        this.downloadUrl = HttpUrl.parse(downloadUrl);
    }
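
The log shows every DownloadConfig field as null, which is consistent with how annotation-driven binders typically behave when no annotated member claims an element: the element is skipped rather than reported as an error. The toy binder below illustrates only that behavior. It is not JAXB and does not identify the actual cause in SourceDownload; the BindsTo annotation, ToyBinder and this DownloadConfig are hypothetical, and java.net.URI stands in for OkHttp's HttpUrl.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Method;
import java.net.URI;
import java.util.Map;

public class ToyBinder {

    // Hypothetical stand-in for @XmlElement: names the XML element a setter receives.
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.METHOD)
    public @interface BindsTo {
        String name();
    }

    // Hypothetical config class; java.net.URI stands in for OkHttp's HttpUrl.
    public static class DownloadConfig {
        URI downloadUrl;

        @BindsTo(name = "url2")
        public void setDownloadUrl2(String url) {
            this.downloadUrl = URI.create(url);
        }
    }

    // Invokes each annotated setter whose element name appears in `elements`.
    // Elements that no setter claims are skipped without error, so their fields stay null.
    static <T> T bind(T target, Map<String, String> elements) throws Exception {
        for (Method m : target.getClass().getMethods()) {
            BindsTo ann = m.getAnnotation(BindsTo.class);
            if (ann != null && elements.containsKey(ann.name())) {
                m.invoke(target, elements.get(ann.name()));
            }
        }
        return target;
    }

    public static void main(String[] args) throws Exception {
        String zip = "https://www.uspto.gov/sites/default/files/documents/withdrawn.zip";
        System.out.println(bind(new DownloadConfig(), Map.of("url", zip)).downloadUrl);  // null
        System.out.println(bind(new DownloadConfig(), Map.of("url2", zip)).downloadUrl);
    }
}
```

The silent skip is why a mismatch like this shows up only as null fields in the log rather than as a parse error.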

It also worked when scraping https://www.uspto.gov/patents-application-process/patent-search/withdrawn-patent-numbers instead, but I couldn't figure out why the original url tag and its annotation didn't work.

The grant XML files can contain data for patents that were subsequently withdrawn.
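
Once the withdrawn list is available, a consumer of the grant XML could skip those patents. The sketch below is minimal and assumes the grant and withdrawn patent numbers have already been extracted into the same string format; the class, method and sample numbers are hypothetical, and real data may need normalizing first (for example leading zeros or kind codes).

```java
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

public class WithdrawnFilter {

    // Keeps only grant numbers that do not appear in the withdrawn set.
    // Assumes both inputs already share one number format.
    static List<String> dropWithdrawn(List<String> grantNumbers, Set<String> withdrawn) {
        return grantNumbers.stream()
            .filter(n -> !withdrawn.contains(n))
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Hypothetical sample numbers, not taken from the USPTO files.
        List<String> grants = List.of("10000001", "10000002", "10000003");
        Set<String> withdrawn = Set.of("10000002");
        System.out.println(dropWithdrawn(grants, withdrawn)); // [10000001, 10000003]
    }
}
```

A HashSet-backed lookup keeps each membership check constant-time, which matters when filtering a full year of grants.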