PatentsView / PatentsView-DB


Some patents missing assignee data #28

Open crew102 opened 6 years ago

crew102 commented 6 years ago

Hi -

It looks like some patents are not associated with their "original assignees" in the PatentsView bulk data files. For example, according to http://patft.uspto.gov/netahtml/PTO/search-bool.html, patent numbers 4619896, 6255456, and 9623304 each have an original assignee, but the relevant PatentsView file (rawassignee.tsv) contains no assignee rows for any of them:

library(httr)
library(data.table)

raw_assignee <- tempfile()
temp_dir <- dirname(raw_assignee)

GET(
  "http://www.patentsview.org/data/20171226/rawassignee.tsv.zip",
  write_disk(raw_assignee)
)
unzip(raw_assignee, exdir = temp_dir)

data <- fread(
  file.path(temp_dir, "rawassignee.tsv"),
  sep = "\t", colClasses = "character"
)

nrow(data[data$patent_id %in% c("4619896", "6255456", "9623304"), ])
#> [1] 0
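
For checking a longer list of patent numbers, it may be more useful to report which IDs are absent rather than just a row count. Here is a minimal sketch of that idea. `missing_ids` is a hypothetical helper written for this illustration (it is not part of PatentsView, httr, or data.table), and the toy table uses made-up IDs rather than real rows from rawassignee.tsv:

```r
# Hypothetical helper: return the IDs from `ids` that never appear
# in the `id_col` column of `dt` (works on a data.frame or data.table).
missing_ids <- function(dt, ids, id_col = "patent_id") {
  setdiff(ids, unique(dt[[id_col]]))
}

# Toy stand-in for the assignee table; these IDs and names are invented
# for illustration only.
toy <- data.frame(
  patent_id = c("1000001", "1000002"),
  organization = c("Example Org A", "Example Org B"),
  stringsAsFactors = FALSE
)

missing_ids(toy, c("1000001", "9999999"))
#> [1] "9999999"
```

Against the real file, the equivalent call would be `missing_ids(data, c("4619896", "6255456", "9623304"))`, assuming `data` was loaded as above with `patent_id` read as character.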

I looked briefly at the raw data files from the USPTO, and these patents appear to be associated with original assignees in those files as well.

sarahkelley commented 6 years ago

This does look like a problem! Thanks for bringing this to our attention. We will look into it and let you know what we find.