dccabs / pair

Pair project

How to handle improperly formatted numbers #8

Open dccabs opened 7 years ago

dccabs commented 7 years ago

How do we want to handle improperly formatted numbers?

Let's say I put in

abc123.

I was thinking it would show up in the results but just say something like

abc123 - no records are found that are associated with this number
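A minimal sketch of that behavior, assuming lookups have already been collected into a dictionary keyed by the number as typed. The names `build_results` and `NOT_FOUND` are hypothetical and do not come from this project:

```python
# Message text taken from the proposal above.
NOT_FOUND = "no records are found that are associated with this number"


def build_results(numbers, records):
    """Return one display line per input, in input order.

    `records` is a hypothetical dict mapping a number to its status text.
    Inputs without a record are kept in the output with the NOT_FOUND note
    instead of being dropped.
    """
    lines = []
    for number in numbers:
        record = records.get(number)
        if record is None:
            lines.append(f"{number} - {NOT_FOUND}")
        else:
            lines.append(f"{number} - {record}")
    return lines


print("\n".join(build_results(["abc123"], {})))
# prints: abc123 - no records are found that are associated with this number
```

Keeping the bad input in the results, rather than rejecting the whole batch, lets the user see exactly which entry needs fixing.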

clmulk commented 7 years ago

Dan, in terms of "proper" format, there's the common parlance, which has some minor variations, and there's the format that public PAIR uses to accept queries. I'll show below the primary ones we should work on, though there's a bigger list that we can get to another time.

  1. US granted patents
     a. Most common forms used by everybody
        i. US1234567 - though this will sometimes be only 6 digits for patents issued before the one millionth (they are issued in ascending order). 5-digit numbers are likely from the very early 1900s or before; we can expect that none of their data will EVER change, so they are unlikely to be checked.
        ii. US 1,234,567 - another common form. For interpreting the input, we should have a universal rule that ignores any spaces, commas, or slashes.
        iii. US123456 - merely an example of a 6-digit patent, long ago expired.
     b. PUBLIC PAIR format
        i. 1234567 - public PAIR does NOT accept any letters, so this format is fairly simple, but most people would not use it outside of public PAIR. PAIR just has the luxury of knowing that it ONLY has US data, so it treats all numbers as being US based.

  2. US Published applications
     a. Most common forms used by everybody
        i. US20040123456 - this standard form starts with the 4-digit year, followed by a 7-digit unique number. The only issue with this type is that users and databases often drop the leading zero in front of the "123456". The zero is there in case the USPTO ever surpasses 999,999 applications in one year, in which case that digit becomes relevant. However, that has never happened to date.
        ii. US2004123456 - the aforementioned example of dropping that leading zero.
        iii. US/2004/0123456 - not super common for people to write it this way. Again, by ignoring spaces, commas, and slashes, we should be able to eliminate issues with this format.
        iv. US 20040123456 - just an example of the same number with a space.
        v. US20040012345 - an example where the leading zero is present, but there's ALSO another leading zero. Here that second zero can actually be pretty important. So, in a nutshell, think of the last 6 digits of the publication number as crucial, with the chance of a superfluous zero just prior.
     b. PUBLIC PAIR format
        i. 20040123456 - this is the ONLY format public PAIR accepts: the 4-digit year followed by exactly 7 digits of the unique publication number. If a user omits the leading zero, the system will not recognize it.

  3. US Design Patents
     a. Most common forms
        i. USD123456 - fewer than 1,000,000 design patents have been issued to date, so most design patents have a 6-digit unique code. A 5-digit unique code is possible but unlikely. The crucial distinction is that the prefix is "USD".
        ii. USD 123,456 - same number, just with a space and comma.
     b. PUBLIC PAIR format
        i. Guess what? They just drop the "US" and use: D123456

  4. US REISSUE patents
     a. Most common forms
        i. USRE12345 - US + RE + 5-digit code, as there are currently fewer than 100,000 reissue patents in existence.
        ii. US RE 12,345 - same number, with a comma and spaces.
        iii. RE12345 - I see the "US" dropped in common practice more so than with any other type.
     b. PUBLIC PAIR format
        i. RE12345 - the only format they accept.
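The "ignore spaces, commas, and slashes" rule and the leading-zero handling for published applications could be sketched roughly as follows. This is only an illustration under the assumptions stated in the list above (4-digit year, then 6 or 7 digits); `normalize_publication` is a hypothetical name, not an existing function in this project:

```python
import re


def normalize_publication(raw):
    """Convert a published-application number to the 11-digit public PAIR form.

    Returns None when the input does not look like a publication number.
    Assumes the layout described above: 4-digit year + 6 or 7 digit number.
    """
    # Universal rule from the discussion: ignore spaces, commas, and slashes.
    cleaned = re.sub(r"[\s,/]", "", raw).upper()
    # Public PAIR accepts no letters, so drop an optional "US" prefix.
    if cleaned.startswith("US"):
        cleaned = cleaned[2:]
    if not re.fullmatch(r"\d{10,11}", cleaned):
        return None
    year, number = cleaned[:4], cleaned[4:]
    # Restore the leading zero if the user or database dropped it.
    return year + number.zfill(7)


for example in ["US20040123456", "US2004123456", "US/2004/0123456",
                "US 20040123456", "US20040012345"]:
    print(example, "->", normalize_publication(example))
```

Padding only ever adds a zero in front of a 6-digit number, so the second zero in US20040012345 is preserved.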

So...this, I hope, details some of the issues with the four main patent publication types we might have to deal with. There are a couple of additional ones that I personally never use, but eventually we should look into them as well. For the time being, though, handling the above 4 should cover 95+% of the usage I foresee, so the focus should be here.

I'm somewhat in favor of just offering a key to the users that accepts the following for each of the formats:

Patents:
a. US1234567 - could be 5 or 6 digits also
b. 1234567 - basically the same minus the US

Published applications (with or without the "US", and with or without the leading zero):
a. US2004123456
b. US20040123456
c. 2004123456
d. 20040123456

Design Patents:
a. USD123456
b. D123456

REISSUE Patents:
a. USRE12345
b. RE12345
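One way such a key might be checked in code is a small table of patterns tried in order. This is a sketch, not a settled design: the names `PATTERNS` and `classify` are hypothetical, and the digit counts are taken from the examples in this thread rather than from any USPTO rule.

```python
import re

# Ordered from most to least specific. Digit counts are assumptions
# based on the examples in this thread.
PATTERNS = [
    ("reissue", re.compile(r"(?:US)?RE\d{5}")),
    ("design", re.compile(r"(?:US)?D\d{5,6}")),
    ("published_application", re.compile(r"(?:US)?\d{10,11}")),
    ("patent", re.compile(r"(?:US)?\d{5,7}")),
]


def classify(raw):
    """Return the matching number type from the key, or None if nothing fits."""
    # Ignore spaces, commas, and slashes before matching.
    cleaned = re.sub(r"[\s,/]", "", raw).upper()
    for kind, pattern in PATTERNS:
        if pattern.fullmatch(cleaned):
            return kind
    return None


for example in ["US1234567", "2004123456", "USD 123,456", "RE12345", "abc123"]:
    print(example, "->", classify(example))
```

An input that returns None, such as abc123, is the case the original question asks about and can be reported back to the user as unrecognized.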

Does this make sense?

absoluke commented 7 years ago

Chris gives a pretty good summary, but his examples are the types of numbers users use to refer to the actual publications, which is very common in the patent search world. However, in the patent practitioner world, when it comes to monitoring, attorneys will often want to monitor or pull status using the application serial number, which ALL PUBLISHED RECORDS HAVE. Not all records have a "published application", and unissued/ungranted patents do not have a patent number, but they all have application serial numbers. Here are the three most popular input formats of which I am aware; I tested and can confirm that the current public PAIR USPTO website accepts all three:

14/325270
14325270
14/325,270
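All three spellings reduce to the same digits once separators are ignored. A rough sketch, assuming serial numbers are always 8 digits (a 2-digit series code plus a 6-digit number, as in the examples above); `normalize_serial` and `format_serial` are hypothetical names:

```python
import re


def normalize_serial(raw):
    """Reduce an application serial number to its bare 8 digits, or None."""
    # Same separator rule as for publication numbers.
    digits = re.sub(r"[\s,/]", "", raw)
    # Assumption: 2-digit series code + 6-digit number.
    if not re.fullmatch(r"\d{8}", digits):
        return None
    return digits


def format_serial(digits):
    """Render 8 bare digits in the 14/325,270 style."""
    return f"{digits[:2]}/{digits[2:5]},{digits[5:]}"


for example in ["14/325270", "14325270", "14/325,270"]:
    print(example, "->", normalize_serial(example))
```

Storing the bare-digit form and formatting only for display would let the three inputs be treated as one record.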