TamerKhraisha / uspto-patent-data-parser

A Python tool for reading, parsing, and finding patents using the United States Patent and Trademark Office (USPTO) Bulk Data Storage System.
MIT License
36 stars 10 forks

Only the first line of claim text is read in #8

Open softwaregravy opened 2 months ago

softwaregravy commented 2 months ago

When the parser extracts claim data, only the first line of each claim's text is ingested; text in nested claim-text elements is dropped.

Claims can contain many lines of text. An example:

<claim id="CLM-00001" num="00001">
<claim-text>1. An imaging lens system including, in order from an object side to an image side:
<claim-text>a first lens element having a concave image-side surface;</claim-text>
<claim-text>a second lens element;</claim-text>
<claim-text>a third lens element with negative refractive power having a convex object-side surface and a concave image-side surface, the object-side and image-side surfaces thereof being aspheric;</claim-text>
<claim-text>a fourth lens element with positive refractive power having a convex image-side surface; and</claim-text>
<claim-text>a fifth lens element with negative refractive power having a convex object-side surface and a concave image-side surface, the object-side and image-side surfaces thereof being aspheric, each of the object-side and image-side surfaces thereof being provided with at least one inflection point;</claim-text>
<claim-text>wherein there are a total of five lens elements in the imaging lens system, and a gap exists between every two adjacent lens elements along an optical axis of the imaging lens system.</claim-text>
</claim-text>
</claim>

This results in the following claim data:

'claim_information': [{'id': 'CLM-00001', 'num': '00001', 'claim_text': ['1. An imaging lens system including, in order from an object side to an image side:\n']}
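
For illustration, here is a minimal sketch of one way to collect the text of every nested claim-text element. It uses only the standard library's xml.etree.ElementTree, which is an assumption: it is not the project's parser and not the code from any linked pull request. The function name extract_claim is hypothetical, and the sample is a shortened version of the claim above.

```python
import xml.etree.ElementTree as ET

# Shortened version of the claim from this issue (two nested elements only).
CLAIM_XML = """<claim id="CLM-00001" num="00001">
<claim-text>1. An imaging lens system including, in order from an object side to an image side:
<claim-text>a first lens element having a concave image-side surface;</claim-text>
<claim-text>a second lens element;</claim-text>
</claim-text>
</claim>"""


def extract_claim(claim_xml: str) -> dict:
    """Hypothetical helper: return the claim id, num, and all claim text lines."""
    claim = ET.fromstring(claim_xml)
    # itertext() walks the whole subtree in document order, so text held in
    # nested <claim-text> children is included, not just the first text node.
    fragments = (fragment.strip() for fragment in claim.itertext())
    # Drop the whitespace-only fragments that sit between elements.
    lines = [fragment for fragment in fragments if fragment]
    return {
        "id": claim.get("id"),
        "num": claim.get("num"),
        "claim_text": lines,
    }


if __name__ == "__main__":
    for line in extract_claim(CLAIM_XML)["claim_text"]:
        print(line)
```

With this approach the preamble and each nested limitation come back as separate entries in claim_text, in document order, instead of the preamble alone.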

softwaregravy commented 2 months ago

I fixed this for myself on a forked branch. Let me know if this is something you'd like to pull in.

https://github.com/softwaregravy/uspto-patent-data-parser/pull/1

softwaregravy commented 2 months ago

And this fixes an error: https://github.com/softwaregravy/uspto-patent-data-parser/pull/3