PatentsView / PatentsView-DB

Issue with quoted text #4

Open crew102 opened 6 years ago

crew102 commented 6 years ago

It looks like there is an issue with how PatentsView handles quotation marks. For example, whenever a quotation mark occurs in the patent's title, PatentsView quotes the entire title and adds extra quotation marks around the actual quoted text. You can see this behavior in patent number 5767337:

library(patentsview)

title <- search_pv(
    query = '{"_eq":{"patent_number":"5767337"}}'
)$data$patents$patent_title

cat(title)
#> "Creation of human apolipoprotein E isoform specific transgenic mice in apolipoprotein deficient ""knockout"" mice"

The same behavior is seen in the bulk data files.
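The output above matches the common CSV-style quoting convention: the whole field is wrapped in quotation marks and each embedded quotation mark is doubled. Until the data is corrected, a client could undo that quoting itself. The following is a minimal Python sketch, not part of PatentsView or the patentsview R package; the helper name `unquote_title` is hypothetical, and it assumes the affected values always follow this exact convention.

```python
import csv


def unquote_title(raw: str) -> str:
    """Undo CSV-style quoting. Hypothetical helper, not a PatentsView API."""
    # Only touch values that are wrapped in quotation marks.
    if len(raw) < 2 or not (raw.startswith('"') and raw.endswith('"')):
        return raw
    # csv.reader strips the outer quotes and collapses "" into ".
    row = next(csv.reader([raw], delimiter="\t"))
    # If the value did not parse as a single field, return it unchanged.
    return row[0] if len(row) == 1 else raw


raw_title = (
    '"Creation of human apolipoprotein E isoform specific transgenic '
    'mice in apolipoprotein deficient ""knockout"" mice"'
)
print(unquote_title(raw_title))
```

Titles that are not wrapped in quotation marks pass through unchanged, so the helper can be applied to every value in a column.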

sarahkelley commented 6 years ago

Yes, that indeed seems like a bug! Thanks for the heads up, we will let you know when it is corrected!

Everst commented 6 years ago

Thanks for responding, Sarah. The bug is caused by quote escaping: we first parse the data into TSV and only then import it into MySQL, and the fields are enclosed in quotes. I suggest we fix that at the DB stage as part of our transformation routine.
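The mechanism described here can be reproduced in a few lines. This is only a sketch under assumptions: the actual PatentsView parser and import step are not shown in this thread, so Python's `csv` module stands in for the TSV writer, and a plain `split` stands in for an import that ignores quoting. The shortened title is illustrative.

```python
import csv
import io

title = 'Transgenic mice in apolipoprotein deficient "knockout" mice'

# Step 1: write a TSV row with CSV-style quoting (assumed writer behavior).
# The title contains quotation marks, so the writer wraps it in quotes
# and doubles the embedded ones.
buf = io.StringIO()
csv.writer(buf, delimiter="\t", lineterminator="\n").writerow(["5767337", title])
tsv_line = buf.getvalue().rstrip("\n")

# Step 2a: an import that ignores quoting keeps the escape characters.
naive = tsv_line.split("\t")

# Step 2b: an import that applies the same quoting rules recovers the title.
aware = next(csv.reader([tsv_line], delimiter="\t"))

print(naive[1])  # still wrapped in quotes, inner quotes doubled
print(aware[1])  # original title
```

On the MySQL side, `LOAD DATA` accepts a `FIELDS ... OPTIONALLY ENCLOSED BY '"'` clause that applies the same convention at import time; whether that fits this pipeline is an open question here.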

On Oct 18, 2017, at 7:48 PM, sarahkelley notifications@github.com wrote:

Yes, that indeed seems like a bug! Thanks for the heads up, we will let you know when it is corrected!

sarahkelley commented 6 years ago

Thanks for the suggestion, Evgeny, that makes a lot of sense!

Everst commented 6 years ago

Sarah: an even easier solution is to use MEDIUMTEXT or LONGTEXT columns in the DB to store detailed descriptions: http://boolean.co.nz/blog/max-length-for-mysql-text-field-types/135/
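For reference, the MySQL text types differ only in how many bytes they can hold. The sketch below is illustrative and not part of the PatentsView code; the helper name `smallest_text_type` is hypothetical, and it assumes the column is stored as UTF-8, so length is measured in encoded bytes rather than characters.

```python
# Maximum storage, in bytes, of each MySQL text column type.
MYSQL_TEXT_LIMITS = [
    ("TINYTEXT", 2**8 - 1),     # 255
    ("TEXT", 2**16 - 1),        # 65,535
    ("MEDIUMTEXT", 2**24 - 1),  # 16,777,215
    ("LONGTEXT", 2**32 - 1),    # 4,294,967,295
]


def smallest_text_type(value: str, encoding: str = "utf-8") -> str:
    """Return the smallest text type that can hold `value` (hypothetical helper)."""
    n_bytes = len(value.encode(encoding))
    for name, limit in MYSQL_TEXT_LIMITS:
        if n_bytes <= limit:
            return name
    raise ValueError(f"value of {n_bytes} bytes exceeds the LONGTEXT limit")


# A 70,000-byte description does not fit in TEXT but fits in MEDIUMTEXT.
print(smallest_text_type("x" * 70_000))
```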