ecologylab / BigSemanticsWrapperRepository

Repository of wrappers used by the BigSemantics project.
Apache License 2.0

When parsing dates, the '-' character is deleted #14

Open keithkade opened 10 years ago

keithkade commented 10 years ago

example: http://www.metmuseum.org/Collections/search-the-collections/488690

Other special characters are ok: http://www.moma.org/collection/browse_results.php?criteria=O%3AAD%3AE%3A28723&page_number=1&template_id=1&sort_order=1

quyin commented 10 years ago

can you also give the meta-metadata field that is involved, with type info and full xpath?

Best Regards, Yin Qu (屈垠)

keithkade commented 10 years ago

<scalar name="year" xpath="(//dd[preceding-sibling::dt[contains(text(),'Date:')]])[1]"/>

quyin commented 9 years ago

The Met museum wrapper might not be working correctly now. @keithkade could you please check? Please also check if the bug still persists after fixing the wrapper, and update in this thread. Thanks!

keithkade commented 9 years ago

currently updating the wrapper. the bug is still there

quyin commented 9 years ago

is this field of type "String" or "Date"?

if it is "Date", the value may be parsed into a java.util.Date object and then serialized in a slightly different format (without dashes), which means it's not a bug but a feature. but we might want to change the serialized date format if the use case requires it.
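
To illustrate the hypothesis above, here is a minimal sketch of how a parse-then-serialize round trip can drop dashes. It uses Python's `datetime` rather than `java.util.Date`, and the function name and both format strings are assumptions made for illustration; this is not the BigSemantics code.

```python
from datetime import datetime


def round_trip_date(text: str,
                    parse_fmt: str = "%Y-%m-%d",
                    out_fmt: str = "%Y%m%d") -> str:
    """Parse a date string, then serialize it with a different format.

    Both formats are hypothetical. The point is only that the output
    format, not the input text, decides whether dashes survive.
    """
    parsed = datetime.strptime(text, parse_fmt)
    return parsed.strftime(out_fmt)


if __name__ == "__main__":
    # The dashes in the input are consumed by parsing and are not
    # written back, because out_fmt contains none.
    print(round_trip_date("2014-01-28"))  # 20140128
```

A field typed as a plain string would skip this round trip entirely, which is why the scalar type matters for the diagnosis.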

keithkade commented 9 years ago

it is a string.

it inherits from artwork.xml, which has the following definition: <scalar name="year" comment="Year the work was created." scalar_type="String">

quyin commented 9 years ago

that's weird. when you are done with the wrapper please reassign this to me and I'll take a look.

quyin commented 9 years ago

the dash character in that field is a unicode character, not the ASCII hyphen '-'. so I suspect this is another manifestation of the unicode issue we were having.
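
One way to check this kind of suspicion is to list the non-ASCII code points in the extracted value and see what an ASCII-only step does to it. The sketch below assumes the character is an en dash (U+2013) and uses a made-up sample value; the thread does not confirm which character the page actually contains, and the helper names are hypothetical.

```python
import unicodedata


def describe_non_ascii(text: str) -> list[tuple[str, str]]:
    """Return (code point, Unicode name) for each non-ASCII character."""
    return [
        (f"U+{ord(ch):04X}", unicodedata.name(ch, "UNKNOWN"))
        for ch in text
        if ord(ch) > 127
    ]


def ascii_only(text: str) -> str:
    """Drop every character that cannot be encoded as ASCII.

    Stands in for any processing step that silently discards
    non-ASCII input; the real pipeline may behave differently.
    """
    return text.encode("ascii", errors="ignore").decode("ascii")


if __name__ == "__main__":
    sample = "1885\u201390"  # assumed value: a year range with an en dash
    print(describe_non_ascii(sample))  # [('U+2013', 'EN DASH')]
    print(ascii_only(sample))          # 188590
    print(ascii_only("1885-90"))       # 1885-90
```

If the page's dash shows up in the first list, a lossy ASCII step somewhere in extraction would explain why the dash disappears while the ASCII special characters on the MoMA page are unaffected.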