gbif / content-crawler

Crawls CMS and articles from Mendeley into ElasticSearch indexes
Apache License 2.0

semicolon not considered a delimiter #12

Closed MortenHofft closed 5 years ago

MortenHofft commented 5 years ago

In the ES literature index: country and countriesOfResearcher differ in parsing. country should be an array as well.

    "country": "NO;IS;DK;CA;GB;SE;US;FI;RU;ZA",
    "countriesOfResearcher": [
        "NO",
        "SE",
        "FI",
        "RU",
        "DK",
        "ZA",
        "IS",
        "GB",
        "CA",
        "US"
    ],

dnoesgaard commented 5 years ago

I believe country is directly from Mendeley, but not "used" by the ES index. I believe countriesOfResearcher comes from tags...

MortenHofft commented 5 years ago

kk - in that case let us not fix that

dnoesgaard commented 5 years ago

Right. I didn't start populating this field until recently. Can't actually remember why...

MortenHofft commented 5 years ago

I actually mistook it for countryOfCoverage - thinking that was the "other" country. Didn't know we had 3 fields for countries.

dnoesgaard commented 5 years ago

I know, it's a bit messy. I think all fields from Mendeley are indexed as-is, and a few more are then created in ES based on tags, etc.
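
For illustration only: the difference reported in this issue comes down to one field holding a semicolon-delimited string and the other an array. The sketch below shows what treating the semicolon as a delimiter would look like. It is not the crawler's code; `split_countries` is a hypothetical helper, and the document is built from the values quoted above.

```python
def split_countries(raw, delimiter=";"):
    """Split a delimited country string into a list of trimmed, non-empty codes.

    Hypothetical helper, not part of content-crawler.
    """
    if raw is None:
        return []
    return [code.strip() for code in raw.split(delimiter) if code.strip()]


# Example document using the field values quoted in this issue.
doc = {
    "country": "NO;IS;DK;CA;GB;SE;US;FI;RU;ZA",
    "countriesOfResearcher": ["NO", "SE", "FI", "RU", "DK", "ZA", "IS", "GB", "CA", "US"],
}

# Parse the raw string the same way the array-valued field is shaped.
country_list = split_countries(doc["country"])

# Order differs between the two fields, so compare them as sets.
same = set(country_list) == set(doc["countriesOfResearcher"])

print(country_list)
print(same)
```

Since the thread concludes that country is indexed as-is from Mendeley and not used by the index, a split like this would only matter if that field were ever queried directly.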