GSA / code-gov-api

API powering the code.gov source code harvester
http://code.gov

Metadata fields with special characters like – or ’ cause display issues on Code.gov #200

Open RicardoAReyes opened 6 years ago

RicardoAReyes commented 6 years ago

An agency's code.json metadata containing special characters is displayed with ? (�) characters on the project details views.

Elasticsearch is indexing the code.json records with ? in the field values; see below.

Elasticsearch record:

"repos": [ { "name": "Intelligent Transportation Systems Operational Data Environment (ITS ODE)", "organization": "FHWA", "description": "The ITS ODE is a real-time virtual data router that ingests and processes operational data from various connected devices � including vehicles, infrastructure, and traffic management centers � and distributes it to other devices and subscribing transportation management applications. Using the ITS ODE within intelligent transportation deployments increases data fluidity and interoperability while meeting operational needs and protecting user privacy. The software�s microservices architecture makes it easy to add new capabilities to meet local needs.", "repositoryURL": "https://github.com/usdot-jpo-ode/jpo-ode",

Compare the same record in the source code.json:

"description":"The ITS ODE is a real-time virtual data router that ingests and processes operational data from various connected devices – including vehicles, infrastructure, and traffic management centers – and distributes it to other devices and subscribing transportation management applications. Using the ITS ODE within intelligent transportation deployments increases data fluidity and interoperability while meeting operational needs and protecting user privacy. The software’s microservices architecture makes it easy to add new capabilities to meet local needs.", "repositoryURL":"https://github.com/usdot-jpo-ode/jpo-ode",


To validate, compare DOT's repo on Code.gov with DOT's published code.json:

https://www.code.gov/#/explore-code/agencies/DOT/repos/dot_fhwa_intelligent_transportation_systems_operational_data_environment_its_ode_

https://www.transportation.gov/sites/dot.gov/files/docs/code.json
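One way to catch this before it reaches Elasticsearch would be to scan every string value of a harvested code.json for U+FFFD, the replacement character that renders as �. The Python sketch below assumes the file has already been fetched and parsed into dicts and lists; the function name `find_replacement_chars` and the sample structure are hypothetical, not part of the code-gov-api harvester.

```python
from typing import Any, Iterator

# U+FFFD REPLACEMENT CHARACTER, displayed as "�".
REPLACEMENT_CHAR = "\ufffd"


def find_replacement_chars(node: Any, path: str = "$") -> Iterator[str]:
    """Hypothetical check: yield the path of every string containing U+FFFD."""
    if isinstance(node, dict):
        for key, value in node.items():
            yield from find_replacement_chars(value, f"{path}.{key}")
    elif isinstance(node, list):
        for index, item in enumerate(node):
            yield from find_replacement_chars(item, f"{path}[{index}]")
    elif isinstance(node, str) and REPLACEMENT_CHAR in node:
        yield path


# Simplified stand-in for the output of json.loads() on a fetched code.json.
doc = {"releases": [{"name": "ITS ODE", "description": "devices \ufffd including vehicles"}]}
print(list(find_replacement_chars(doc)))  # ['$.releases[0].description']
```

A check like this only flags the damage; it cannot recover the original characters, which have to come from the agency's source data.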

froi commented 6 years ago

@RicardoAReyes @jlow81

Yesterday we saw that the characters show up as expected in the source Excel file. However, when I view the code.json on DOT's site, I notice that the code.json itself contains these artifacts:

[Screenshot: DOT's code.json showing the garbled characters]

This makes me think the issue happens during the conversion from Excel to JSON: the character encoding used in that step is most likely what produces these characters in the JSON file. Our harvester is just consuming what is there.
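The comment above does not pin down which encoding step failed. One plausible mechanism, assumed here rather than confirmed in the issue, is text written as Windows-1252 (often the encoding of Excel's legacy CSV export) and later read as UTF-8. This Python sketch reproduces that pattern with the characters from this record:

```python
# Assumed failure mode (not confirmed in this issue): Windows-1252 bytes
# decoded as UTF-8.
source_text = "connected devices – including vehicles; the software’s architecture"

# Windows-1252 stores the en dash as 0x96 and the right single quote as 0x92.
raw_bytes = source_text.encode("cp1252")

# Those bytes are invalid on their own in UTF-8, so a lenient decoder
# substitutes U+FFFD REPLACEMENT CHARACTER, which renders as "�".
garbled = raw_bytes.decode("utf-8", errors="replace")
print(garbled)  # connected devices � including vehicles; the software�s architecture

# Decoding with the encoding the bytes were actually written in recovers the text.
repaired = raw_bytes.decode("cp1252")
```

The `software�s` output matches the Elasticsearch record in this issue, which is consistent with this explanation but does not prove it.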

My suggestion would be for them to fix this encoding issue and republish the code.json.
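The agency's actual Excel-to-JSON tooling is not described in the issue, so the following is only a sketch of what fixing the conversion could look like. It assumes the sheet is exported as a legacy CSV in Windows-1252. The idea is to decode with that encoding explicitly and write the JSON as UTF-8, which RFC 8259 requires for JSON exchanged between systems. The helper name `csv_to_code_json` and the flat `releases` payload are hypothetical simplifications of a real code.json.

```python
import csv
import io
import json


def csv_to_code_json(csv_bytes: bytes, source_encoding: str = "cp1252") -> bytes:
    """Hypothetical conversion step: Excel CSV export -> UTF-8 code.json.

    Assumes Excel saved the sheet as a legacy CSV in Windows-1252 (cp1252).
    """
    # Decode explicitly with the encoding the bytes were written in; strict
    # mode (the default) raises on bytes invalid in that encoding instead of
    # silently inserting U+FFFD.
    text = csv_bytes.decode(source_encoding)
    rows = list(csv.DictReader(io.StringIO(text, newline="")))
    # Simplified payload; a real code.json has more required fields.
    payload = {"releases": rows}
    # ensure_ascii=False keeps "–" and "’" as literal characters; the result
    # is then encoded explicitly as UTF-8.
    return json.dumps(payload, ensure_ascii=False, indent=2).encode("utf-8")


# Usage: a one-row sheet containing an en dash, saved as cp1252.
sample = "name,description\nITS ODE,devices – including vehicles\n".encode("cp1252")
output = csv_to_code_json(sample).decode("utf-8")
```

If the published file already contains U+FFFD, the original characters are gone and the JSON has to be regenerated from the Excel source rather than re-encoded.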

Nosferican commented 5 years ago

Close issue?

froi commented 5 years ago

I've at least unassigned myself 😅