globaldothealth / monkeypox

Mpox 2022 repository

Some issues with CSV data #154

Closed jpluiggi closed 2 years ago

jpluiggi commented 2 years ago

Hi,

While checking the CSV file "https://raw.githubusercontent.com/globaldothealth/monkeypox/main/latest.csv", I noticed some errors.

Example:

N36,confirmed,"Massachusetts General Hospital, Boston",Boston,United States,USA,,male,,2022-05-18,,Y,2022-05-12,,,,Link to suspected cases in Canada found,,,Y,early May,late April,,Canada,https://www.ncbi.nlm.nih.gov/nuccore/ON563414,,https://www.mass.gov/news/massachusetts-public-health-officials-confirm-case-of-monkeypox,https://www.nbcboston.com/news/local/man-tests-positive-for-extremely-rare-virus-monkeypox/2724367/,,,,,,2022-05-18,,2022-05-19

Here there is a ',' just after "Hospital", which breaks the CSV format. This is not the only line where the sixth field is not in "Country ISO3" format.

Another example:

N39,confirmed,"Clinique l'Actuel, Montreal",,Canada,CAN,30-59,male,,2022-05-23,"oral and genital ulcers, fever",,,Y,,,,,,,,,,,,,https://ici.radio-canada.ca/nouvelle/1884547/orthopoxvirose-simienne-monkeypox-maladie-sante-publique-quebec-canada,https://montreal.citynews.ca/2022/05/19/montreal-public-health-to-provide-update-on-monkeypox/,,,,,,2022-05-18,,2022-05-24

Regards.

Jean-Philippe

abhidg commented 2 years ago

Hi @jpluiggi, commas are allowed within CSV fields as long as the fields are quoted. Both pandas and R's read.csv parse this CSV correctly. We also have the public Google sheet, which can be exported to Excel format as well.
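
To illustrate the quoting rule, here is a minimal sketch using Python's standard `csv` module. The row is a shortened excerpt (first six fields only) of the N36 line quoted above, and the variable names are made up for this example. It compares a naive split on commas with a quote-aware parse:

```python
import csv
import io

# Shortened excerpt of the N36 row from the issue (first six fields only).
row_text = 'N36,confirmed,"Massachusetts General Hospital, Boston",Boston,United States,USA'

# Naive splitting ignores the double quotes, so the hospital name is cut
# in two and every later field shifts one position to the right.
naive_fields = row_text.split(",")

# csv.reader honours the quotes and keeps the embedded comma inside the field.
parsed_fields = next(csv.reader(io.StringIO(row_text)))

print(len(naive_fields), naive_fields[5])    # 7 United States
print(len(parsed_fields), parsed_fields[5])  # 6 USA
```

With the naive split, the sixth field comes out as the country name rather than the ISO3 code, which would match the symptom described in the report. A quote-aware parser returns `USA` in that position.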