covidatlas / li

Next-generation serverless crawler for COVID-19 data
Apache License 2.0

parse.number('') should return null, not 0 #420

Closed jzohrab closed 4 years ago

jzohrab commented 4 years ago

Original issue https://github.com/covidatlas/coronadatascraper/issues/807, transferred here on Tuesday Apr 14, 2020 at 15:40 GMT


Description

See title.

Steps to reproduce

  1. parse.number('')
  2. It's 0!

Expected behavior

It's null?

Additional context

This is a tough one. It seems like it should return null, which would cause a validation error. If the scraper author wants zero for an empty string, they can say so explicitly: parse.number(parse.string(whatever) || 0)
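To make the proposal concrete, here is a minimal sketch of the behaviour being asked for. `parseNumber` is a hypothetical stand-in written for illustration, not the project's actual `parse.number`, and the character-stripping rule is an assumption:

```javascript
// Hypothetical stand-in for parse.number; NOT the project's implementation.
// Returns null whenever no number can be read from the input.
function parseNumber(value) {
  if (typeof value === 'number') return Number.isFinite(value) ? value : null;
  if (typeof value !== 'string') return null;
  // Assumption: drop thousands separators, whitespace, and other decoration.
  const cleaned = value.replace(/[^\d.-]/g, '');
  // Guard the empty case explicitly: Number('') evaluates to 0 in JavaScript.
  if (cleaned === '') return null;
  const n = Number(cleaned);
  return Number.isNaN(n) ? null : n;
}

// Default: an empty cell is "no value", which validation can then reject.
const missing = parseNumber('');

// Explicit opt-in to zero, in the spirit of the workaround above.
const raw = '';
const asZero = parseNumber(raw || 0);
```

With this shape, a genuine `'0'` still parses to 0, so only the empty case changes.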

jzohrab commented 4 years ago

(Transferred comment)

Blank fields in CSVs are "" by default. I personally think returning 0 here would be a bug in almost all cases. I think the ability to specify this behaviour through a parameter flag would be best, something like: parse.number(str, emptyAsZero=false). The default behaviour, however, should be to treat an empty string as undefined.
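A rough sketch of how such a flag could look. The function name, the options-object form, and the choice to return null (rather than undefined) for the empty case are all assumptions made for illustration, not the project's API:

```javascript
// Hypothetical sketch of an opt-in flag; not the real parse.number signature.
function parseNumber(value, { emptyAsZero = false } = {}) {
  const text = String(value ?? '').trim();
  // Empty input is "no data" unless the caller explicitly asks for zero.
  if (text === '') return emptyAsZero ? 0 : null;
  // Assumption: commas are thousands separators.
  const n = Number(text.replace(/,/g, ''));
  return Number.isNaN(n) ? null : n;
}

const strict = parseNumber('');                        // caller gets "no data"
const lenient = parseNumber('', { emptyAsZero: true }); // caller opted in to zero
```

An options object is used here instead of a positional boolean so the call site says what the flag means.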

jzohrab commented 4 years ago

(Transferred comment)

An option passed to parse.number would definitely be clean.

jzohrab commented 4 years ago

(Transferred comment)

From a data point of view, this is a tough question.

In an HTML table, it's already hard to convince yourself that an empty column really means zero. If they know there are no deaths, they would type zero, right? If they don't know how many deaths they have, then why would they add a death column?

In a CSV, you can bet your bottom dollar that an empty column could very well mean they don't have data. ArcGIS CSVs are a perfect example of that: they're often littered with fields from some archaic database that no one has any clue about. The same will happen with any API endpoint that accesses such a database.

And sources that give us a field = 0... do we know what standard they use? Does zero mean zero or not tracking?

I think at the end-user consumption level, we should at the very least say "zero may mean there is no data reported".

jzohrab commented 4 years ago

(Transferred comment)

And at the data collection level, we probably want to distinguish between "we have data of zero" and "we have no data". 👍
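One way to picture that distinction downstream, as a sketch only: the record shape, field names, and `summarize` helper are hypothetical, with null standing for "not reported" and 0 for "reported as zero".

```javascript
// Hypothetical records: null means "not reported", 0 means "reported as zero".
const rows = [
  { county: 'A', deaths: 0 },
  { county: 'B', deaths: null },
  { county: 'C', deaths: 3 },
];

// Sum only the rows that reported a value, and count the gaps separately
// so a consumer can tell a true zero from missing data.
function summarize(records, field) {
  const reported = records.filter((r) => r[field] !== null && r[field] !== undefined);
  return {
    total: reported.reduce((sum, r) => sum + r[field], 0),
    reported: reported.length,
    missing: records.length - reported.length,
  };
}

const summary = summarize(rows, 'deaths');
// summary is { total: 3, reported: 2, missing: 1 }
```

If empty strings had been parsed to 0 at scrape time, county B would be indistinguishable from county A here.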