degauss-org / dht

DeGAUSS helper tools
http://degauss.org/dht
GNU General Public License v3.0

specify column types ahead of time for key columns that are operated on #51

Closed by cole-brokamp 2 years ago

cole-brokamp commented 2 years ago

See https://github.com/degauss-org/census_block_group/issues/17 for why this would be useful.

This might be fixed by using {readr} 2.0.0 or later (see https://readr.tidyverse.org/articles/column-types.html#readr-first-edition-and-readr-2-0-0 for details), but it could still be a problem if lat/lon are missing in both the first 1,000 rows and the last row (as was the case in the issue linked above).

Because we enforce the column names, we can always set the lat and lon columns to numeric to avoid this problem. This could be done in the read_csv code while still letting {readr} parse the other "pass through" columns automatically. Alternatively, we could force all other columns to be read as character so that they don't get inadvertently modified when they are read in, passed through, and written back out.
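
A minimal sketch of both options, assuming the input has `lat` and `lon` columns plus one pass-through column. The file contents and the `id` column are made up for illustration, and this is not necessarily how it would end up in dht:

```r
library(readr)

# Hypothetical input: lat/lon are missing everywhere except the middle row,
# and `id` has leading zeros that should survive being passed through.
csv_path <- tempfile(fileext = ".csv")
writeLines(c("id,lat,lon", "001,,", "002,39.1,-84.5", "003,,"), csv_path)

# Option 1: always read lat and lon as double; readr still guesses the
# types of every other ("pass through") column.
d_guess <- read_csv(
  csv_path,
  col_types = cols(lat = col_double(), lon = col_double())
)

# Option 2: same for lat and lon, but read every other column as character
# so pass-through values are written back out exactly as they came in.
d_chr <- read_csv(
  csv_path,
  col_types = cols(
    .default = col_character(),
    lat = col_double(),
    lon = col_double()
  )
)
```

In both cases `lat` and `lon` come back as double no matter which rows are missing, because their types are never guessed. Option 2 additionally keeps `id` as the literal strings from the file.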

We probably don't need to do this for the start_date and end_date columns, as those are initially parsed as character and I think more than 1,000 missing dates is unlikely.

erikarasnick commented 2 years ago

Is this something we should add to our list of degauss overhaul tasks?

cole-brokamp commented 2 years ago

yes, I started a list and will put it up as a project soon

erikarasnick commented 2 years ago

I just started one actually :)