OpenWaterFoundation / cdss-app-snodas-tools

Colorado's Decision Support Systems (CDSS) Snow Data Assimilation System (SNODAS) Tools

Fix basin names that have end-of-line in attribute table #3

Open · smalers opened this issue 7 years ago

smalers commented 7 years ago

Four (?) basins seem to have end-of-line characters in their local ID names. This causes the affected records in the CSV files to break over two lines, which breaks TSTool processing for those basins and will cause problems for other tools. Fix the attribute table and reprocess the data. Make sure to check the CSV files by opening them in Notepad (NOT EXCEL!) and confirming that names don't contain line breaks.

egiles16 commented 7 years ago

The shapefile has been corrected for those 6 basins. I manually fixed the 6 SnowpackStatisticsByBasin CSV files that had this issue and then re-created the time series graphs to confirm the fix. All 24 previously empty time series graphs (4 graphs for each of the 6 affected basins) now show correct time series data.

Note that we will have to rerun all of the historical data to get the corrected local names into the SnowpackStatisticsByDate CSV files. Alternatively, I could write a script to automate the manual fix of the ByDate CSV files. Let's discuss which is the better option.

egiles16 commented 7 years ago

Update: I made an executive decision to delete the LF breaks in the historical SnowpackStatisticsByDate CSV files with a simple Python script. This saved considerable time because there is no need to rerun the entire historical process. No CSV files, historical or future, should experience this issue any more.
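The script used for the historical fix is not included in this thread. A minimal sketch of one way to do it follows, under the same assumptions as a simple field-count check: comma-delimited, unquoted files with no commas inside values. It rejoins physical lines until the accumulated record has as many fields as the header. Joining the fragments with a single space (rather than deleting the break outright) is an assumption that follows the suggestion made later in this thread. The function name and sample columns are hypothetical.

```python
def rejoin_broken_records(text, delimiter=",", joiner=" "):
    """Rejoin records split by embedded CR/LF characters so that each output
    line has the header's field count. Assumes unquoted, comma-free values.
    Output uses LF line endings."""
    lines = text.splitlines()  # splits on LF, CR, and CRLF
    if not lines:
        return ""
    expected = len(lines[0].split(delimiter))
    output = [lines[0]]
    pending = None
    for line in lines[1:]:
        if pending is None:
            pending = line
        else:
            # Collapse whitespace at the break so the joined name has one space.
            pending = pending.rstrip() + joiner + line.lstrip()
        if len(pending.split(delimiter)) >= expected:
            output.append(pending)
            pending = None
    if pending is not None:
        # Keep a trailing incomplete fragment rather than silently dropping it.
        output.append(pending)
    return "\n".join(output) + "\n"


sample = "LOCAL_ID,LOCAL_NAME,SWE_MM\nB1,North Fork\nBasin,1.2\nB2,South Fork,3.4\n"
print(rejoin_broken_records(sample), end="")
# LOCAL_ID,LOCAL_NAME,SWE_MM
# B1,North Fork Basin,1.2
# B2,South Fork,3.4
```

Applied to each historical file (read, fix, write back), this avoids rerunning the historical process, at the cost of relying on the field-count assumption above.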

smalers commented 7 years ago

Before closing this, I suggest that the Python code check whether CR or LF characters appear in the middle of an attribute and, if so, replace them with a single space, at least when writing the CSV files, which are most affected by this issue. When in doubt, I often "trim" strings to make sure there are no extra characters, but trimming only affects the ends. Adding a warning would also be good so that the problem is easy to track down. This will future-proof the code if we apply it to other locations with different basin data.
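The check suggested above (trim the ends, replace interior CR/LF with a single space, and warn) could look roughly like the following sketch. The function name `clean_attribute` and the use of the standard `logging` module for the warning are assumptions for illustration, not the project's existing code.

```python
import logging
import re

logger = logging.getLogger(__name__)

# A whitespace run that contains at least one CR or LF, e.g. " \r\n ".
_INTERIOR_BREAK = re.compile(r"\s*[\r\n]\s*")


def clean_attribute(value, field_name="attribute"):
    """Trim the value, replace each interior CR/LF run with a single space,
    and log a warning when a replacement was needed."""
    trimmed = value.strip()  # handles breaks at the ends only
    cleaned, count = _INTERIOR_BREAK.subn(" ", trimmed)
    if count:
        logger.warning(
            "%s %r contained %d embedded line break(s); replaced with spaces",
            field_name, value, count,
        )
    return cleaned


print(clean_attribute("  North Fork\r\nBasin ", "LOCAL_NAME"))  # North Fork Basin
```

Calling a helper like this on each attribute before a CSV row is written would keep the files one record per line even if a future basin shapefile has the same problem, and the warning identifies which value needed fixing.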